Corrections to “Generalization Bounds via Information Density and Conditional Information Density” [Nov 20 824-839]
An error in the proof of the data-dependent tail bounds on the generalization error presented in Hellström and Durisi (2020) is identified, and a correction is proposed. Furthermore, we note that the absolute continuity requirements in Hellström and Durisi (2020) need to be strengthened to avoid measurability issues.
Guest Editorial for Special Issue on Coded Computing
Computing is the next frontier for information theory. Intellectually, the goal of coded computing has been of interest since the days of von Neumann and Shannon. Von Neumann examined this issue in his 1956 paper “Probabilistic Logics and the Synthesis of Reliable Organisms From Unreliable Components,” which was in turn motivated intellectually by Shannon's 1948 paper and by the problem of understanding the reliability of seemingly noisy biological systems.
Quantization of Distributed Data for Learning
We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates.
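For context, the sketch below illustrates the kind of gradient-update compression that the recent approaches mentioned above rely on: an unbiased stochastic uniform quantizer in the spirit of QSGD-style schemes. It is a minimal illustration of that baseline, not the method proposed in the paper; the function name and the `levels` parameter are assumptions made for the example.

```python
import numpy as np

def stochastic_quantize(v, levels=4, rng=None):
    # Unbiased stochastic uniform quantization of a vector onto `levels`
    # magnitude levels (illustrative of QSGD-style gradient compressors).
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * levels                   # magnitudes in [0, levels]
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < scaled - lower)   # randomized rounding
    return np.sign(v) * q * norm / levels                # E[output] = v (unbiased)

g = np.random.default_rng(0).standard_normal(8)   # a "gradient update"
g_hat = stochastic_quantize(g)                     # its compressed version
```

The randomized rounding keeps the quantizer unbiased, which is why such compressors can be plugged into SGD-type analyses without changing the expected update direction.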
Bivariate Polynomial Coding for Efficient Distributed Matrix Multiplication
Coded computing is an effective technique to mitigate “stragglers” in large-scale and distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the computation time depend only on the fastest workers. However, these schemes completely ignore the work done by the straggling workers, resulting in a waste of computational resources.
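As background for the bivariate construction, here is a minimal sketch of the univariate polynomial-code baseline mentioned above: row blocks of $A$ and column blocks of $B$ are encoded as matrix polynomials, each worker multiplies its two evaluations, and the master interpolates all block products from the results of the $mn$ fastest workers. The splitting parameters $m$, $n$ and the evaluation points are illustrative choices, not the paper's notation.

```python
import numpy as np

def encode(blocks, x, stride=1):
    # Evaluate the matrix polynomial sum_i blocks[i] * x**(i*stride) at point x.
    return sum(blk * (x ** (i * stride)) for i, blk in enumerate(blocks))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
m, n = 2, 2
A_blocks = np.split(A, m, axis=0)            # row blocks A_0, A_1
B_blocks = np.split(B, n, axis=1)            # column blocks B_0, B_1

# Worker t evaluates both encoded polynomials at its point x_t and multiplies
# them; the product is a degree-(m*n - 1) matrix polynomial in x_t.
points = np.arange(1.0, m * n + 1.0)         # any m*n distinct evaluation points
worker_results = [encode(A_blocks, x) @ encode(B_blocks, x, stride=m) for x in points]

# Master: recover the m*n coefficient matrices A_i B_j by interpolation.
V = np.vander(points, m * n, increasing=True)
coeffs = np.linalg.solve(V, np.stack([R.ravel() for R in worker_results]))
C_blocks = coeffs.reshape(m * n, A.shape[0] // m, B.shape[1] // n)
C = np.block([[C_blocks[i + j * m] for j in range(n)] for i in range(m)])
assert np.allclose(C, A @ B)
```

With more than $mn$ workers, any $mn$ finishers suffice, which is exactly the straggler tolerance the abstract refers to; the results of the remaining (straggling) workers are discarded.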
Communication-Efficient and Byzantine-Robust Distributed Learning With Error Feedback
We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of Yin et al. (2018), which uses more complicated schemes (coordinate-wise median, trimmed mean).
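A minimal sketch of the norm-based thresholding step described above, assuming a synchronous round in which the master collects one gradient per worker; the trimming fraction `beta` is an illustrative parameter, not the paper's notation.

```python
import numpy as np

def norm_thresholded_mean(gradients, beta):
    # Discard the beta fraction of worker gradients with the largest norms
    # (treated as potentially Byzantine) and average the rest.
    gradients = np.asarray(gradients)                 # shape: (workers, dim)
    norms = np.linalg.norm(gradients, axis=1)
    keep = np.argsort(norms)[: int(np.ceil((1 - beta) * len(gradients)))]
    return gradients[keep].mean(axis=0)

# Toy round: 8 honest workers plus 2 Byzantine workers sending huge gradients.
rng = np.random.default_rng(1)
honest = rng.standard_normal((8, 5))
byzantine = 100.0 * rng.standard_normal((2, 5))
agg = norm_thresholded_mean(np.vstack([honest, byzantine]), beta=0.2)
```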
Coded Sequential Matrix Multiplication for Straggler Mitigation
In this work, we consider a sequence of $J$ matrix multiplication jobs that needs to be distributed by a master across multiple worker nodes. For $i \in \{1,2,\ldots,J\}$, job-$i$ begins in round-$i$ and has to be completed by round-$(i+T)$. In order to provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case of $T=0$. We propose here two schemes with $T > 0$, which allow for coding across workers as well as across time.
SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors, regulated by a locally computable triggering criterion.
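The following sketch illustrates one node's round under the ingredients listed above (local Nesterov-momentum SGD steps, then a sparsified and quantized update sent only when a locally computable trigger fires). The top-$k$ sparsifier, scaled-sign quantizer, and norm-based trigger used here are illustrative stand-ins, not the exact SQuARM-SGD operators.

```python
import numpy as np

def topk_sparsify(v, k):
    # Keep the k largest-magnitude entries of v, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def scaled_sign_quantize(v):
    # One shared magnitude plus a sign per entry (a crude quantizer).
    return np.abs(v).mean() * np.sign(v)

def local_round(x, grad_fn, lr=0.05, momentum=0.9, local_steps=4, k=2, threshold=1e-3):
    # One node's round: a few local Nesterov-momentum SGD steps, then a
    # sparsified + quantized model update that is shared with the neighbors
    # only if a locally computable trigger (here: its norm) is large enough.
    x0, v = x.copy(), np.zeros_like(x)
    for _ in range(local_steps):
        v = momentum * v - lr * grad_fn(x + momentum * v)   # Nesterov look-ahead
        x = x + v
    update = scaled_sign_quantize(topk_sparsify(x - x0, k))
    message = update if np.linalg.norm(update) > threshold else None
    return x, message

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_new, msg = local_round(np.array([1.0, -2.0, 0.5]), grad_fn=lambda x: x)
```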
Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication
We propose two coding schemes for distributed matrix multiplication in the presence of stragglers. These coding schemes are adaptations of Luby Transform (LT) codes and Raptor codes to distributed matrix multiplication and are termed Factored LT (FLT) codes and Factored Raptor (FRT) codes. We show that all nodes in the Tanner graph of a randomly sampled code have a tree-like neighborhood with high probability. This ensures that the density evolution analysis gives a reasonable estimate of the average recovery threshold of FLT codes.
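To make the notion of a recovery threshold concrete, the sketch below simulates peeling decoding of a plain LT code with an Ideal Soliton degree distribution and reports the average number of coded results needed to recover all $k$ source blocks. The degree distribution and parameters are illustrative assumptions and do not reproduce the FLT/FRT constructions.

```python
import random

def ideal_soliton(k):
    # Ideal Soliton distribution over degrees {1, ..., k} (illustrative choice).
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def lt_recovery_threshold(k, rng):
    # Collect coded symbols (random subsets of source blocks) until peeling
    # decoding recovers every block; return how many symbols were needed.
    degrees, probs = list(range(1, k + 1)), ideal_soliton(k)
    recovered, symbols, collected = set(), [], 0
    while len(recovered) < k:
        d = rng.choices(degrees, weights=probs)[0]
        symbols.append(set(rng.sample(range(k), d)))
        collected += 1
        progress = True
        while progress:                      # peel symbols with one unknown block
            progress = False
            for s in symbols:
                unknown = s - recovered
                if len(unknown) == 1:
                    recovered |= unknown
                    progress = True
    return collected

rng = random.Random(0)
runs = [lt_recovery_threshold(50, rng) for _ in range(20)]
print(sum(runs) / len(runs))   # empirical average recovery threshold
```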
Compressing Gradients by Exploiting Temporal Correlation in Momentum-SGD
An increasing bottleneck in decentralized optimization is communication. Bigger models and growing datasets mean that decentralization of computation is important and that the amount of information exchanged is quickly growing. While compression techniques have been introduced to cope with the latter, none has considered leveraging the temporal correlations that exist in consecutive vector updates. An important example is distributed momentum-SGD where temporal correlation is enhanced by the low-pass-filtering effect of applying momentum.
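The low-pass-filtering effect mentioned above can be made explicit. With momentum parameter $\mu \in [0,1)$ and zero initialization, the momentum buffer is an exponentially weighted (first-order IIR) filter of the gradient sequence,
$$ m_t = \mu\, m_{t-1} + g_t = \sum_{\tau=0}^{t} \mu^{\,t-\tau} g_\tau, $$
so for $\mu$ close to one, consecutive updates $m_t$ and $m_{t-1}$ are strongly correlated and the difference $m_t - m_{t-1}$ is far more compressible than $m_t$ itself. (The symbols $m_t$, $g_t$, and $\mu$ are generic notation for this illustration, not necessarily the paper's.)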