Learning multiple layers of features from tiny images
6 Pith papers cite this work.
citing papers explorer
- Federated Learning: Strategies for Improving Communication Efficiency
  Structured updates (low-rank or masked) and sketched updates (quantized, rotated, subsampled) reduce uplink communication in federated learning by up to two orders of magnitude on convolutional and recurrent networks.
- Hyperbolic Concept Bottleneck Models
  HypCBM reformulates concept activations as geometric containment in hyperbolic space to produce sparse, hierarchy-aware signals that match Euclidean models trained on 20 times more data.
- Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
  A two-level DMFT tracks bulk and outlier spectral dynamics in wide networks, predicting width-consistent outlier growth and hyperparameter transfer under muP scaling for deep linear nets while noting bulk restructuring for large-output tasks.
- Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling
  NegBio-VAE introduces negative binomial latents with dispersion to handle overdispersion in discrete VAE models, yielding better reconstruction, generation, and downstream representations than Poisson VAE baselines.
- XferNAS: Transfer Neural Architecture Search
  XferNAS transfers knowledge across neural architecture searches to reduce search time by a factor of 33 on CIFAR-10/100 while achieving new records of 1.99% and 14.06% error.
- bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition
  A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.