Learning multiple layers of features from tiny images
6 Pith papers cite this work.
citing papers explorer
- SMA: Submodular Modality Aligner For Data Efficient Multimodal Learning
  SMA uses a submodular mutual information objective on data sets to deliver competitive zero-shot classification and retrieval performance on CLIP benchmarks with only tens of thousands of samples, orders of magnitude fewer than standard approaches.
- QKFormer: Hierarchical Spiking Transformer using Q-K Attention
  A hierarchical spiking transformer using Q-K attention achieves 85.65% top-1 accuracy on ImageNet-1K, the first direct-trained SNN to exceed 85%.
- Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization
  Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.
- Distributed Black-Box Optimization via Error Correcting Codes
  Presents a coded distributed black-box optimization framework resilient to stragglers via error-correcting codes on search directions, extending evolution strategies, with experiments showing faster runtimes on adversarial attacks.
- Learning Multimodal Fixed-Point Weights using Gradient Descent
  Gradient-based optimization learns symmetric Gaussian mixture modes for 2-bit fixed-point weight quantization, claiming state-of-the-art performance and self-adaptive weights.
- Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
  GNC convolves stochastic gradient noise to smooth sharp minima in large-batch SGD, outperforming isotropic noise for better generalization in distributed deep learning.