Dataset distillation creates a tiny synthetic training set that, when used with a fixed network initialization, produces models whose performance approximates that of models trained on the full original dataset.
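A minimal sketch of the idea follows. It rests on several assumptions: a linear least-squares model stands in for a neural network, training on the synthetic set is a single gradient step from the fixed initialization `w0`, and the synthetic targets are held fixed. The synthetic inputs `Xs` and the inner step size `lr` are learned by differentiating the real-data loss through that one update, using hand-derived gradients. All names (`distill`, `unrolled_loss_and_grads`, `real_loss`) and the data sizes are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np

# Hypothetical sketch: linear model instead of a network, one inner GD step.


def real_loss(w, X, y):
    """Halved mean squared error of a linear model w on the real dataset."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def unrolled_loss_and_grads(Xs, ys, lr, w0, X, y):
    """Take one GD step on synthetic data from fixed init w0, then score on real data.

    Returns the real-data loss after that step plus its gradients with respect
    to the synthetic inputs Xs and the learned inner step size lr.
    """
    m = Xs.shape[0]
    r = Xs @ w0 - ys                       # residuals on synthetic data at w0
    g = Xs.T @ r / m                       # gradient of synthetic loss at w0
    w1 = w0 - lr * g                       # the single inner update
    loss = real_loss(w1, X, y)
    u = X.T @ (X @ w1 - y) / X.shape[0]    # dLoss/dw1
    v = -lr * u                            # dLoss/dg
    grad_Xs = (np.outer(r, v) + np.outer(Xs @ v, w0)) / m
    grad_lr = -g @ u                       # dLoss/dlr
    return loss, grad_Xs, grad_lr


def distill(X, y, w0, m=2, steps=500, seed=1):
    """Learn m synthetic points and a step size by monotone gradient descent."""
    rng = np.random.default_rng(seed)
    Xs = rng.normal(size=(m, X.shape[1]))
    ys = rng.normal(size=m)                # assumption: targets are not learned
    lr = 0.01
    step = 0.1
    loss, gX, gl = unrolled_loss_and_grads(Xs, ys, lr, w0, X, y)
    initial = loss
    for _ in range(steps):
        # Backtracking: shrink the outer step until the real-data loss drops.
        while step > 1e-12:
            cand_X, cand_lr = Xs - step * gX, lr - step * gl
            cand = unrolled_loss_and_grads(cand_X, ys, cand_lr, w0, X, y)
            if cand[0] < loss:
                Xs, lr = cand_X, cand_lr
                loss, gX, gl = cand
                step *= 2.0
                break
            step *= 0.5
    return Xs, ys, lr, initial, loss


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
w0 = rng.normal(size=5)                    # the fixed "network initialization"

Xs, ys, lr, before, after = distill(X, y, w0)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # full-data least-squares fit
print(before, after, real_loss(w_ls, X, y))
```

Because `w0` is held fixed, two synthetic points can in principle steer the one-step update all the way to the full-data least-squares solution. With a different initialization the learned set would generally not transfer, which is the trade-off the fixed-initialization assumption buys.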
5 Pith papers cite this work.
Representative citing papers:
HFF replaces binary goodness-of-fit in Forward-Forward with hyperspherical prototypes for direct multi-class decisions, enabling single-forward-pass inference and training that scales to ImageNet while closing much of the gap to backpropagation.
SURGE proposes a dual-path gradient compensator and adaptive gradient scaler to mitigate gradient mismatch in binary neural network training via auxiliary backpropagation.
PTA adapts VLMs at test time by maintaining and updating class-specific knowledge prototypes from test samples, achieving higher accuracy than cache-based methods with far less speed loss.
Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.
Dataset Distillation