Learning multiple layers of features from tiny images
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
For any fixed nonconstant teacher T, the best constant student has alignment cost exactly equal to the teacher mutual information I_T(X;T); a latent-only witness below this threshold with margin cannot be constant.
MoCo-EA uses optimized Bézier curves for crossover in evolutionary adversarial attacks by exploiting mode connectivity of successful perturbations.
TC-JEPA conditions masked feature prediction on text captions via sparse cross-attention to produce more semantically rich visual representations and outperforms contrastive methods on fine-grained tasks.
GXD estimates the first-order functional cost of replacing a neuron via gradient attribution to make adaptive resets more reliable for preserving plasticity in continual learning.
citing papers explorer
- Is Monotonic Sampling Necessary in Diffusion Models?
  Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.
- A Testable Certificate for Constant Collapse in Teacher-Guided VAEs
  For any fixed nonconstant teacher T, the best constant student has alignment cost exactly equal to the teacher mutual information I_T(X;T); a latent-only witness below this threshold with margin cannot be constant.
- MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary Attacks
  MoCo-EA uses optimized Bézier curves for crossover in evolutionary adversarial attacks by exploiting mode connectivity of successful perturbations.
- Text-Conditional JEPA for Learning Semantically Rich Visual Representations
  TC-JEPA conditions masked feature prediction on text captions via sparse cross-attention to produce more semantically rich visual representations and outperforms contrastive methods on fine-grained tasks.
- Attribution-Based Neuron Utility for Plasticity Restoration in Deep Networks
  GXD estimates the first-order functional cost of replacing a neuron via gradient attribution to make adaptive resets more reliable for preserving plasticity in continual learning.