International Conference on Learning Representations
5 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
- Years: 2026 (5)
- Verdicts: UNVERDICTED (5)
- Roles: method (1)
- Polarities: use method (1)
Citing papers explorer
- Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
  Mean-field perturbation theory of dropout at the edge of chaos yields distinct universality classes for smooth versus kinked activations, critical scaling laws for correlation decay, and front-loaded dropout schedules that reduce test loss.
- CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators
  CHASM introduces a cross-frequency harmonized axis-separable spectral mixer using a shared channel eigenbasis plus per-frequency positive gains, yielding consistent gains over same-backbone baselines in medical and natural image tasks.
- Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers
  Spectrum-adaptive post-hoc generalization bounds for multi-layer Transformers are derived using layerwise Schatten quantities whose indices are chosen after training based on singular-value profiles.
- VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision
  Viewpoint-conditioned feature selection improves thermal vehicle re-identification mAP by 19.7% on RGBNT100 and 12.8% on a new maritime dataset by adapting RGB ViT extractors.
- Refresh-Scaling the Memory of Balanced Adam
  Setting β in balanced Adam to reach a refresh count R_β ≈ 1000, based on the effective learning horizon T_ES, improves validation robustness over fixed-β baselines across 11 vision and language experiments.
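The Schatten quantities mentioned in the Spectrum-Adaptive Generalization Bounds entry are functions of a weight matrix's singular values: the Schatten p-norm is ‖W‖_{S_p} = (Σ_i σ_i(W)^p)^{1/p}, and p = ∞ gives the spectral norm (the largest singular value). The sketch below only computes these norms from a given singular-value profile, assuming the values were obtained separately (for example with numpy.linalg.svd(W, compute_uv=False)). The paper's rule for choosing the index p after training is not described on this page, so none is implemented, and the helper names are hypothetical.

```python
import math


def schatten_norm(singular_values, p):
    """Schatten p-norm (sum_i sigma_i^p)^(1/p) of a matrix, given its singular values.

    p = math.inf returns the largest singular value (the spectral norm).
    Only p >= 1 is accepted here, the range where the quantity is a true norm.
    """
    s = [abs(x) for x in singular_values]
    if math.isinf(p):
        return max(s)
    if p < 1:
        raise ValueError("p must be >= 1 for a Schatten norm")
    return sum(x ** p for x in s) ** (1.0 / p)


def schatten_profile(singular_values, ps=(1, 2, 4, math.inf)):
    # Hypothetical helper: tabulate several indices so their spread can be inspected.
    return {p: schatten_norm(singular_values, p) for p in ps}


if __name__ == "__main__":
    # A spectrum dominated by one direction: large-p norms approach sigma_max.
    for p, value in schatten_profile([10.0, 1.0, 1.0, 1.0]).items():
        print(f"p={p}: {value:.4f}")
```

For the profile [10, 1, 1, 1], p = 1 gives 13.0 while p = ∞ gives 10.0, which illustrates why the index at which a bound is evaluated can matter once the singular-value profile of a trained layer is known.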
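The CHASM entry describes a spectral mixer built from a shared channel eigenbasis plus per-frequency positive gains. One plausible reading, assumed here rather than taken from the paper, is that the channel-mixing matrix at frequency f is M_f = U diag(g_f) Uᵀ, with a single orthonormal U shared across all frequencies and every entry of g_f positive, so each M_f is symmetric positive definite. The sketch applies such matrices to spectral coefficients that are assumed to be already computed for one axis; the Fourier transform itself, the axis-separable structure, the softplus parameterization of the gains, the 2×2 rotation used as U, and all function names are hypothetical illustration choices.

```python
import math


def softplus(x):
    # Hypothetical choice: numerically stable softplus, max(x, 0) + log1p(exp(-|x|)),
    # mapping an unconstrained parameter to a positive gain.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rotation_basis(theta):
    # A 2x2 orthonormal matrix standing in for the shared channel eigenbasis U.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def mixing_matrix(U, gains):
    # M = U diag(gains) U^T, i.e. M[i][j] = sum_k U[i][k] * gains[k] * U[j][k].
    n = len(U)
    return [[sum(U[i][k] * gains[k] * U[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mix_spectrum(spectrum, U, raw_gains):
    # spectrum[f] is the (complex) channel vector at frequency f;
    # raw_gains[f] holds one unconstrained parameter per eigen-direction of U.
    # The same U is reused at every frequency; only the gains change.
    out = []
    for x_f, raw in zip(spectrum, raw_gains):
        M = mixing_matrix(U, [softplus(r) for r in raw])
        out.append([sum(M[i][j] * x_f[j] for j in range(len(x_f)))
                    for i in range(len(M))])
    return out


if __name__ == "__main__":
    U = rotation_basis(0.3)
    spectrum = [[1 + 1j, 2.0], [0.5 - 0.5j, -1.0]]  # two frequencies, two channels
    raw_gains = [[1.0, -1.0], [0.0, 2.0]]            # per-frequency gain parameters
    for f, y in enumerate(mix_spectrum(spectrum, U, raw_gains)):
        print(f, y)
```

Sharing U ties every frequency to the same channel directions, while the positive gains let each frequency rescale those directions independently without ever flipping their sign.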