Cautious weight decay

URLhttps://arxiv · 2025 · arXiv 2510.12402

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 unclear 1

representative citing papers

Toto 2.0: Time Series Forecasting Enters the Scaling Era

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Toto 2.0 is a family of open time series foundation models that demonstrates reliable scaling and sets new state-of-the-art results on three forecasting benchmarks.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

citing papers explorer

Showing 3 of 3 citing papers.

Toto 2.0: Time Series Forecasting Enters the Scaling Era cs.LG · 2026-05-19 · unverdicted · none · ref 10
Toto 2.0 is a family of open time series foundation models that demonstrates reliable scaling and sets new state-of-the-art results on three forecasting benchmarks.
Elastic Attention Cores for Scalable Vision Transformers cs.CV · 2026-05-12 · unverdicted · none · ref 166
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
Parcae: Scaling Laws For Stable Looped Language Models cs.LG · 2026-04-14 · unverdicted · none · ref 12
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

Cautious weight decay

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer