pith. sign in

Cautious weight decay

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

fields

cs.LG 2 cs.CV 1

years

2026 3

verdicts

UNVERDICTED 3

roles

background 2

polarities

background 1 unclear 1

representative citing papers

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

citing papers explorer

Showing 3 of 3 citing papers.

  • Toto 2.0: Time Series Forecasting Enters the Scaling Era cs.LG · 2026-05-19 · unverdicted · none · ref 10

    Toto 2.0 is a family of open time series foundation models that demonstrates reliable scaling and sets new state-of-the-art results on three forecasting benchmarks.

  • Elastic Attention Cores for Scalable Vision Transformers cs.CV · 2026-05-12 · unverdicted · none · ref 166

    VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

  • Parcae: Scaling Laws For Stable Looped Language Models cs.LG · 2026-04-14 · unverdicted · none · ref 12

    Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.