pith.

Title resolution pending

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields

cs.LG 1

years

2025 1

verdicts

UNVERDICTED 1

representative citing papers

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining

cs.LG · 2025-11-24 · unverdicted · novelty 5.0

Curriculum pretraining with ascending data quality outperforms random order under constant learning rate but loses most benefit under standard decay; moderate decay or final-checkpoint averaging recovers a 1.64% average benchmark gain on 1.5B models trained for 30B tokens.
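The summary above contrasts three ingredients: ordering data by ascending quality, the learning-rate schedule (constant vs. decaying), and averaging the final checkpoints. The sketch below illustrates them under stated assumptions. It is not the paper's implementation. The function names `lr_at`, `ascending_quality_order`, and `average_checkpoints` are hypothetical. The cosine shape is assumed, and so is the `final_frac` floor used to stand in for "moderate decay". Checkpoints are also simplified to plain dicts of float lists.

```python
import math


def lr_at(step, total_steps, peak_lr, schedule="constant", final_frac=0.0):
    """Learning rate at `step` (hypothetical schedules; warmup omitted).

    "constant": peak_lr for the whole run.
    "cosine":   cosine decay from peak_lr down to final_frac * peak_lr.
                final_frac=0.0 approximates a standard full decay; a larger
                floor (e.g. 0.1) is an assumed stand-in for "moderate decay".
    """
    if schedule == "constant":
        return peak_lr
    if schedule == "cosine":
        progress = min(step / total_steps, 1.0)
        floor = final_frac * peak_lr
        return floor + 0.5 * (peak_lr - floor) * (1 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule}")


def ascending_quality_order(samples, quality):
    """Curriculum order: lowest-quality samples first, highest-quality last."""
    ranked = sorted(zip(quality, samples), key=lambda pair: pair[0])
    return [sample for _, sample in ranked]


def average_checkpoints(checkpoints):
    """Uniformly average several checkpoints (each: param name -> list of floats)."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ck[name] for ck in checkpoints))]
        for name in checkpoints[0]
    }
```

Under full decay, the highest-quality samples placed at the end of the curriculum are seen when `lr_at` is near zero. This is the interaction the paper's title points to. A nonzero floor or checkpoint averaging are the two mitigations the summary names.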

citing papers explorer

Showing 1 of 1 citing paper.

  • How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining cs.LG · 2025-11-24 · unverdicted · none · ref 4
