Transformers learn to implement multi-step gradient descent with chain of thought

3 Pith papers cite this work.

Citing papers by year: 2026 (2), 2025 (1).

Representative citing papers

The Power of Power Law: Asymmetry Enables Compositional Reasoning

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Power-law data sampling creates a beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, which in turn makes learning rare, long-tail skills more efficient than under uniform sampling.
