pith. sign in

International Conference on Learning Representations (ICLR) , year =

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

roles

background 1

polarities

background 1

representative citing papers

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Asymmetric Scaling Laws from Sparse Features

stat.ML · 2026-05-22 · unverdicted · novelty 5.0

A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.

Unified Neural Scaling Laws

cs.LG · 2026-05-25 · unverdicted · novelty 4.0

Presents a single functional form for neural scaling that unifies multiple scaling dimensions and claims higher extrapolation accuracy than prior forms across diverse tasks and architectures.

citing papers explorer

Showing 10 of 10 citing papers.

  • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets cs.LG · 2022-01-06 · unverdicted · none · ref 10

    Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.

  • How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization cs.LG · 2026-05-13 · unverdicted · none · ref 256

    The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.

  • Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 94

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  • Scaling Laws and Interpretability of Learning from Repeated Data cs.LG · 2022-05-21 · accept · none · ref 22

    Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.

  • A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 39

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  • Scaling Laws for Transfer cs.LG · 2021-02-02 · unverdicted · none · ref 181

    Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

  • Asymmetric Scaling Laws from Sparse Features stat.ML · 2026-05-22 · unverdicted · none · ref 77

    A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.

  • Unified Neural Scaling Laws cs.LG · 2026-05-25 · unverdicted · none · ref 21

    Presents a single functional form for neural scaling that unifies multiple scaling dimensions and claims higher extrapolation accuracy than prior forms across diverse tasks and architectures.

  • Position: Ideas Should be the Center of Machine Learning Research cs.LG · 2026-05-14 · conditional · none · ref 44

    Machine learning research should prioritize ideas by testing their predicted behavioral signatures in modern models through custom experiments instead of leaderboard chasing or abstract theorems.

  • Six Open Questions in Machine-Learned Interatomic Potential Foundation Models cond-mat.mtrl-sci · 2026-06-05 · unverdicted · none · ref 69

    This perspective article develops a definition of foundational MLIPs and poses six open questions that the authors believe will define future research in machine-learned interatomic potentials.