
arxiv: 2606.06533 · v1 · pith:53HTHUTHnew · submitted 2026-06-03 · 💻 cs.AI · cs.CL

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

Pith reviewed 2026-06-28 06:23 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords training dynamics · AI science · scaling laws · post-hoc analysis · mechanistic interpretability · model behavior · optimization dynamics

The pith

A science of AI must study training dynamics to understand how model behaviors emerge rather than analyzing them after training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that AI models are snapshots of evolving training processes shaped by data and optimization, not fixed artifacts. Current research often fixes issues after training, but a true science needs to study these dynamics for deeper understanding. Such a science would let researchers predict final behaviors from early signals, intervene during training if things go wrong, and design procedures that produce better models from the start. Scaling laws already allow prediction of loss, and the authors call for extending this to capabilities, biases, and safety. They draw on philosophy of science and review progress in fields like interpretability to outline requirements and open problems.

Core claim

Models are not static objects but snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dynamics. A science of AI must move beyond post-hoc fixes and study the training dynamics that produce model behavior, supporting prediction from early signals, intervention on trajectories, and design of training procedures for desired properties. Scaling laws demonstrate prediction for loss, and the challenge is to extend this to other behaviors while meeting standards from the history and philosophy of science.
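
As a toy illustration of what "prediction from early signals" could look like for loss, the sketch below fits a power law to a few early checkpoints and extrapolates to a much later one. Everything in it is an assumption made for illustration and does not come from the paper: the pure power-law form `loss ≈ a * steps**(-b)` with no irreducible-loss term, the use of training steps as the independent variable, the synthetic noise-free data, and the helper names `fit_power_law` and `predict_loss`.

```python
import numpy as np


def fit_power_law(steps, losses):
    """Fit loss ≈ a * steps**(-b) by least squares in log-log space.

    Hypothetical helper: assumes a pure power law with no irreducible term.
    """
    slope, intercept = np.polyfit(np.log(steps), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, b, step):
    """Evaluate the fitted power law at a given training step."""
    return a * step ** (-b)


# Synthetic, noise-free "early training" curve (illustrative values only).
true_a, true_b = 5.0, 0.3
early_steps = np.array([100, 200, 400, 800, 1600], dtype=float)
early_losses = true_a * early_steps ** (-true_b)

# Fit on early checkpoints, then extrapolate far beyond the observed range.
a_hat, b_hat = fit_power_law(early_steps, early_losses)
late_prediction = predict_loss(a_hat, b_hat, 100_000.0)
```

The open problem the paper points to is that capabilities, biases, and safety-relevant behaviors are not known to follow a simple functional form like this one, so an extrapolation of this kind may not transfer beyond loss.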

What carries the argument

Training dynamics, the time-evolving processes during optimization that determine model behavior.

If this is right

  • Model outcomes become predictable from signals available early in training.
  • Undesired behaviors can be corrected during training rather than after it.
  • Training procedures can be designed to reliably produce models with specific capabilities, reduced biases, and improved safety.
  • The predictive power of scaling laws extends from loss to capabilities, robustness, and safety-relevant properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Work on mechanistic interpretability could be reframed to track how circuits and features develop over the course of training.
  • Problems in fairness and memorization might be better understood by observing their emergence rather than their final state.
  • New benchmarks and experiments that evaluate models at multiple points during training would be needed to test these ideas.

Load-bearing premise

Theories of training dynamics can be developed that enable prediction, intervention, and design of model properties beyond what is currently possible with scaling laws for loss.

What would settle it

Finding that early training signals do not predict final model capabilities or safety properties in large models any better than post-training analysis would falsify the position.

Original abstract

What would it mean to have a scientific understanding of AI? Models are not static objects: they are snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dynamics. Yet much of AI research treats models as fixed artifacts, analyzing behaviors after training rather than asking why they emerge. This position paper argues that a science of AI must move beyond post-hoc fixes and study the training dynamics that produce model behavior. Such a science should support progressively stronger forms of understanding: predicting outcomes from early training signals, intervening when trajectories go wrong, and ultimately designing training procedures that more reliably produce desired properties. Scaling laws have made prediction routine for loss; the challenge is extending this success to capabilities, biases, robustness, and safety-relevant behaviors. We articulate requirements for such theories grounded in the history and philosophy of science, examine progress in mechanistic interpretability, fairness, memorization, and simplicity bias, and identify concrete open problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. This position paper claims that a science of AI must study training dynamics rather than relying on post-hoc fixes after training, to enable progressively stronger forms of understanding: predicting outcomes from early training signals, intervening on trajectories, and designing training procedures for desired properties including capabilities, biases, robustness, and safety. It articulates requirements drawn from the history and philosophy of science, surveys progress in subfields such as mechanistic interpretability, fairness, memorization, and simplicity bias, and identifies concrete open problems, while noting that scaling laws have succeeded for loss but the challenge is extending this to other behaviors.

Significance. If the proposed research program can be realized, it would shift AI research from treating models as fixed artifacts to understanding them as outcomes of time-evolving processes, potentially enabling more reliable control over model properties at scale and extending the predictive success of scaling laws beyond loss.

major comments (1)
  1. Abstract: the claim that theories grounded in training dynamics will support prediction, intervention, and design for capabilities, biases, robustness, and safety (beyond loss) is asserted without any derivation, data, formal argument, or concrete example of how this extension would work; this feasibility assumption is load-bearing for the normative claim that such a science 'must' be pursued to achieve the described stronger forms of understanding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive feedback on our position paper. The primary concern is addressed point-by-point below. We agree that the abstract could better signal the supporting material in the body and will make a targeted revision.

  1. Referee: [—] Abstract: the claim that theories grounded in training dynamics will support prediction, intervention, and design for capabilities, biases, robustness, and safety (beyond loss) is asserted without any derivation, data, formal argument, or concrete example of how this extension would work; this feasibility assumption is load-bearing for the normative claim that such a science 'must' be pursued to achieve the described stronger forms of understanding.

    Authors: As a position paper, the manuscript advocates for a research program rather than deriving or empirically validating its feasibility. The abstract is intentionally concise and summarizes the core thesis; the body articulates requirements from the history and philosophy of science, surveys existing progress (e.g., circuit-level analysis during training in mechanistic interpretability, emergence of biases over the course of training in fairness research, dynamics of memorization, and simplicity bias), and frames extension beyond loss as an open challenge with scaling laws as a partial precedent. These sections supply the concrete examples and partial evidence the referee correctly notes are absent from the abstract alone. We will revise the abstract to briefly reference the surveyed subfields and to distinguish the established success on loss from the open extension to other behaviors, thereby making the load-bearing assumption more transparent without altering the position itself. revision: partial

Circularity Check

0 steps flagged

No significant circularity


This position paper advances a normative argument that AI research should prioritize theories of training dynamics to enable prediction, intervention, and design of model properties. It contains no equations, derivations, fitted parameters, or technical steps whose validity reduces to self-referential inputs. The abstract and structure rely on philosophy of science requirements and surveys of subfields without any self-citation load-bearing chains, ansatzes, or renamings that would make the central claim equivalent to its own premises by construction. The argument is self-contained as an aspirational research program.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No technical derivations or empirical claims are made in the abstract, so the ledger is empty of free parameters, axioms, and invented entities.

pith-pipeline@v0.9.1-grok · 5714 in / 1028 out tokens · 20444 ms · 2026-06-28T06:23:53.457263+00:00 · methodology

