Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

Catherine Arnett; Fazl Barez; Mohammad Aflah Khan; Naomi Saphra; Niloofar Mireshghallah; Stella Biderman

arxiv: 2606.06533 · v1 · pith:53HTHUTHnew · submitted 2026-06-03 · 💻 cs.AI · cs.CL

Stella Biderman, Mohammad Aflah Khan, Niloofar Mireshghallah, Catherine Arnett, Fazl Barez, Naomi Saphra

Pith reviewed 2026-06-28 06:23 UTC · model grok-4.3

classification 💻 cs.AI cs.CL

keywords: training dynamics, AI science, scaling laws, post-hoc analysis, mechanistic interpretability, model behavior, optimization dynamics


The pith

A science of AI must study training dynamics to understand how model behaviors emerge, rather than only analyzing those behaviors after training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that AI models are snapshots of evolving training processes shaped by data and optimization, not fixed artifacts. Current research often fixes issues after training, but a true science needs to study these dynamics for deeper understanding. Such a science would let researchers predict final behaviors from early signals, intervene during training if things go wrong, and design procedures that produce better models from the start. Scaling laws already allow prediction of loss, and the authors call for extending this to capabilities, biases, and safety. They draw on philosophy of science and review progress in fields like interpretability to outline requirements and open problems.

Core claim

Models are not static objects but snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dynamics. A science of AI must move beyond post-hoc fixes and study the training dynamics that produce model behavior, supporting prediction from early signals, intervention on trajectories, and design of training procedures for desired properties. Scaling laws demonstrate prediction for loss, and the challenge is to extend this to other behaviors while meeting standards from the history and philosophy of science.

What carries the argument

Training dynamics, the time-evolving processes during optimization that determine model behavior.

If this is right

Model outcomes become predictable from signals available early in training.
Undesired behaviors can be corrected during training rather than after it.
Training procedures can be designed to reliably produce models with specific capabilities, reduced biases, and improved safety.
The predictive power of scaling laws extends from loss to capabilities, robustness, and safety-relevant properties.
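
The loss prediction that scaling laws already support can be illustrated with a minimal sketch. It assumes a pure power law, loss = a * compute^(-b), with no irreducible-loss offset, which is simpler than the forms used in the scaling-law literature. The data points and the names `fit_power_law` and `predict_loss` are hypothetical and do not come from the paper.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares in log-log space.

    Assumption for illustration: a pure power law with no constant offset.
    Returns the pair (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** (-b)


# Synthetic "small run" measurements generated from a known power law
# (a = 5.0, b = 0.05); these are made-up numbers, not results.
small_run_compute = [1e15, 1e16, 1e17, 1e18]
small_run_loss = [5.0 * c ** -0.05 for c in small_run_compute]

a, b = fit_power_law(small_run_compute, small_run_loss)
large_run_prediction = predict_loss(a, b, 1e21)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e21: {large_run_prediction:.4f}")
```

The open question raised in the last item is whether a comparably simple fit exists when the target is a capability, bias, or safety metric rather than loss.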

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Work on mechanistic interpretability could be reframed to track how circuits and features develop over the course of training.
Problems in fairness and memorization might be better understood by observing their emergence rather than their final state.
New benchmarks and experiments that evaluate models at multiple points during training would be needed to test these ideas.

Load-bearing premise

Theories of training dynamics can be developed that enable prediction, intervention, and design of model properties beyond what is currently possible with scaling laws for loss.

What would settle it

Finding that early training signals do not predict final model capabilities or safety properties in large models any better than post-training analysis would falsify the position.
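
One way such a test could be scored is sketched below, under stated assumptions: several training runs each provide one early-checkpoint signal and one final metric, and predictive value is summarized by Spearman rank correlation. The metric choice, the function names, and all numbers are hypothetical; the paper does not prescribe this procedure.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: one value per training run (made-up numbers).
early_signal = [0.10, 0.40, 0.25, 0.55, 0.30]  # metric at an early checkpoint
final_metric = [0.52, 0.71, 0.60, 0.80, 0.58]  # same runs, end of training

rho = spearman(early_signal, final_metric)
print(f"rank correlation between early signal and final metric: {rho:.2f}")
```

A correlation near zero across many runs and scales would count against the position; a consistently high one would support it. A full test would also need a post-training baseline to compare against, which this sketch omits.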

Original abstract

What would it mean to have a scientific understanding of AI? Models are not static objects: they are snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dynamics. Yet much of AI research treats models as fixed artifacts, analyzing behaviors after training rather than asking why they emerge. This position paper argues that a science of AI must move beyond post-hoc fixes and study the training dynamics that produce model behavior. Such a science should support progressively stronger forms of understanding: predicting outcomes from early training signals, intervening when trajectories go wrong, and ultimately designing training procedures that more reliably produce desired properties. Scaling laws have made prediction routine for loss; the challenge is extending this success to capabilities, biases, robustness, and safety-relevant behaviors. We articulate requirements for such theories grounded in the history and philosophy of science, examine progress in mechanistic interpretability, fairness, memorization, and simplicity bias, and identify concrete open problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This position paper argues AI research should prioritize training dynamics over post-hoc analysis but provides no evidence or sketch showing the approach can extend beyond loss prediction.


The main takeaway is that current AI work often treats models as finished products and tries to patch problems afterward, when a better science would track how behaviors form during training. The paper makes this case by contrasting scaling laws, which already let us predict loss from early signals, with the harder task of predicting capabilities, biases, or safety properties the same way.

It does a clean job pulling requirements from philosophy of science for what a useful theory would need: the ability to predict, intervene, and eventually design training runs. The survey of mechanistic interpretability, fairness, memorization, and simplicity bias is useful because it shows where some dynamical understanding already exists and where it does not. The list of open problems is specific enough to be actionable.

The weak point is that the central hope—that dynamical theories will actually deliver prediction and control at scale—is asserted rather than demonstrated. The paper gives no toy example, no failed attempt, and no argument why the success with loss should transfer to other properties. Without that, the recommendation stays at the level of a research program rather than a supported claim.

This is for people already thinking about the limits of post-training evaluation and who want a structured way to talk about shifting priorities. It will not give technical readers new methods or results. A serious editor should send it to peer review; position pieces that organize existing work and name concrete gaps can still move the conversation even when they do not prove feasibility.

Referee Report

1 major / 0 minor

Summary. This position paper claims that a science of AI must study training dynamics rather than relying on post-hoc fixes after training, to enable progressively stronger forms of understanding: predicting outcomes from early training signals, intervening on trajectories, and designing training procedures for desired properties including capabilities, biases, robustness, and safety. It articulates requirements drawn from the history and philosophy of science, surveys progress in subfields such as mechanistic interpretability, fairness, memorization, and simplicity bias, and identifies concrete open problems, while noting that scaling laws have succeeded for loss but the challenge is extending this to other behaviors.

Significance. If the proposed research program can be realized, it would shift AI research from treating models as fixed artifacts to understanding them as outcomes of time-evolving processes, potentially enabling more reliable control over model properties at scale and extending the predictive success of scaling laws beyond loss.

major comments (1)

Abstract: the claim that theories grounded in training dynamics will support prediction, intervention, and design for capabilities, biases, robustness, and safety (beyond loss) is asserted without any derivation, data, formal argument, or concrete example of how this extension would work; this feasibility assumption is load-bearing for the normative claim that such a science 'must' be pursued to achieve the described stronger forms of understanding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive feedback on our position paper. The primary concern is addressed point-by-point below. We agree that the abstract could better signal the supporting material in the body and will make a targeted revision.


Referee: Abstract: the claim that theories grounded in training dynamics will support prediction, intervention, and design for capabilities, biases, robustness, and safety (beyond loss) is asserted without any derivation, data, formal argument, or concrete example of how this extension would work; this feasibility assumption is load-bearing for the normative claim that such a science 'must' be pursued to achieve the described stronger forms of understanding.

Authors: As a position paper, the manuscript advocates for a research program rather than deriving or empirically validating its feasibility. The abstract is intentionally concise and summarizes the core thesis; the body articulates requirements from the history and philosophy of science, surveys existing progress (e.g., circuit-level analysis during training in mechanistic interpretability, emergence of biases over the course of training in fairness research, dynamics of memorization, and simplicity bias), and frames extension beyond loss as an open challenge with scaling laws as a partial precedent. These sections supply the concrete examples and partial evidence the referee correctly notes are absent from the abstract alone. We will revise the abstract to briefly reference the surveyed subfields and to distinguish the established success on loss from the open extension to other behaviors, thereby making the load-bearing assumption more transparent without altering the position itself.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity


This position paper advances a normative argument that AI research should prioritize theories of training dynamics to enable prediction, intervention, and design of model properties. It contains no equations, derivations, fitted parameters, or technical steps whose validity reduces to self-referential inputs. The abstract and structure rely on philosophy of science requirements and surveys of subfields without any self-citation load-bearing chains, ansatzes, or renamings that would make the central claim equivalent to its own premises by construction. The argument is self-contained as an aspirational research program.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No technical derivations or empirical claims are made in the abstract, so the ledger is empty of free parameters, axioms, and invented entities.



