Recognition: no theorem link
ASIA: an Autonomous System Identification Agent
Pith reviewed 2026-05-12 04:57 UTC · model grok-4.3
The pith
A large language model can serve as an autonomous agent to discover dynamical models by iterating hypotheses, code, and evaluations from only a plain-English problem description.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ASIA closes the loop between hypothesis, implementation, and evaluation in system identification without human intervention, requiring only a plain-English description of the identification problem.
What carries the argument
The ASIA framework that lets a large language model function as an autonomous coding agent for model discovery and training in dynamical systems.
If this is right
- The agent explores different model architectures and training strategies on its own.
- Resulting models are evaluated for quality on system identification benchmarks.
- The approach reduces the expert time needed for empirical tuning.
- Limitations include possible test data leakage and challenges to reproducibility.
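The loop these points describe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ASIA's implementation: the candidate "hypotheses" here are just autoregressive model orders fitted by least squares on synthetic data, whereas ASIA's agent would propose architectures and training code via an LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic second-order system: y[t] = 1.5*y[t-1] - 0.7*y[t-2] + noise.
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.1 * rng.standard_normal()
train, val = y[:300], y[300:]

def ar_regressors(series, order):
    """Build the lagged-regressor matrix and one-step targets."""
    T = len(series)
    X = np.column_stack([series[order - 1 - j : T - 1 - j] for j in range(order)])
    return X, series[order:]

def fit_ar(series, order):
    """Least-squares fit of an AR(order) one-step predictor."""
    X, target = ar_regressors(series, order)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def one_step_mse(coef, series):
    """One-step-ahead mean squared error of an AR predictor."""
    X, target = ar_regressors(series, len(coef))
    return float(np.mean((X @ coef - target) ** 2))

# Hypothesis -> implementation -> evaluation loop: each hypothesis is a
# model order; it is fitted (implemented) on train and scored on val.
history = []
for order in (1, 2, 3, 4):
    coef = fit_ar(train, order)
    history.append((order, one_step_mse(coef, val)))
best_order, best_score = min(history, key=lambda h: h[1])
```

Because the data are second order, the order-1 hypothesis scores visibly worse on the held-out split, which is exactly the signal an autonomous search loop iterates on.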
Where Pith is reading between the lines
- This method could be applied to automate modeling in related fields such as control systems or time-series forecasting.
- Future versions might incorporate safeguards to ensure the agent does not access hidden test data during its search.
- Transparency tools could be added to log the agent's reasoning steps for better scientific validation.
Load-bearing premise
That the large language model can carry out effective iterative model search and discovery on system identification tasks without causing test leakage or reducing methodological transparency.
What would settle it
Testing the agent on a fresh benchmark whose data cannot have appeared in the underlying model's training corpus, then verifying whether it still produces models whose accuracy matches or exceeds that of manually tuned ones.
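One way to operationalise such a test is to fence the fresh benchmark's evaluation split off from the agent entirely and release only a single final score. A hedged sketch: the `HeldOutGuard` class and the toy static system below are illustrative assumptions, not part of ASIA.

```python
import hashlib
import numpy as np

class HeldOutGuard:
    """Keeps the test split outside the agent loop, allows exactly one
    final evaluation, and records a fingerprint for post-hoc checks.
    Illustrative only; not ASIA's actual mechanism."""

    def __init__(self, x_test, y_test):
        self._x, self._y = x_test, y_test
        self.fingerprint = hashlib.sha256(y_test.tobytes()).hexdigest()
        self._consumed = False

    def final_mse(self, predict):
        if self._consumed:
            raise RuntimeError("test split may be scored exactly once")
        self._consumed = True
        return float(np.mean((predict(self._x) - self._y) ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 0.8 * x + 0.05 * rng.standard_normal(200)  # toy static system
guard = HeldOutGuard(x[150:], y[150:])         # agent never sees this slice

# The agent-side search may only touch the first 150 points.
slope = np.dot(x[:150], y[:150]) / np.dot(x[:150], x[:150])
score = guard.final_mse(lambda xs: slope * xs)
```

The fingerprint can be published alongside the run, so a reviewer can verify afterwards that the scored split matches the frozen benchmark data.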
Original abstract
Over the years, research in system identification has provided a rich set of methods for learning dynamical models, together with well-established theoretical guarantees. In practice, however, the choice of model class, training algorithm, and hyperparameter tuning is still largely left to empirical trial-and-error, requiring substantial expert time and domain experience. Motivated by recent advances in agentic artificial intelligence, we present ASIA, a framework that delegates this iterative search to a large language model acting as an autonomous coding agent. Building on existing agentic platforms, ASIA closes the loop between hypothesis, implementation, and evaluation without human intervention, requiring only a plain-English description of the identification problem. We conduct an empirical study of ASIA on two system identification benchmarks and analyse the agent's search behaviour, the architectures and training strategies it discovers, and the quality of the resulting models. We also discuss the potential of the approach and its current limitations, including implicit test leakage, reduced methodological transparency, and reproducibility concerns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ASIA, a framework that employs a large language model as an autonomous coding agent to perform system identification. Given only a plain-English description of the identification problem, the agent iteratively generates hypotheses, implements models in code, evaluates them, and refines the search without human intervention. The paper reports an empirical study on two system identification benchmarks, analyzing the agent's search behavior, the architectures and training strategies discovered, and the quality of the resulting models, while discussing limitations including implicit test leakage, reduced methodological transparency, and reproducibility concerns.
Significance. If the empirical results hold under proper controls, ASIA could automate a labor-intensive aspect of dynamical modeling that currently relies on expert trial-and-error, potentially accelerating research in control and system identification by leveraging recent agentic AI capabilities. The work provides a concrete demonstration of closing the hypothesis-implementation-evaluation loop from natural language input alone.
major comments (2)
- [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.
- [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.
minor comments (1)
- [Abstract] The two benchmarks used in the empirical study are not named in the abstract or early sections; naming them (and providing brief descriptions) would improve clarity for readers unfamiliar with the specific identification tasks.
Simulated Author's Rebuttal
Thank you for the detailed and constructive review of our manuscript introducing ASIA. We appreciate the referee's focus on strengthening the empirical claims and limitations discussion. Below we respond point-by-point to the major comments and indicate the revisions we will make.
Point-by-point responses
Referee: [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.
Authors: We thank the referee for identifying this important gap. The current manuscript provides a qualitative description of the agent's search behavior, the model architectures and training strategies discovered, and a high-level assessment of model quality on the two benchmarks, but it does not include explicit quantitative performance metrics (such as normalized mean squared error or fit percentages), statistical summaries, or comparisons against standard system identification baselines. This does limit the ability to judge competitiveness. We will revise the empirical study section (and update the abstract accordingly) to report concrete performance statistics for the discovered models on both benchmarks, include baseline comparisons where feasible, and add tables or figures summarizing model quality. These additions will directly support the autonomy and effectiveness claims. revision: yes
Referee: [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.
Authors: We agree that the limitations section is currently too brief on this issue. While the manuscript flags implicit test leakage as a concern, it does not describe any implemented controls, auditing procedures, or attempt to bound the risk. We will substantially expand the limitations discussion to detail the experimental setup (including environment isolation and prompt construction practices), any post-experiment verification steps performed, and an honest assessment of remaining leakage pathways and their potential impact on the 'plain-English input only' claim. Where quantitative bounds are not available, we will explain the reasons and outline mitigation strategies for future work. revision: yes
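The controls the response promises (environment isolation, access logging, post-hoc verification) can be made concrete with a small audit layer around the agent's file reads. A hypothetical sketch; `AuditedWorkspace` is an assumed construct for illustration, not ASIA's implementation.

```python
import os
import tempfile

class AuditedWorkspace:
    """Logs every path the agent attempts to read and blocks access to
    forbidden (evaluation) files, so leakage can be ruled out post hoc.
    Hypothetical sketch, not ASIA's mechanism."""

    def __init__(self, forbidden):
        self.forbidden = set(forbidden)
        self.log = []  # full audit trail, including blocked attempts

    def read(self, path):
        self.log.append(path)          # record the attempt first
        if path in self.forbidden:
            raise PermissionError(f"agent attempted to read {path}")
        with open(path) as f:
            return f.read()

# Demo: a workspace with a training file and an off-limits test file.
ws_dir = tempfile.mkdtemp()
for name, text in [("train.csv", "1,2\n"), ("test.csv", "3,4\n")]:
    with open(os.path.join(ws_dir, name), "w") as f:
        f.write(text)

ws = AuditedWorkspace(forbidden={os.path.join(ws_dir, "test.csv")})
train_text = ws.read(os.path.join(ws_dir, "train.csv"))
try:
    ws.read(os.path.join(ws_dir, "test.csv"))
    leaked = True
except PermissionError:
    leaked = False
```

The audit log, not just the block, is the point: it gives reviewers a verifiable record of what the agent touched during its search.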
Circularity Check
No circularity in empirical framework description
full rationale
The paper describes an agentic LLM framework for system identification and reports empirical results on two benchmarks. No mathematical derivations, equations, fitted parameters, or first-principles claims appear in the provided text or abstract. The central assertions rest on external benchmark evaluations rather than any self-referential reduction, self-citation chain, or ansatz smuggled via prior work. No load-bearing step reduces a prediction to its own inputs by construction, satisfying the default expectation of no significant circularity for non-derivational empirical papers.
Axiom & Free-Parameter Ledger
invented entities (1)
- ASIA autonomous coding agent: no independent evidence