pith. sign in

arxiv: 2605.30889 · v1 · pith:FGRQ2PIInew · submitted 2026-05-29 · ⚛️ physics.chem-ph · cs.LG

MLIPilot: LLM-Driven Auto-Research for Machine-Learned Interatomic Potentials

classification ⚛️ physics.chem-ph cs.LG
keywords trainingagentsmlipilotauto-researchbenchmarkschangesconstraineddataset
0
0 comments X
read the original abstract

Constructing production-quality machine-learned interatomic potentials (MLIPs) requires balancing accuracy, dynamical stability, and computational throughput under constraints that are not captured by a single training loss. We introduce MLIPilot, an auto-research framework in which tool-calling large language models propose hypotheses, edit MLIP training code, launch HPC jobs, and accept or revert changes using a fixed, physically constrained scorecard. We evaluate MLIPilot on MACE potential optimization using both commercial and open-weight LLM agents, including GPT-5.5, GPT-4.1, Mistral-24B, and Qwen3-32B. The benchmarks span molecular and periodic settings: a QM7-derived dataset for which we generated B3LYP/6-31G(d) energies and forces, and a Cu EMT dataset with periodic copper supercells labeled by ASE's Effective Medium Theory calculator. Across these benchmarks, the strongest agents move initially constraint-violating baselines to accepted models by discovering useful training strategies, including output normalization, loss-function changes, progressive training schedules, and model-capacity adjustments. These results suggest that LLM agents can serve as autonomous operators for scientific machine-learning workflows when their search is constrained by domain-specific validation criteria, shifting part of MLIP development from manual trial-and-error toward auditable, automated experimentation.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.