ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Zexi Liu , Jingyi Chai , Xinyu Zhu , Shuo Tang , Rui Ye , Bo Zhang , Lei Bai , Siheng Chen

Authors on Pith no claims yet

classification 💻 cs.CL cs.AIcs.LG

keywords agentstrainingagenticautonomouslearningml-agentb-sizedcomputational

read the original abstract

The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Despite training on only 9 ML tasks, our 7B-sized ML-Agent achieves comparable performance to agents using much larger proprietary LLMs (e.g., GPT-5) but at significantly lower computational cost, demonstrating strong performance and cross-task generalization.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AgenticRecTune: Multi-Agent with Self-Evolving Skillhub for Recommendation System Optimization
cs.IR 2026-04 unverdicted novelty 5.0

AgenticRecTune deploys five LLM agents (Actor, Critic, Insight, Skill, Online) and a self-evolving Skillhub to handle end-to-end configuration optimization for multi-stage recommendation systems.
AgenticRecTune: Multi-Agent with Self-Evolving Skillhub for Recommendation System Optimization
cs.IR 2026-04 unverdicted novelty 4.0

AgenticRecTune deploys Actor, Critic, Insight, Skill, and Online agents plus a self-evolving Skillhub to propose, filter, test, and learn from recommendation system configurations using Gemini LLMs.