Structured Agent Distillation for Large Language Model

Changdi Yang; Dong Huang; Geng Yuan; Hao Tang; Jun Liu; Peiyan Dong; Pu Zhao; Tianqi Li; Wei Niu; Wenbin Zhang

arxiv: 2505.13820 · v5 · pith:BPZE5AAGnew · submitted 2025-05-20 · 💻 cs.LG · cs.AI · cs.CL

Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang, Yanzhi Wang

classification 💻 cs.LG · cs.AI · cs.CL

keywords: agents, large, distillation, agent, language, model, models, reasoning

Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
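The span-level supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss forms, so everything here is an assumption made for illustration: the function names (`span_masks`, `structured_loss`), the use of KL divergence for both span types, the per-span averaging, and the weights `w_reason` and `w_act` are hypothetical and are not taken from the paper. The sketch only shows the general idea of masking tokens by span type and applying a separate loss term to each.

```python
import math

REASON, ACT = "[REASON]", "[ACT]"

def span_masks(tokens):
    """Return (reason_mask, act_mask): 1 where a token belongs to that span type.

    A token belongs to the span opened by the most recent marker.
    Marker tokens themselves get 0 in both masks (assumption).
    """
    reason_mask, act_mask = [], []
    current = None
    for tok in tokens:
        if tok in (REASON, ACT):
            current = tok
            reason_mask.append(0)
            act_mask.append(0)
        else:
            reason_mask.append(1 if current == REASON else 0)
            act_mask.append(1 if current == ACT else 0)
    return reason_mask, act_mask

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def masked_mean_kl(teacher, student, mask):
    """Average per-token KL over the positions selected by mask."""
    count = sum(mask)
    if count == 0:
        return 0.0
    total = sum(m * kl(t, s) for t, s, m in zip(teacher, student, mask))
    return total / count

def structured_loss(tokens, teacher, student, w_reason=1.0, w_act=1.0):
    """Weighted sum of a reasoning-span term and an action-span term.

    teacher/student: one next-token distribution per position in tokens.
    """
    reason_mask, act_mask = span_masks(tokens)
    loss_reason = masked_mean_kl(teacher, student, reason_mask)
    loss_act = masked_mean_kl(teacher, student, act_mask)
    return w_reason * loss_reason + w_act * loss_act

if __name__ == "__main__":
    # Toy trajectory over a 2-symbol vocabulary (made-up numbers).
    tokens = [REASON, "think", "more", ACT, "go"]
    teacher = [[0.5, 0.5]] * 4 + [[0.9, 0.1]]
    student = [[0.5, 0.5]] * 4 + [[0.6, 0.4]]
    print(span_masks(tokens))
    print(structured_loss(tokens, teacher, student))
```

In this toy trajectory the student matches the teacher on the reasoning tokens and differs only on the action token, so the whole loss comes from the action term. Setting `w_act=0.0` would hide that mismatch, which is the kind of trade-off separate span weights expose.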

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Test-Time Deep Thinking to Explore Implicit Rules
cs.AI 2026-05 unverdicted novelty 6.0

TTExplore trains a 7B thinker via task-score RL to infer implicit rules at test time, raising agent success by 14-19 points on five embodied tasks.

SOD: Step-wise On-policy Distillation for Small Language Model Agents
cs.CL 2026-05 unverdicted novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
cs.LG 2026-06 unverdicted novelty 5.0

IAPO is an RL method that aligns model input attributions with a teacher to improve tool-calling in multimodal SLMs, reporting 3% average VQA accuracy gains on Qwen2.5-VL-3B across six tests.