arxiv: 2505.13820 · v5 · pith:BPZE5AAGnew · submitted 2025-05-20 · 💻 cs.LG · cs.AI · cs.CL

Structured Agent Distillation for Large Language Model

classification 💻 cs.LG · cs.AI · cs.CL
keywords agents, large, distillation, agent, language, model, models, reasoning

Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
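
The span-level supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the segment-specific losses, so everything below is an assumption made for illustration: the helper names (`segment_labels`, `span_losses`, `structured_loss`), the use of a per-token KL divergence for both span types, and the weights `w_reason` and `w_act` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence

REASON, ACT = "[REASON]", "[ACT]"


def segment_labels(tokens: Sequence[str]) -> List[Optional[str]]:
    """Assign each token to the span opened by the most recent marker.

    Marker tokens themselves (and any tokens before the first marker)
    get None, so they contribute no loss in this sketch.
    """
    labels: List[Optional[str]] = []
    current: Optional[str] = None
    for tok in tokens:
        if tok in (REASON, ACT):
            current = tok
            labels.append(None)
        else:
            labels.append(current)
    return labels


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def span_losses(
    labels: Sequence[Optional[str]],
    teacher_probs: Sequence[Sequence[float]],
    student_probs: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Average teacher-vs-student KL separately over REASON and ACT tokens."""
    totals = {REASON: 0.0, ACT: 0.0}
    counts = {REASON: 0, ACT: 0}
    for label, p, q in zip(labels, teacher_probs, student_probs):
        if label is None:
            continue
        totals[label] += kl_divergence(p, q)
        counts[label] += 1
    return {k: (totals[k] / counts[k] if counts[k] else 0.0) for k in totals}


def structured_loss(
    labels: Sequence[Optional[str]],
    teacher_probs: Sequence[Sequence[float]],
    student_probs: Sequence[Sequence[float]],
    w_reason: float = 1.0,  # hypothetical weight on the reasoning span
    w_act: float = 1.0,     # hypothetical weight on the action span
) -> float:
    """Weighted sum of the two span-level losses (assumed form)."""
    losses = span_losses(labels, teacher_probs, student_probs)
    return w_reason * losses[REASON] + w_act * losses[ACT]


# Toy trajectory with a two-word vocabulary per position.
tokens = ["[REASON]", "I", "need", "[ACT]", "search"]
labels = segment_labels(tokens)
teacher = [[0.5, 0.5]] * len(tokens)
student = [[0.9, 0.1]] * len(tokens)
loss = structured_loss(labels, teacher, student)
```

Unlike a plain token-level objective, which would average one loss over all positions, this layout lets the two span types be weighted or swapped for different loss functions independently.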

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Test-Time Deep Thinking to Explore Implicit Rules

    cs.AI 2026-05 unverdicted novelty 6.0

    TTExplore trains a 7B thinker via task-score RL to infer implicit rules at test time, raising agent success by 14-19 points on five embodied tasks.

  2. SOD: Step-wise On-policy Distillation for Small Language Model Agents

    cs.CL 2026-05 unverdicted novelty 6.0

    SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

  3. IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

    cs.LG 2026-06 unverdicted novelty 5.0

    IAPO is an RL method that aligns model input attributions with a teacher to improve tool-calling in multimodal SLMs, reporting 3% average VQA accuracy gains on Qwen2.5-VL-3B across six tests.