Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

2026 · cs.AI · arXiv 2605.22166

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

LLM agents are shaped not only by their language models, but also by the runtime harness that mediates observation, tool use, action execution, feedback interpretation, and trajectory control. While existing agent adaptation methods mainly update model parameters, many failures in deterministic, rule-governed domains stem from mismatches at the model-environment interface. We propose Life-Harness, a lifecycle-aware runtime harness that improves frozen LLM agents without changing model weights or evaluation environments. Life-Harness evolves from training trajectories by converting recurring interaction failures into reusable interventions across environment contracts, procedural skills, action realization, and trajectory regulation, and remains fixed for evaluation on unseen tasks. On seven deterministic environments from τ-bench, τ²-bench, and AgentBench, Life-Harness improves 116 out of 126 model-environment settings across 18 model backbones, with an average relative improvement of 88.5%. Harnesses evolved only from Qwen3-4B-Instruct trajectories transfer to 17 other models, showing that Life-Harness captures reusable environment-side structure rather than model-specific behavior. These results position runtime interface adaptation as a complementary alternative to model-centric agent training. Code is available at https://github.com/Tianshi-Xu/Life-Harness.

representative citing papers

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

cs.AI · 2026-06-02 · unverdicted · novelty 6.0

TBS is an interval-based multi-agent LLM simulation framework that separates structured internal evaluative states from public utterance generation and shows these states vary systematically with turn-allocation, silence, and memory conditions.

MUSE: A Unified Agentic Harness for MLLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

MUSE is a unified agentic harness that improves off-the-shelf MLLMs on visual spatial planning, perception, multimodal reasoning, and fine-grained discrimination benchmarks through structured execution modules and verifier-guided repair without model retraining.
