arXiv preprint arXiv:2603.05344 (2026)

Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, Lessons Learned · 2026 · arXiv 2603.05344

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B, which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.

Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures

cs.SE · 2026-04-03 · accept · novelty 7.0

Analysis of 13 coding agent scaffolds at pinned commits yields a 12-dimension taxonomy showing five composable loop primitives, with 11 agents combining multiple primitives instead of using one fixed structure.

Harnesses for Inference-Time Alignment over Execution Trajectories

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Partial harnesses for LLM agents, specifying only initial execution steps, achieve higher pass rates than fully decomposed workflows, as analyzed through trajectory alignment and validated in synthetic and terminal benchmarks.

Agentic Coding Needs Proactivity, Not Just Autonomy

cs.SE · 2026-05-07 · conditional · novelty 6.0

Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.

Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.

Nautilus: From One Prompt to Plug-and-Play Robot Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Effective Harness Engineering for Algorithm Discovery with Coding Agents

cs.SE · 2026-05-13 · unverdicted · novelty 4.0

Under a fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.

citing papers explorer

Showing 8 of 8 citing papers.

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills · cs.CL · 2026-05-20 · unverdicted · none · ref 12
Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B, which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.
Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures · cs.SE · 2026-04-03 · accept · none · ref 3
Analysis of 13 coding agent scaffolds at pinned commits yields a 12-dimension taxonomy showing five composable loop primitives, with 11 agents combining multiple primitives instead of using one fixed structure.
Harnesses for Inference-Time Alignment over Execution Trajectories · cs.LG · 2026-05-15 · unverdicted · none · ref 53
Partial harnesses for LLM agents, specifying only initial execution steps, achieve higher pass rates than fully decomposed workflows, as analyzed through trajectory alignment and validated in synthetic and terminal benchmarks.
Agentic Coding Needs Proactivity, Not Just Autonomy · cs.SE · 2026-05-07 · conditional · none · ref 9
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents · cs.CL · 2026-04-30 · unverdicted · none · ref 4
RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.
Nautilus: From One Prompt to Plug-and-Play Robot Learning · cs.RO · 2026-05-12 · unverdicted · none · ref 6
NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents · cs.CL · 2026-05-08 · unverdicted · none · ref 38
An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
Effective Harness Engineering for Algorithm Discovery with Coding Agents · cs.SE · 2026-05-13 · unverdicted · none · ref 23
Under a fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.
