hub

Title resolution pending

· 2026 · arXiv 2602.02475

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

DART: Semantic Recoverability for Structured Tool Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

DART is a modular runtime that certifies semantically recoverable boundaries for failed tool-agent instances and selects admissible restore points that preserve downstream commitments or blocks recovery.

Holistic Evaluation and Failure Diagnosis of AI Agents

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

A span-decomposed evaluation framework for AI agents achieves state-of-the-art results on GAIA and SWE-Bench with up to 3.5x gains in localization accuracy by breaking traces into independent per-span judgments.

Auditable Agents

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms for detect/enforce/recover.

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

cs.SE · 2026-02-28 · unverdicted · novelty 6.0

ContextCov compiles agent instruction files into static, runtime, and architectural guardrails, raising constraint compliance to 88.3% on SWE-bench Lite tasks versus 67% and 50.3% for prompt and reflection baselines.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PBKV predicts agent invocations in dynamic LLM workflows to manage KV-cache reuse, delivering up to 1.85x speedup over LRU and 1.26x over KVFlow.

Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

cs.AI · 2026-05-20 · 2 refs

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

cs.SE · 2026-05-09

citing papers explorer

Showing 2 of 2 citing papers after filters.

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents cs.AI · 2026-05-20 · unreviewed · ref 2 · 2 links
Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents cs.SE · 2026-05-09 · unreviewed · ref 3

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer