Title resolution pending

2026 · arXiv 2602.02475

10 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

background: 3 · baseline: 1

representative citing papers

DART: Semantic Recoverability for Structured Tool Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

DART is a modular runtime that certifies semantically recoverable boundaries for failed tool-agent instances and either selects admissible restore points that preserve downstream commitments or blocks recovery.

Holistic Evaluation and Failure Diagnosis of AI Agents

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

A span-decomposed evaluation framework for AI agents achieves state-of-the-art results on GAIA and SWE-Bench with up to 3.5x gains in localization accuracy by breaking traces into independent per-span judgments.

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

cs.SE · 2026-05-09 · unverdicted · novelty 7.0

PROBE structures runtime telemetry into diagnoses and evidence-grounded guidance, raising recovery rates by 12.45 points over baselines on 257 unresolved software repair and AIOps cases.

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

cs.AI · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

Insights Generator is a multi-agent system that generates evidence-backed natural-language insights characterizing systematic patterns across corpora of LLM agent execution traces.

Auditable Agents

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms to detect, enforce, and recover.

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

cs.SE · 2026-02-28 · unverdicted · novelty 6.0

ContextCov compiles agent instruction files into static, runtime, and architectural guardrails, raising constraint compliance to 88.3% on SWE-bench Lite tasks, versus 67% and 50.3% for the prompt and reflection baselines, respectively.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PBKV predicts agent invocations in dynamic LLM workflows to manage KV-cache reuse, delivering up to 1.85x speedup over LRU and 1.26x over KVFlow.

Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.

citing papers explorer

Showing 10 of 10 citing papers.

DART: Semantic Recoverability for Structured Tool Agents · cs.AI · 2026-05-22 · unverdicted · none · ref 26
Holistic Evaluation and Failure Diagnosis of AI Agents · cs.AI · 2026-05-14 · unverdicted · none · ref 1
Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents · cs.SE · 2026-05-09 · unverdicted · none · ref 3
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents · cs.AI · 2026-05-20 · unverdicted · none · ref 2 · 2 links
Auditable Agents · cs.AI · 2026-04-07 · unverdicted · none · ref 1
ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files · cs.SE · 2026-02-28 · unverdicted · none · ref 17
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution · cs.CL · 2026-05-18 · unverdicted · none · ref 4
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents · cs.CL · 2026-05-08 · unverdicted · none · ref 24
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management · cs.LG · 2026-05-07 · unverdicted · none · ref 16
Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents · cs.AI · 2026-05-04 · unverdicted · none · ref 24
