Title resolution pending

Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1 · baseline 1 · dataset 1

citation-polarity summary

background 1 · baseline 1 · use dataset 1

representative citing papers

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

cs.SE · 2026-05-09 · unverdicted · novelty 7.0

PROBE structures runtime telemetry into diagnoses and evidence-grounded guidance, raising recovery rates by 12.45 points over baselines on 257 unresolved software repair and AIOps cases.

PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.

DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow

cs.HC · 2025-09-16 · unverdicted · novelty 6.0

DoubleAgents shows that a distributed-cognition design with a coordination agent, a dashboard, and a policy module increases user comfort with, and reliance on, AI agents for coordination tasks over time.

Governance by Construction for Generalist Agents

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

CUGA introduces a runtime governance architecture that enforces policies at five checkpoints in generalist agent execution pipelines for predictable and compliant behavior.

Robust Agent Compensation (RAC): Teaching AI Agents to Compensate

cs.AI · 2026-05-05 · unverdicted · novelty 5.0

RAC is a log-based recovery paradigm implemented as an architectural extension to agent frameworks, achieving 1.5-8X better latency and token economy than LLM-based recovery on τ-bench and REALM-Bench.

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

cs.AI · 2026-04-12 · unverdicted · novelty 5.0

Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.

citing papers explorer

Showing 6 of 6 citing papers.

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

cs.SE · 2026-05-09 · unverdicted · none · ref 37

PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · none · ref 72

DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow

cs.HC · 2025-09-16 · unverdicted · none · ref 45

Governance by Construction for Generalist Agents

cs.AI · 2026-05-20 · unverdicted · none · ref 20

Robust Agent Compensation (RAC): Teaching AI Agents to Compensate

cs.AI · 2026-05-05 · unverdicted · none · ref 34

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

cs.AI · 2026-04-12 · unverdicted · none · ref 24
