Title resolution pending

Yao, Shunyu, Zhao, Jeffrey, Yu, Dian, Du, Nan, Shafran, Izhak, Narasimhan, Karthik

8 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

CoCoDA co-evolves a typed compositional DAG of primitive and composite tools with the agent planner. It uses signature-based retrieval and a size-based reward to scale tool libraries efficiently, letting an 8B model match or beat a 32B model on math and code benchmarks.

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

cs.AI · 2026-05-04 · unverdicted · novelty 7.0

ANNEAL uses Failure-Driven Knowledge Acquisition to localize faults, generate validated symbolic patches, and commit persistent repairs to a knowledge graph, achieving 0% recurring failure rates where baselines retain 72-100%.

NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

NeuroMAS reframes multi-agent language systems as neural architectures where LLM agents learn coordination via reinforcement learning rather than predefined roles.

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

A single consistency instruction combined with harmful prior actions causes aligned frontier LLMs to select unsafe options at rates of 91-98% in high-stakes domains, with escalation and inverse scaling by model size.

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

HELM raises long-horizon VLA success from 58.4% to 81.5% on LIBERO-LONG by combining episodic memory retrieval, learned failure prediction, and replanning, outperforming context extension or adaptation alone.

An Executable Benchmarking Suite for Tool-Using Agents

cs.SE · 2026-05-10 · unverdicted · novelty 5.0

The paper delivers a unified executable benchmarking suite for tool-using agents that enforces a shared evidence-admission contract across web, code, and micro-task environments.

AppAgent: Multimodal Agents as Smartphone Users

cs.CV · 2023-12-21 · unverdicted · novelty 5.0

AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.

SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning

cs.AI · 2026-04-20 · unverdicted · novelty 4.0

SPREG detects logical failures in LLM long-chain reasoning through real-time entropy spikes and performs structured plan repairs using historical distributions, reporting a 20% absolute accuracy gain on AIME25.

citing papers explorer

Showing 8 of 8 citing papers.

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents cs.AI · 2026-05-08 · unverdicted · none · ref 2
ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning cs.AI · 2026-05-04 · unverdicted · none · ref 23
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning cs.AI · 2026-05-16 · unverdicted · none · ref 31
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions cs.AI · 2026-05-13 · unverdicted · none · ref 16
HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation cs.LG · 2026-04-20 · unverdicted · none · ref 10
An Executable Benchmarking Suite for Tool-Using Agents cs.SE · 2026-05-10 · unverdicted · none · ref 4
AppAgent: Multimodal Agents as Smartphone Users cs.CV · 2023-12-21 · unverdicted · none · ref 88
SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning cs.AI · 2026-04-20 · unverdicted · none · ref 35