ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao · 2023

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Benchmarking LLM-Driven Network Configuration Repair

cs.NI · 2026-04-24 · unverdicted · novelty 8.0

Cornetto is the first benchmark that synthesizes 231 network misconfiguration problems across topologies of 20-754 nodes and uses formal verification to show that nine state-of-the-art LLMs often introduce regressions and degrade at scale.

SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval

cs.AI · 2026-05-21 · conditional · novelty 7.0

SGR-Bench evaluates agentic LLM systems on state-gated retrieval tasks where evidence is only accessible after configuring site-specific states, with the strongest system reaching 66.18% item-level F1 and failures dominated by retrieval-scope drift.

Harnessing LLM Agents with Skill Programs

cs.AI · 2026-05-18 · conditional · novelty 6.0

HASP upgrades textual skills into executable Program Functions that intervene in LLM agent loops at inference, post-training, or self-evolution, delivering 25% gains over ReAct and 30.4% over Search-R1 on reasoning benchmarks.

STS: Efficient Sparse Attention with Speculative Token Sparsity

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.

GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

cs.CR · 2026-04-08 · unverdicted · novelty 6.0

PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.

RAGEN-2: Reasoning Collapse in Agentic RL

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

Template collapse is a distinct failure mode in agentic RL invisible to entropy; mutual information proxies diagnose it better and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

cs.CR · 2025-07-09 · conditional · novelty 6.0

Testing 18 LLMs found 94.4% vulnerable to direct prompt injection for malware installation, 83.3% to RAG backdoor attacks, and 100% to inter-agent trust exploitation in multi-agent systems.

Willful Disobedience: Automatically Detecting Failures in Agentic Traces

cs.SE · 2026-03-25 · unverdicted · novelty 5.0

AgentPex extracts rules from prompts and automatically flags specification violations in agent execution traces that outcome-only benchmarks miss.

Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation

cs.CL · 2026-03-15 · unverdicted · novelty 5.0

A small language model fine-tuned on tool-augmented chain-of-thought data generated by a larger LLM learns to selectively call tools, delivering better content moderation accuracy at lower inference cost.

AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems

q-fin.TR · 2026-05-01 · unverdicted · novelty 4.0

AgenticAITA proposes a training-free multi-agent LLM framework for autonomous trading using a deliberative pipeline, Z-score triggers, and safety gates, shown to run correctly in a five-day live dry-run with 157 invocations.

From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

ReflectiChain uses latent trajectory rehearsal and retrospective agentic RL inside an LLM world model to raise average step rewards by 250% and restore supply-chain operability from 13.3% to 88.5% on the Semi-Sim benchmark under extreme shocks.

citing papers explorer

Showing 12 of 12 citing papers.

Benchmarking LLM-Driven Network Configuration Repair · cs.NI · 2026-04-24 · unverdicted · none · ref 46
Cornetto is the first benchmark that synthesizes 231 network misconfiguration problems across topologies of 20-754 nodes and uses formal verification to show that nine state-of-the-art LLMs often introduce regressions and degrade at scale.
SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval · cs.AI · 2026-05-21 · conditional · none · ref 42
SGR-Bench evaluates agentic LLM systems on state-gated retrieval tasks where evidence is only accessible after configuring site-specific states, with the strongest system reaching 66.18% item-level F1 and failures dominated by retrieval-scope drift.
Harnessing LLM Agents with Skill Programs · cs.AI · 2026-05-18 · conditional · none · ref 20
HASP upgrades textual skills into executable Program Functions that intervene in LLM agent loops at inference, post-training, or self-evolution, delivering 25% gains over ReAct and 30.4% over Search-R1 on reasoning benchmarks.
STS: Efficient Sparse Attention with Speculative Token Sparsity · cs.LG · 2026-05-15 · unverdicted · none · ref 35
STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation · cs.LG · 2026-05-12 · unverdicted · none · ref 4 · 2 links
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy · cs.CR · 2026-04-08 · unverdicted · none · ref 22
PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.
RAGEN-2: Reasoning Collapse in Agentic RL · cs.LG · 2026-04-07 · unverdicted · none · ref 65
Template collapse is a distinct failure mode in agentic RL invisible to entropy; mutual information proxies diagnose it better and SNR-aware filtering using reward variance improves input-dependent reasoning and task performance across planning, math, navigation, and code tasks.
The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise · cs.CR · 2025-07-09 · conditional · none · ref 35
Testing 18 LLMs found 94.4% vulnerable to direct prompt injection for malware installation, 83.3% to RAG backdoor attacks, and 100% to inter-agent trust exploitation in multi-agent systems.
Willful Disobedience: Automatically Detecting Failures in Agentic Traces · cs.SE · 2026-03-25 · unverdicted · none · ref 39
AgentPex extracts rules from prompts and automatically flags specification violations in agent execution traces that outcome-only benchmarks miss.
Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation · cs.CL · 2026-03-15 · unverdicted · none · ref 13
A small language model fine-tuned on tool-augmented chain-of-thought data generated by a larger LLM learns to selectively call tools, delivering better content moderation accuracy at lower inference cost.
AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems · q-fin.TR · 2026-05-01 · unverdicted · none · ref 13
AgenticAITA proposes a training-free multi-agent LLM framework for autonomous trading using a deliberative pipeline, Z-score triggers, and safety gates, shown to run correctly in a five-day live dry-run with 157 invocations.
From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience · cs.AI · 2026-04-13 · unverdicted · none · ref 51
ReflectiChain uses latent trajectory rehearsal and retrospective agentic RL inside an LLM world model to raise average step rewards by 250% and restore supply-chain operability from 13.3% to 88.5% on the Semi-Sim benchmark under extreme shocks.
