Title resolution pending

Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik · 2023

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry its DOI and OpenAlex lookups on its next pass.

representative citing papers

PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs

cs.IR · 2026-04-23 · unverdicted · novelty 7.0 · ref 2

PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0 · ref 89

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

cs.AI · 2026-04-20 · unverdicted · novelty 5.0 · ref 7

JTPRO co-optimizes prompts and tool descriptions via reflection, raising overall success rate by 5-20% over baselines on multi-tool benchmarks.

Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents

cs.MA · 2026-05-06 · unverdicted · novelty 4.0 · ref 43

The LMMP framework improves tool-calling accuracy and task success rates for Earth observation agents by grounding plans in multimodal features and remote-sensing expert knowledge through a two-stage training process.

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

cs.AI · 2026-05-01 · unreviewed · ref 18