Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Scieval: A multi-level large language model evaluation benchmark for scientific research , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.

Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

DCM-Agent improves LLM performance on multi-paradigm optimization problems by 11-21% via dual-cluster memory construction and dynamic inference guidance.

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.

Is a team only as strong as its weakest link? Quantifying the short-board effect with AI Agents

physics.soc-ph · 2026-05-08 · unverdicted · novelty 5.0

LLM multi-agent simulations reveal a cumulative product effect from multiple weak links on team performance and identify distinct capability regimes including a Sisyphus predicament.

citing papers explorer

Showing 4 of 4 citing papers.

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving cs.CL · 2026-04-23 · unverdicted · none · ref 107
OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.
Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving cs.CL · 2026-04-22 · unverdicted · none · ref 91
DCM-Agent improves LLM performance on multi-paradigm optimization problems by 11-21% via dual-cluster memory construction and dynamic inference guidance.
LIMO: Less is More for Reasoning cs.CL · 2025-02-05 · unverdicted · none · ref 131
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
Is a team only as strong as its weakest link? Quantifying the short-board effect with AI Agents physics.soc-ph · 2026-05-08 · unverdicted · none · ref 12
LLM multi-agent simulations reveal a cumulative product effect from multiple weak links on team performance and identify distinct capability regimes including a Sisyphus predicament.

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

fields

years

verdicts

representative citing papers

citing papers explorer