On the self-verification limitations of large language models on reasoning and planning tasks

Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati · 2024 · arXiv 2402.08115

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Zero-Shot Active Feature Acquisition via LLM-Elicitation

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

A framework elicits discriminative MRF statistics from an LLM and closes the model via maximum entropy to enable zero-shot active feature acquisition, outperforming baselines on IBD patient data especially for hardest cases.

CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

CAPS is a four-stage inference-only cascade that adapts how much of each solution the verifier sees and how comparisons are distributed, halving per-candidate verifier tokens while outperforming uniform pairwise verification on most benchmarks.

Agentic AI Translate: An Agentic Translator Prototype for Translation as Communication Design

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

Describes a conceptual agentic prototype for AI translation that operationalizes skopos theory and GEMBA-MQM verification into a four-stage cycle with user dialogue and memory for coherence.

Weighted Rules under the Stable Model Semantics

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

cs.LG · 2025-05-21 · unverdicted · novelty 6.0

Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

cs.AI · 2026-05-13 · conditional · novelty 5.0

SPIN enforces DAG-valid plans and prefix-based stopping for LLM agents, cutting executed tasks from 1061 to 623 and tool calls from 11.81 to 6.82 per run on AssetOpsBench while raising success from 0.638 to 0.706.

U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language with matching verification methods, raising usefulness and satisfaction scores.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

citing papers explorer

Showing 2 of 2 citing papers after filters.

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning cs.LG · 2025-05-21 · unverdicted · none · ref 76
Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 110
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

On the self-verification limitations of large language models on reasoning and planning tasks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer