Association for Computational Linguistics

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL · 2025 · arXiv 2511.16660

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ThinkProbe: Beyond Accuracy -- Structural Profiling of Open-Ended LLM Reasoning Traces via Non-Generative Thought Graphs

cs.CL · 2026-06-27 · unverdicted · novelty 7.0

ThinkProbe builds non-generative Thought Graphs from 4200 LLM traces across 7 models and 200 questions to extract 5D cognitive profiles, finding model-level stability in reasoning structure that exceeds domain effects in four dimensions.

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

ReasoningFlow represents LLM reasoning traces as DAGs, finding structural similarity across models and that most erroneous steps are unused in final answers.

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.

Faithful Mobile GUI Agents with Guided Advantage Estimator

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

Faithful-Agent raises Trap SR in GUI agents from 13.88% to 80.21% via faithfulness-oriented SFT and GuAE-enhanced RFT with consistency rewards while retaining general performance.

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

cs.AI · 2026-02-14 · conditional · novelty 7.0

Fine-tuning LLMs on Navya-Nyaya's six-phase reasoning structure yields 100% semantic correctness on held-out logical problems despite only 40% strict format adherence.

ReasonOps: Operator Segmentation for LLM Reasoning Traces

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Unsupervised clustering on sentence-initial 3-token pivots extracts 7 universal reasoning operators from 44k traces across 12 LLMs that enable model fingerprinting and answer-correctness prediction.

Be Faithful When Response: Returning Fluent and Grounded Answers for Vision-Language Models Reinforcement Learning

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

Faithful Warm-Start pre-training on causally consistent vision-language samples improves accuracy, stabilizes RL, and reduces unsupported reasoning in VLMs.

citing papers explorer

ThinkProbe: Beyond Accuracy -- Structural Profiling of Open-Ended LLM Reasoning Traces via Non-Generative Thought Graphs cs.CL · 2026-06-27 · unverdicted · none · ref 6
ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces cs.CL · 2026-06-03 · unverdicted · none · ref 6
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition cs.CL · 2026-05-12 · unverdicted · none · ref 146
Faithful Mobile GUI Agents with Guided Advantage Estimator cs.AI · 2026-05-02 · unverdicted · none · ref 10
ReasonOps: Operator Segmentation for LLM Reasoning Traces cs.AI · 2026-05-28 · unverdicted · none · ref 18
Be Faithful When Response: Returning Fluent and Grounded Answers for Vision-Language Models Reinforcement Learning cs.AI · 2026-06-29 · unverdicted · none · ref 9
