Large Language Models are Better Reasoners with Self-Verification

Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, Jun Zhao · 2023 · DOI 10.18653/v1/2023.findings-emnlp.167

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Conceptual Steganography

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

Conceptual steganography encodes covert information in high-level reasoning patterns within LM chains-of-thought, remaining robust to paraphrase defenses while preserving reasoning utility.

Schützen: Evaluating LLM Safety in Bulgarian and German Contexts

cs.CL · 2026-06-09 · unverdicted · novelty 6.0

Schützen is a German-Bulgarian LLM safety dataset showing pronounced cross-language differences in model safety behavior.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

ThinkBooster supplies a modular library, joint performance-efficiency benchmark, and deployable proxy for test-time compute scaling of LLM reasoning on math and coding tasks.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models

cs.AI · 2026-04-19

citing papers explorer

Showing 5 of 5 citing papers after filters.

Conceptual Steganography cs.CL · 2026-05-26 · unverdicted · none · ref 15
Conceptual steganography encodes covert information in high-level reasoning patterns within LM chains-of-thought, remaining robust to paraphrase defenses while preserving reasoning utility.
Schützen: Evaluating LLM Safety in Bulgarian and German Contexts cs.CL · 2026-06-09 · unverdicted · none · ref 149
Schützen is a German-Bulgarian LLM safety dataset showing pronounced cross-language differences in model safety behavior.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unverdicted · none · ref 40
A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.
ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning cs.CL · 2026-06-05 · unverdicted · none · ref 13
ThinkBooster supplies a modular library, joint performance-efficiency benchmark, and deployable proxy for test-time compute scaling of LLM reasoning on math and coding tasks.
A Survey on LLM-as-a-Judge cs.CL · 2024-11-23 · unverdicted · none · ref 173
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
