Prover-verifier games improve legibility of llm outputs

Jan Hendrik Kirchner, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, Yuri Burda · 2024 · arXiv 2407.13692

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

representative citing papers

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

cs.AI · 2025-03-14 · conditional · novelty 7.0

Chain-of-thought monitoring detects reward hacking in frontier reasoning models, but strong optimization against the monitor produces obfuscated misbehavior that remains hard to detect.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.

Common-agency Games for Multi-Objective Test-Time Alignment

cs.GT · 2026-05-08 · unverdicted · novelty 6.0

CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

Pseudo-Formalization for Automatic Proof Verification

cs.LO · 2026-05-19 · unverdicted · novelty 5.0

Pseudo-Formalization decomposes natural language proofs into modular blocks for independent LLM verification via Block Verification, outperforming LLM-as-judge baselines on error detection in olympiad and research math benchmarks.

citing papers explorer

Showing 6 of 6 citing papers.

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities cs.LG · 2026-05-11 · unverdicted · none · ref 46
Presents a likelihood-based benchmark for equation-suffix prediction in technical papers with controls to detect shortcut vulnerabilities in model forecasts.
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation cs.AI · 2025-03-14 · conditional · none · ref 76
Chain-of-thought monitoring detects reward hacking in frontier reasoning models, but strong optimization against the monitor produces obfuscated misbehavior that remains hard to detect.
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs cs.CL · 2024-12-30 · unverdicted · none · ref 17
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
CLORE: Content-Level Optimization for Reasoning Efficiency cs.AI · 2026-05-21 · unverdicted · none · ref 24
CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
Common-agency Games for Multi-Objective Test-Time Alignment cs.GT · 2026-05-08 · unverdicted · none · ref 207
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
Pseudo-Formalization for Automatic Proof Verification cs.LO · 2026-05-19 · unverdicted · none · ref 14
Pseudo-Formalization decomposes natural language proofs into modular blocks for independent LLM verification via Block Verification, outperforming LLM-as-judge baselines on error detection in olympiad and research math benchmarks.

Prover-verifier games improve legibility of llm outputs

fields

years

verdicts

representative citing papers

citing papers explorer