Prover-verifier games improve legibility of LLM outputs

15 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Tandem Reinforcement Learning with Verifiable Rewards

cs.AI · 2026-06-26 · unverdicted · novelty 7.0

TRL extends tandem training to RLVR pipelines. It matches GRPO solo reasoning on Qwen3-4B math tasks while improving handoff robustness, reducing distributional drift, and increasing CoT legibility for the junior.

Pseudo-Formalization for Automatic Proof Verification

cs.LO · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

Pseudo-Formalization decomposes proofs into self-contained natural language modules for independent LLM-based Block Verification, outperforming LLM-as-judge baselines on olympiad and research math benchmarks. The paper also releases ArxivMathGradingBench.

Self-Trained Verification for Training- and Test-Time Self-Improvement

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Self-trained verification trains verifiers to imitate informed versions of themselves using reference solutions, improving test-time V-R loops and training-time self-improvement with reported gains of 2x on hard math and 14x on scientific reasoning.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments, then optimizes with auxiliary DPO to improve the accuracy-efficiency trade-off on math benchmarks.

Calibrating Conservatism for Scalable Oversight

cs.AI · 2026-05-27 · unverdicted · novelty 5.0

CCO aggregates scoring functions into a calibrated penalty using conformal decision theory to enforce target violation rates for AI oversight, evaluated on benchmarks such as modified SWE-bench and MACHIAVELLI.
