Heimdall: test-time scaling on the generative verification. arXiv preprint arXiv:2504.10337, 2025

Wenlei Shi, Xing Jin · 2025 · arXiv 2504.10337

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time

cs.LG · 2026-03-20 · unverdicted · novelty 7.0

SCRL adds selective positive pseudo-labeling and entropy-gated negative pseudo-labeling to test-time RL, reducing noise from weak consensus and improving LLM reasoning on benchmarks.

Pseudo-Formalization for Automatic Proof Verification

cs.LO · 2026-05-19 · unverdicted · novelty 5.0

Pseudo-Formalization decomposes natural language proofs into modular blocks for independent LLM verification via Block Verification, outperforming LLM-as-judge baselines on error detection in olympiad and research math benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
cs.LG · 2026-03-20 · unverdicted · none · ref 15
Pseudo-Formalization for Automatic Proof Verification
cs.LO · 2026-05-19 · unverdicted · none · ref 29
