Pseudo-Formalization decomposes proofs into self-contained natural-language modules that are checked independently by LLM-based Block Verification. It outperforms LLM-as-judge baselines on olympiad and research math benchmarks and releases ArxivMathGradingBench.
Escaping the cognitive well: Efficient competition math with off-the-shelf models. arXiv preprint arXiv:2602.16793
3 Pith papers cite this work.
Citing papers by year: 2026 (3)

Representative citing papers:
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.
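The Bradley-Terry aggregation step described for OpenDeepThink can be illustrated with a minimal sketch. This is not OpenDeepThink's implementation. It is the standard minorization-maximization (MM) fit of the Bradley-Terry model, in which trace i beats trace j with probability p_i / (p_i + p_j). It assumes each LLM pairwise judgment arrives as a (winner, loser) pair of trace indices. The function name `bradley_terry` and its input format are hypothetical, and the sketch covers only the ranking step, not the evolution of traces across rounds.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_items, iters=500, tol=1e-12):
    """Fit Bradley-Terry strengths with the MM update (hypothetical helper).

    comparisons: list of (winner, loser) index pairs, e.g. from pairwise LLM judgments.
    Returns strengths p_i normalized to sum to 1; larger means stronger.
    """
    wins = [0] * n_items             # W_i: total wins of item i
    pair_counts = defaultdict(int)   # n_ij for i < j: comparisons between i and j
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[(min(w, l), max(w, l))] += 1

    p = [1.0 / n_items] * n_items    # start from uniform strengths
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = [0.0] * n_items
        for (i, j), nij in pair_counts.items():
            s = nij / (p[i] + p[j])
            denom[i] += s
            denom[j] += s
        # Items never compared keep their previous strength.
        new_p = [wins[i] / denom[i] if denom[i] > 0 else p[i] for i in range(n_items)]
        total = sum(new_p)
        new_p = [x / total for x in new_p]  # fix the scale, since BT is scale-invariant
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Three hypothetical reasoning traces, each pair judged three times.
judgments = [(0, 1), (0, 1), (1, 0), (0, 2), (2, 0), (0, 2), (1, 2), (1, 2), (2, 1)]
strengths = bradley_terry(judgments, n_items=3)
ranking = sorted(range(3), key=lambda i: -strengths[i])
print(ranking)
```

Fitting a global strength per trace, rather than simply counting wins, lets judgments over an incomplete or uneven set of pairs be combined into one consistent ranking. Note that if some trace never loses, its maximum-likelihood strength diverges, so practical systems often add a small prior or pseudo-count.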