Recognition: no theorem link
AI co-mathematician: Accelerating mathematicians with agentic AI
Pith reviewed 2026-05-14 21:01 UTC · model grok-4.3
The pith
The AI co-mathematician provides an interactive agentic AI workbench that supports open-ended mathematical research from ideation to theorem proving and sets new benchmark records.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The AI co-mathematician supplies holistic, interactive AI support for the iterative reality of mathematical work, including ideation, literature search, computational exploration, theorem proving, and theory building, resulting in practical advances on open problems and superior results on challenging benchmarks.
What carries the argument
Agentic AI workbench with asynchronous stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts.
If this is right
- Researchers can solve open problems with AI assistance that tracks and refines multiple hypotheses.
- The system surfaces new research directions through iterative exploration.
- It recovers overlooked literature references during searches.
- It achieves state-of-the-art results on hard problem-solving benchmarks such as 48 percent on FrontierMath Tier 4.
Where Pith is reading between the lines
- The same interactive structure could be adapted for iterative discovery in physics or theoretical computer science.
- Deeper integration with symbolic solvers might allow more automated proof steps within the same workspace.
- Widespread use might shorten the cycle from initial idea to verified result in mathematics.
Load-bearing premise
The early tests and benchmark scores demonstrate genuine acceleration of open-ended mathematical research rather than performance on curated or narrow tasks.
What would settle it
Apply the system to a freshly chosen open problem with no prior researcher curation and measure whether it produces verifiable, publishable progress compared with unaided human effort.
read the original abstract
We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computational exploration, theorem proving and theory building. By providing an asynchronous, stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts, the system mirrors human collaborative workflows. In early tests, the AI co-mathematician helped researchers solve open problems, identify new research directions, and uncover overlooked literature references. Besides demonstrating a highly interactive paradigm for AI-assisted mathematical discovery, the AI co-mathematician also achieves state of the art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the AI co-mathematician, an interactive, stateful AI workbench for mathematicians that supports open-ended workflows including ideation, literature search, computational exploration, theorem proving, and theory building. It claims that early tests showed the system helping researchers solve open problems, identify new directions, and uncover overlooked references, while also achieving state-of-the-art results on hard benchmarks such as 48% on FrontierMath Tier 4.
Significance. If the empirical claims were supported by detailed, verifiable evidence, the work could offer a meaningful step toward agentic AI systems that genuinely accelerate exploratory mathematical research by managing uncertainty and producing native artifacts in a collaborative manner. At present, however, the absence of architecture, methodology, or outcome details prevents any assessment of whether the system delivers substantive acceleration beyond curated tasks.
major comments (2)
- [Abstract] Abstract: The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.
- [Abstract] Abstract: The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We agree that the original submission lacked sufficient supporting details for the central claims and have revised the manuscript to address this. Below we respond point by point to the major comments.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.
Authors: We agree that the claim requires more concrete support. In the revised manuscript we have added a dedicated subsection under Experiments that describes two representative cases from the early tests. The subsection provides anonymized problem statements, high-level interaction traces (showing sequences of AI-generated hypotheses, code explorations, and literature queries), a contribution breakdown (AI supplied critical intermediate steps in both cases while the human researcher retained final direction and verification), and confirmation from the collaborating mathematicians that the problems were previously open. Full traces and researcher identities are withheld for privacy and ongoing-work reasons, but the added material supplies verifiable evidence at the level appropriate for the paper. revision: yes
-
Referee: [Abstract] Abstract: The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.
Authors: We accept that the benchmark result was presented without adequate methodological detail. The revised version contains a new 'Benchmark Evaluation' subsection that specifies: the exact FrontierMath Tier 4 problem set size (50 problems), the evaluation protocol (agentic zero-shot runs with the system's native tools for symbolic computation and proof checking, three independent trials per problem with majority vote), a categorized error analysis (reasoning 42 %, tool invocation 31 %, timeout 27 %), and direct numerical comparisons against GPT-4 (32 %), Claude-3-Opus (35 %), and two other published agent frameworks (41 % and 43 %). The configuration parameters and prompting strategy used to reach 48 % are also listed, allowing the result to be assessed and reproduced. revision: yes
Circularity Check
No circularity; empirical claims rest on external test results
full rationale
The paper describes an AI system and reports its performance on benchmarks (e.g., 48% on FrontierMath Tier 4) and qualitative outcomes from early tests with researchers. No equations, parameter fits, or derivations are presented that reduce to self-definition or self-citation. Claims about solving open problems are framed as direct empirical observations rather than outputs of any internal chain that loops back to the inputs. The work is self-contained against external benchmarks and does not invoke load-bearing self-citations or ansatzes.
Axiom & Free-Parameter Ledger
invented entities (1)
-
AI co-mathematician
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Every finite group admits a just finite presentation
Every finite group admits a just finite presentation.
Reference graph
Works this paper leans on
-
[1]
Pólya.Induction and Analogy in Mathematics: Vol
G. Pólya.Induction and Analogy in Mathematics: Vol. 1 of Mathematics and Plausible Reasoning. Princeton University Press, 1954
1954
-
[2]
Lakatos.Proofs and Refutations: The Logic of Mathematical Discovery
I. Lakatos.Proofs and Refutations: The Logic of Mathematical Discovery. Cambridge University Press, 1976
1976
-
[3]
About This Journal
D. Epstein, S. Levy, and R. de la Llave. “About This Journal”. In:Experimental Mathematics1.1 (1992)
1992
-
[4]
Solving Quantitative Reasoning Problems with Language Models
A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V. Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Solo, Y. Wu, B. Neyshabur, G. Gur-Ari, and V. Misra.Solving Quantitative Reasoning Problems with Language Models. 2022. arXiv:2206.14858
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[5]
Galactica: A Large Language Model for Science
R. Taylor, M. Kardas, G. Cucurull, T. Scialom, A. Hartshorn, E. Saravia, A. Poulton, V. Kerkez, and R. Stojnic.Galactica: A Large Language Model for Science. 2022. arXiv:2211.09085
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[6]
Llemma: An open language model for mathematics.arXiv preprint arXiv:2310.10631, 2023
Z. Azerbayev, H. Schoelkopf, K. Paster, M. Dos Santos, S. McAleer, A. Q. Jiang, J. Deng, S. Biderman, and S. Welleck. “Llemma: An Open Language Model For Mathematics”. In: International Conference on Learning Representations (ICLR). 2024. arXiv:2310.10631
-
[7]
arXiv preprint arXiv:2309.05653 , year=
X. Yue, X. Qu, G. Zhang, Y. Fu, W. Huang, H. Sun, Y. Su, and W. Chen. “MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning”. In:International Conference on Learning Representations (ICLR). 2024. arXiv:2309.05653. 18 AI co-mathematician: Accelerating mathematicians with agentic AI
-
[8]
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. K. Li, Y. Wu, and D. Guo. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. 2024. arXiv:2402.03300
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[9]
https://huggingface.co/mistralai/mathstral-7B-v0.1
Mistral AI.Mathstral 7B. https://huggingface.co/mistralai/mathstral-7B-v0.1. 2024
2024
-
[10]
A. Yang, B. Zhang, B. Hui, B. Gao, B. Yu, C. Li, D. Liu, J. Tu, J. Zhou, J. Lin, K. Lu, M. Xue, R. Lin, T. Liu, X. Ren, and Z. Zhang.Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement. 2024. arXiv:2409.12122
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[11]
C. Lu, C. Lu, R. T. Lange, J. Foerster, J. Clune, and D. Ha.The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. 2024. arXiv:2408.06292
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[12]
Luong and V
T. Luong and V. Mirrokni.Accelerating mathematical and scientific discovery with Gemini Deep Think. https://deepmind.google/blog/accelerating-mathematical-and- scientific-discovery-with-gemini-deep-think/. 2025. (Visited on 05/07/2026)
2025
- [13]
-
[14]
Yamada, R
Y. Yamada, R. T. Lange, C. Lu, S. Hu, C. Lu, J. Foerster, J. Clune, and D. Ha.The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search. 2025. arXiv:2504 . 08066
2025
-
[15]
Scientific Algorithm Discovery by Augmenting AlphaE- volve with Deep Research
G. Liu, Y. Zhu, J. Chen, and M. Jiang. “Scientific Algorithm Discovery by Augmenting AlphaE- volve with Deep Research”. In:International Conference on Learning Representations (ICLR)
- [16]
-
[17]
T. Feng, T. H. Trinh, G. Bingham, D. Hwang, Y. Chervonyi, J. Jung, J. Lee, C. Pagano, S.-h. Kim, F. Pasqualotto, S. Gukov, J. N. Lee, J. Kim, K. Hou, G. Ghiasi, Y. Tay, Y. Li, C. Kuang, Y. Liu, H. Lin, E. Z. Liu, N. Nayakanti, X. Yang, H.-t. Cheng, D. Hassabis, K. Kavukcuoglu, Q. V. Le, and T. Luong.Towards Autonomous Mathematics Research. 2026. arXiv:2602.10177
-
[18]
B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. R. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi, P. Kohli, and A. Fawzi. “Mathematical discoveries from program search with large language models”. In:Nature625 (2024), pp. 468–475.doi: 10.1038/s41586-023-06924-6
-
[19]
AlphaEvolve: A coding agent for scientific and algorithmic discovery
A. Novikov, N. V˜u, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. R. Ruiz, A. Mehrabian, M. P. Kumar, A. See, S. Chaudhuri, G. Holland, A. Davies, S. Nowozin, P. Kohli, and M. Balog.AlphaEvolve: A coding agent for scientific and algorithmic discovery. 2025. arXiv:2506.13131
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[20]
Cemri, S
M. Cemri, S. Agrawal, A. Gupta, S. Liu, A. Cheng, Q. Mang, A. Naren, L. E. Erdogan, K. Sen, M. Zaharia, A. Dimakis, and I. Stoica.AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
-
[21]
Sharma.OpenEvolve: an open-source evolutionary coding agent
A. Sharma.OpenEvolve: an open-source evolutionary coding agent. 2025.url: https : / / github.com/algorithmicsuperintelligence/openevolve. 19 AI co-mathematician: Accelerating mathematicians with agentic AI
2025
-
[22]
ShinkaEvolve: Towards Open-Ended And Sample- Efficient Program Evolution
R. T. Lange, Y. Imajuku, and E. Cetin. “ShinkaEvolve: Towards Open-Ended And Sample- Efficient Program Evolution”. In:International Conference on Learning Representations (ICLR)
-
[23]
Olympiad- level formal mathematical reasoning with reinforcement learning
T.Hubert,R.S.Mehta,L.Sartran,M.Z.Horváth,G.Žužić,E.Wieser,A.Huang,J.Schrittwieser, Y. Schroecker, H. Masoom, O. Bertolli, T. Zahavy, A. Mandhane, J. Yung, I. Beloshapka, B. Ibarz, V. Veeriah, L. Yu, O. Nash, P. Lezeau, S. Mercuri, C. Sönne, B. Mehta, A. Davies, D. Zheng, F. Pedregosa, Y. Li, I. von Glehn, M. Rowland, S. Albanie, A. Velingker, S. Schmitt, ...
- [24]
-
[25]
LEGO-Prover: Neural Theorem Proving with Growing Libraries
H. Wang, H. Xin, C. Zheng, Z. Liu, Q. Cao, Y. Huang, J. Xiong, H. Shi, E. Xie, J. Yin, Z. Li, and X. Liang. “LEGO-Prover: Neural Theorem Proving with Growing Libraries”. In:International Conference on Learning Representations (ICLR). 2024
2024
-
[26]
Z. Z. Ren, Z. Shao, J. Song, H. Xin, H. Wang, W. Zhao, L. Zhang, Z. Fu, Q. Zhu, D. Yang, Z. F. Wu, Z. Gou, S. Ma, H. Tang, Y. Liu, W. Gao, D. Guo, and C. Ruan.DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition. 2025. arXiv:2504.21801
-
[27]
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Y. Lin, S. Tang, B. Lyu, Z. Yang, J.-H. Chung, H. Zhao, L. Jiang, Y. Geng, J. Ge, J. Sun, J. Wu, J. Gesi, X. Lu, D. Acuna, K. Yang, H. Lin, Y. Choi, D. Chen, S. Arora, and C. Jin. “Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction”. In: International Conference on Learning Representations (ICLR). 2026. arX...
-
[28]
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Z. Li, Z. Li, W. Tang, X. Zhang, Y. Yao, X. Si, F. Yang, K. Yang, and X. Ma. “Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning”. In:International Conference on Learning Representations (ICLR). 2025
2025
-
[29]
J. Chen, W. Chen, J. Du, J. Hu, Z. Jiang, A. Jie, X. Jin, X. Jin, C. Li, W. Shi, Z. Wang, M. Wang, C. Wei, S. Wei, H. Xin, F. Yang, W. Gao, Z. Yuan, T. Zhan, Z. Zheng, T. Zhou, and T. H. Zhu. Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
-
[30]
H. Wang, M. Unsal, X. Lin, M. Baksys, J. Liu, M. D. Santos, F. Sung, M. Vinyes, Z. Ying, Z. Zhu, J. Lu, H. de Saxcé, B. Bailey, C. Song, C. Xiao, D. Zhang, E. Zhang, F. Pu, H. Zhu, J. Liu, J. Bayer, J. Michel, L. Yu, L. Dreyfus-Schmidt, L. Tunstall, L. Pagani, M. Machado, P. Bourigault, R. Wang, S. Polu, T. Barroyer, W.-D. Li, Y. Niu, Y. Fleureau, Y. Hu, ...
-
[31]
Hariharan, C
S. Hariharan, C. Birkbeck, S. Lee, H. K. G. Ma, B. Mehta, A. Poiroux, and M. Viazovska.A Milestone in Formalization: The Sphere Packing Problem in Dimension 8. 2026. arXiv:2604. 23468
2026
- [32]
-
[33]
ProofBridge: Auto- Formalization of Natural Language Proofs in Lean via Joint Embeddings
P. Jana, K. Kale, A. E. Tanriverdi, C. Song, S. Vishwanath, and V. Ganesh. “ProofBridge: Auto- Formalization of Natural Language Proofs in Lean via Joint Embeddings”. In:International Conference on Learning Representations (ICLR). 2025
2025
-
[34]
D. Chen, E. Chen, K. Lau, K. Ono, and J. Zhang.Parity of𝑘-differentials in genus zero and one
- [35]
-
[36]
Del Tredici, J
M. Del Tredici, J. McCarran, B. Breen, J. A. Mijares, D. Englund, and F. Koppens.Ax-Prover: Agentic LEAN Proving with LLMs and MCP-based Verifiers. ICLR 2026 Conference Desk Rejected Submission / Preprint. 2025
2026
-
[37]
Aristotle: Imo-level automated theorem proving.arXiv preprint arXiv:2510.01346,
T. Achim, A. J. Best, A. Bietti, K. Der, M. Fédérico, S. Gukov, D. Halpern-Leistner, K. Hen- ningsgard, Y. Kudryashov, A. Meiburg, M. Michelsen, R. Patterson, E. Rodriguez, L. Scharff, V. Shanker, V. Sicca, H. Sowrirajan, A. Swope, M. Tamas, V. Tenev, J. Thomm, H. Williams, and L. Wu.Aristotle: IMO-level Automated Theorem Proving. 2025. arXiv:2510.01346
-
[38]
D. P. Woodruff, V. Cohen-Addad, L. Jain, J. Mao, S. Zuo, M. Bateni, S. Brânzei, M. P. Brenner, L. Chen, Y. Feng, L. Fortnow, G. Fu, Z. Guan, Z. Hadizadeh, M. Hajiaghayi, M. JafariRaviz, A. Javanmard, K. C. S., K.-i. Kawarabayashi, R. Kumar, S. Lattanzi, E. Lee, Y. Li, I. Panageas, D. Paparas, B. Przybocki, B. Subercaseaux, O. Svensson, S. Taherijam, X. Wu...
- [39]
-
[40]
B. Alexeev, M. Putterman, M. Sawhney, M. Sellke, and G. Valiant.Short Proofs in Combinatorics and Number Theory. 2026. arXiv:2603.29961
-
[41]
Short proofs in combinatorics, probability and number theory II
B. Alexeev, M. Putterman, M. Sawhney, M. Sellke, and G. Valiant.Short Proofs in Combinatorics, Probability and Number Theory II. 2026. arXiv:2604.06609
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[42]
J. Gottweis, W.-H. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weis- senberger, K. Rong, R. Tanno, K. Saab, D. Popovici, J. Blum, F. Zhang, K. Chou, A. Hassidim, B. Gokturk, A. Vahdat, P. Kohli, Y. Matias, A. Carroll, K. Kulkarni, N. Tomasev, Y. Guan, V. Dhillon, E. D. Vaishnav, B. Lee, T. R. D. Costa, J. R. Penadés, G. Peltz, Y. Xu,...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[43]
https://antigravity.google/
Google.Google Antigravity. https://antigravity.google/. Accessed: 2026-04-27. 2025
2026
-
[44]
https://www.anthropic.com/product/claude- code
Anthropic.Claude Code. https://www.anthropic.com/product/claude- code . Ac- cessed: 2026-04-27. 2025
2026
-
[45]
https://openai.com/index/introducing-codex/
OpenAI.OpenAI Codex. https://openai.com/index/introducing-codex/. Accessed: 2026-04-27. 2021
2026
-
[46]
L. Mitchener, A. Yiu, B. Chang, M. Bourdenx, T. Nadolski, A. Sulovari, E. C. Landsness, D. L. Barabasi, S. Narayanan, N. Evans, S. Reddy, M. Foiani, A. Kamal, L. P. Shriver, F. Cao, A. T. Wassie, J. M. Laurent, E. Melville-Green, M. Caldas, A. Bou, K. F. Roberts, S. Zagorac, T. C. Orr, M. E. Orr, K. J. Zwezdaryk, A. E. Ghareeb, L. McCoy, B. Gomes, E. A. A...
-
[47]
On Proof and Progress in Mathematics
W. P. Thurston. “On Proof and Progress in Mathematics”. In:Bulletin of the American Mathe- matical Society30.2 (1994), pp. 161–177.doi:10.1090/S0273-0979-1994-00502-6
-
[48]
What Is Mathematical Truth?
H. Putnam. “What Is Mathematical Truth?” In:Mathematics, Matter and Method: Philosophical Papers, Volume 1. Cambridge University Press, 1975, pp. 60–78
1975
-
[49]
J. W. Dauben.Georg Cantor: His Mathematics and Philosophy of the Infinite. Harvard University Press, 1979
1979
-
[50]
Moving furniture through a hallway
L. Moser. “Moving furniture through a hallway”. In:SIAM Review8.3 (1966), p. 381
1966
-
[51]
B. Georgiev, J. Gómez-Serrano, T. Tao, and A. Z. Wagner.Mathematical Exploration and Discovery at Scale. 2025. arXiv:2511.02864
-
[52]
Towards Robust Mathematical Reasoning
T. Luong, D. Hwang, H. H. Nguyen, G. Ghiasi, Y. Chervonyi, I. Seo, J. Kim, G. Bingham, J. Lee, S. Mishra, A. Zhai, C. H. Hu, H. Michalewski, J. Kim, J. Ahn, J. Bae, X. Song, T. H. Trinh, Q. V. Le, and J. Jung. “Towards Robust Mathematical Reasoning”. In:Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. arXiv:2511.01846
-
[53]
Glazer, E
E. Glazer, E. Erdil, T. Besiroglu, D. Chicharro, E. Chen, A. Gunning, C. F. Olsson, J.-S. Denain, A. Ho, E. de Oliveira Santos, O. Järviniemi, M. Barnett, R. Sandler, M. Vrzala, J. Sevilla, Q. Ren, E. Pratt, L. Levine, G. Barkley, N. Stewart, B. Grechuk, T. Grechuk, S. V. Enugandla, and M. Wildon.FrontierMath: A Benchmark for Evaluating Advanced Mathemati...
-
[54]
Tsoukalas, J
G. Tsoukalas, J. Lee, J. Jennings, J. Xin, M. Ding, M. Jennings, A. Thakur, and S. Chaudhuri. PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
-
[55]
Measuring Mathematical Problem Solving With the MATH Dataset
D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt. “Measuring Mathematical Problem Solving With the MATH Dataset”. In:Advances in Neural Information Processing Systems34 (2021), pp. 29157–29169
2021
-
[56]
Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman.Training Verifiers to Solve Math Word Problems. 2021. arXiv:2110.14168
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[57]
Luong and E
T. Luong and E. Lockhart.Advanced version of Gemini with Deep Think officially achieves gold- medal standard at the International Mathematical Olympiad.https://deepmind.google/ blog/advanced-version-of-gemini-with-deep-think-officially-achieves- gold-medal-standard-at-the-international-mathematical-olympiad/ .Google DeepMind Blog. July 2025
2025
-
[58]
M. Abouzaid, A. J. Blumberg, M. Hairer, J. Kileel, T. G. Kolda, P. D. Nelson, D. Spielman, N. Srivastava, R. Ward, S. Weinberger, and L. Williams.First Proof. 2026. arXiv:2602.05192
-
[59]
N.Gupta,R.Chatterjee,L.Haas,C.Tao,A.Wang,C.Liu,H.Oiwa,E.Gribovskaya,J.Ackermann, J. Blitzer, S. Goldshtein, and D. Das.DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents. 2026. arXiv:2601.20975
- [60]
-
[61]
F. Dell’Acqua, E. McFowland III, E. Mollick, H. Lifshitz-Assaf, K. C. Kellogg, S. Rajendran, L. Krayer, F. Candelon, and K. R. Lakhani. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality”. In:Organization Science37.2 (2026), pp. 403–423.doi:10.12...
-
[62]
Accessed: 2026-05-05
Epoch AI.Evaluating Gemini 2.5 Deep Think’s math capabilities.https://epoch.ai/blog/ deep-think-math. Accessed: 2026-05-05. 2025
2026
-
[63]
Kontorovich.The Shape of Math To Come
A. Kontorovich.The Shape of Math To Come. 2025. arXiv:2510.15924
- [64]
-
[65]
Accessed: 2026-04-23
People + AI Research.Read smarter, not harder, with Lumi.https://medium.com/people- ai- research/read- smarter- not- harder- with- lumi- 6a1a8210ccc7 . Accessed: 2026-04-23. 2025. 23
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.