Mixed citations

First Proof.arXiv preprint

Mohammed Abouzaid, Andrew J Blumberg, Martin Hairer, Joe Kileel, Tamara G Kolda, Paul D Nelson, Daniel Spielman, Nikhil Srivastava, Rachel Ward, Shmuel Weinberger, et al · 2026 · arXiv 2602.05192

Mixed citation behavior. Most common role is background (60%).

9 Pith papers citing it

Background 60% of classified citations

read on arXiv browse 9 citing papers

citation-role summary

background 3 dataset 2

citation-polarity summary

background 3 use dataset 2

representative citing papers

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

cs.CL · 2026-05-09 · unverdicted · novelty 8.0 · 2 refs

Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture

cs.MS · 2026-04-08 · accept · novelty 7.0

k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

Spectral Structure in Finite Free Information Inequalities and $p$-Stam Phase Transitions

math.PR · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

Computational discovery via FlowBoost supports conjectures on the singular values of the coupling matrix E_n being 2^{-k/2} independent of n, a sharp p=2 critical exponent for p-Stam inequalities, and bifurcation of extremals for p<2.

Automated Conjecture Resolution with Formal Verification

cs.LG · 2026-04-04 · unverdicted · novelty 6.0

An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.

pAI/MSc: ML Theory Research with Humans on the Loop

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.

Artificial Intelligence and the Structure of Mathematics

cs.AI · 2026-04-07 · unverdicted · novelty 4.0

AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.

citing papers explorer

Showing 9 of 9 citing papers.

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs cs.CL · 2026-05-09 · unverdicted · none · ref 1 · 2 links
Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.
AI co-mathematician: Accelerating mathematicians with agentic AI cs.AI · 2026-05-07 · unverdicted · none · ref 52 · 2 links
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture cs.MS · 2026-04-08 · accept · none · ref 1
k-server-bench formulates potential-function discovery for the k-server conjecture as a code-based inequality-satisfaction task; current agents fully solve the resolved k=3 case and reduce violations on the open k=4 case.
RMA: an Agentic System for Research-Level Mathematical Problems cs.AI · 2026-05-20 · unverdicted · none · ref 16
RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.
Spectral Structure in Finite Free Information Inequalities and $p$-Stam Phase Transitions math.PR · 2026-04-13 · unverdicted · none · ref 2 · 2 links
Computational discovery via FlowBoost supports conjectures on the singular values of the coupling matrix E_n being 2^{-k/2} independent of n, a sharp p=2 critical exponent for p-Stam inequalities, and bifurcation of extremals for p<2.
Automated Conjecture Resolution with Formal Verification cs.LG · 2026-04-04 · unverdicted · none · ref 1
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
pAI/MSc: ML Theory Research with Humans on the Loop cs.AI · 2026-04-22 · unverdicted · none · ref 9
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations cs.AI · 2026-04-21 · unverdicted · none · ref 32
Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.
Artificial Intelligence and the Structure of Mathematics cs.AI · 2026-04-07 · unverdicted · none · ref 2
AI agents exploring Platonic mathematical structures via proof hypergraphs may reveal the overall architecture of formal mathematics and what makes parts of it human-accessible.

First Proof.arXiv preprint

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer