hub

Gold-medalist performance in solving olympiad geometry with alphageometry2

Yuri Chervonyi, Trieu H Trinh, Miroslav Ol ˇs´ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, Thang Luong · 2025 · arXiv 2502.03544

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

BEAVER: An Efficient Deterministic LLM Verifier

cs.AI · 2025-12-05 · unverdicted · novelty 7.0

BEAVER is the first practical deterministic verifier that maintains sound probability bounds on LLM safety properties using token tries and frontier data structures, finding 2-3x more violations than sampling at 1/10 the compute.

An Information-Theoretic Criterion for Efficient Data Synthesis

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Synthetic data improves models only in information-open generation-training loops with external signals, and coarser signals like binary correctness enable better generalization by converging to the most information-efficient component.

The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

The topological dual of a dataset is introduced as a transformation that encodes logical structures into topological ones to expose invariants in neural latent spaces for AlphaGeometry-style reasoning.

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

cs.AI · 2025-10-14 · unverdicted · novelty 6.0

Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.

Aristotle: IMO-level Automated Theorem Proving

cs.AI · 2025-10-01 · unverdicted · novelty 6.0

Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.

Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning

cs.CL · 2025-09-30 · unverdicted · novelty 6.0

KG-R1 trains a single RL agent to retrieve from and reason over knowledge graphs in one loop, achieving higher accuracy with fewer tokens than multi-module baselines and transferring to unseen graphs.

Riemann-Bench: A Benchmark for Moonshot Mathematics

cs.AI · 2026-04-08 · conditional · novelty 5.0

Riemann-Bench is a private benchmark of 25 research-level math problems on which all tested frontier AI models score below 10%.

Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation

cs.LG · 2026-05-01 · unverdicted · novelty 4.0

Pass-rate rewards in critic-free RL for code generation fail to outperform binary rewards because partial-pass solutions induce conflicting gradient directions that do not consistently favor full correctness.

AI for Mathematics: Progress, Challenges, and Prospects

math.HO · 2026-01-19 · unverdicted · novelty 4.0

AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

cs.DC · 2025-12-18 · unverdicted · novelty 4.0

AI4EOSC is a federated cloud platform that integrates modular AI development, serverless AI-as-a-Service, and distributed orchestration with built-in FAIR metadata and provenance tracking for scientific AI workloads in EOSC.

citing papers explorer

Showing 11 of 11 citing papers.

BEAVER: An Efficient Deterministic LLM Verifier cs.AI · 2025-12-05 · unverdicted · none · ref 13
BEAVER is the first practical deterministic verifier that maintains sound probability bounds on LLM safety properties using token tries and frontier data structures, finding 2-3x more violations than sampling at 1/10 the compute.
An Information-Theoretic Criterion for Efficient Data Synthesis cs.LG · 2026-05-11 · unverdicted · none · ref 7
Synthetic data improves models only in information-open generation-training loops with external signals, and coarser signals like binary correctness enable better generalization by converging to the most information-efficient component.
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data cs.AI · 2026-04-20 · unverdicted · none · ref 7
The topological dual of a dataset is introduced as a transformation that encodes logical structures into topological ones to expose invariants in neural latent spaces for AlphaGeometry-style reasoning.
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning cs.AI · 2026-04-07 · unverdicted · none · ref 34
A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics cs.AI · 2025-10-14 · unverdicted · none · ref 19
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
Aristotle: IMO-level Automated Theorem Proving cs.AI · 2025-10-01 · unverdicted · none · ref 6
Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning cs.CL · 2025-09-30 · unverdicted · none · ref 10
KG-R1 trains a single RL agent to retrieve from and reason over knowledge graphs in one loop, achieving higher accuracy with fewer tokens than multi-module baselines and transferring to unseen graphs.
Riemann-Bench: A Benchmark for Moonshot Mathematics cs.AI · 2026-04-08 · conditional · none · ref 3
Riemann-Bench is a private benchmark of 25 research-level math problems on which all tested frontier AI models score below 10%.
Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation cs.LG · 2026-05-01 · unverdicted · none · ref 4
Pass-rate rewards in critic-free RL for code generation fail to outperform binary rewards because partial-pass solutions induce conflicting gradient directions that do not consistently favor full correctness.
AI for Mathematics: Progress, Challenges, and Prospects math.HO · 2026-01-19 · unverdicted · none · ref 26
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.
AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research cs.DC · 2025-12-18 · unverdicted · none · ref 1
AI4EOSC is a federated cloud platform that integrates modular AI development, serverless AI-as-a-Service, and distributed orchestration with built-in FAIR metadata and provenance tracking for scientific AI workloads in EOSC.

Gold-medalist performance in solving olympiad geometry with alphageometry2

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer