Towards robust mathematical reasoning

Minh-Thang Luong, Dawsen Hwang, Hoang H Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Junsu Kim, Garrett Bingham, Jonathan Lee, Swaroop Mishra, et al · 2025

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

dataset 2

citation-polarity summary

use dataset 2

representative citing papers

Pseudo-Formalization for Automatic Proof Verification

cs.LO · 2026-05-19 · unverdicted · novelty 7.0

Pseudo-Formalization decomposes proofs into self-contained natural language modules for independent LLM-based Block Verification, outperforming LLM-as-judge baselines on olympiad and research math benchmarks while releasing ArxivMathGradingBench.

Seir\^enes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.

Seed1.8 Model Card: Towards Generalized Real-World Agency

cs.AI · 2026-03-21 · unverdicted · novelty 5.0

Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Pseudo-Formalization for Automatic Proof Verification cs.LO · 2026-05-19 · unverdicted · none · ref 20
Pseudo-Formalization decomposes proofs into self-contained natural language modules for independent LLM-based Block Verification, outperforming LLM-as-judge baselines on olympiad and research math benchmarks while releasing ArxivMathGradingBench.
Seir\^enes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 43
Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
Seed1.8 Model Card: Towards Generalized Real-World Agency cs.AI · 2026-03-21 · unverdicted · none · ref 42
Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

Towards robust mathematical reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer