OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset. arXiv preprint arXiv:2402.10176

Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman · 2024 · cs.CL · arXiv 2402.10176

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best GPT-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

From Solvers to Research: Large Language Model-Driven Formal Mathematics at the Research Frontier

cs.CL · 2026-07-08 · accept · novelty 6.0

LLM formal provers must shift from competition solvers to research agents that handle open-ended, under-specified frontier mathematics under machine-checked rigor.

SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

SimCT enlarges the supervision space in cross-tokenizer on-policy distillation using short jointly tokenizable multi-token continuations, producing consistent gains over shared-token baselines on math and code benchmarks.

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

cs.LG · 2024-06-26 · conditional · novelty 6.0

Step-DPO performs preference optimization on individual reasoning steps rather than complete answers, producing nearly 3% accuracy gains on MATH for 70B+ parameter models with 10K preference pairs.

LACE: Lattice Attention for Cross-thread Exploration

cs.AI · 2026-04-16 · unverdicted · novelty 5.0 · 3 refs

LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.

Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning

cs.LG · 2025-05-12 · conditional · novelty 5.0

KRPO uses a Kalman filter to estimate latent prompt-level reward baselines from per-group rewards in GRPO, yielding better reward curves and accuracy on math reasoning benchmarks.

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

cs.CV · 2025-02-14 · unverdicted · novelty 4.0

Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.

citing papers explorer

Showing 6 of 6 citing papers.

From Solvers to Research: Large Language Model-Driven Formal Mathematics at the Research Frontier

cs.CL · 2026-07-08 · accept · none · ref 225 · internal anchor

SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

cs.CL · 2026-05-08 · unverdicted · none · ref 50 · 2 links

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

cs.LG · 2024-06-26 · conditional · none · ref 24

LACE: Lattice Attention for Cross-thread Exploration

cs.AI · 2026-04-16 · unverdicted · none · ref 35 · 3 links

Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning

cs.LG · 2025-05-12 · conditional · none · ref 23

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

cs.CV · 2025-02-14 · unverdicted · none · ref 146
