CMATH: Can Your Language Model Pass Chinese Elementary School Math Test? arXiv preprint arXiv:2306.16636
6 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1). Citation-polarity summary: still indexing. Representative citing papers are listed in the explorer below.
Citing papers explorer
- Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models
  Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH. (A minimal remasking sketch appears after this list.)
- Validity-Calibrated Reasoning Distillation
  Validity-calibrated reasoning distillation improves the transfer of reasoning skills by modulating updates based on the relative local validity of next steps instead of enforcing full-trajectory imitation. (A validity-weighted loss sketch appears after this list.)
- Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization
  CAMEL is a scaling law capturing nonlinear model-size and mixture interactions to extrapolate optimal data mixtures for large LLMs from small-model experiments, reducing optimization cost by 50% and improving benchmarks by up to 3%.
- LLaDA2.0: Scaling Up Diffusion Language Models to 100B
  LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme, and releases open-source 16B and 100B MoE variants. (A WSD schedule sketch appears after this list.)
- Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow
  MDLMs lag autoregressive models in performance because parallel modeling weakens inter-token dependencies, yet they adapt generation order to task demands and show promise in a generate-then-edit paradigm.
- Kimi K2: Open Agentic Intelligence
  Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks, including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.
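For the remasking entry above, a minimal sketch of the general idea in the summary: during iterative decoding of a masked diffusion LM, previously committed tokens the model has become unsure about are reset to the mask token rather than overwritten in place. The model interface, thresholds, and commit schedule below are illustrative assumptions, not the paper's actual algorithm.

```python
# Sketch only: assumes a model that maps a partially masked 1-D id sequence to
# per-position logits of shape (seq_len, vocab_size); thresholds are arbitrary.
import torch

def remask_decode(model, ids, mask_id, steps=8, commit_frac=0.25, remask_thresh=0.3):
    """Iteratively fill masked positions; low-confidence committed tokens are
    reset to the mask token (remasked) instead of being replaced in place."""
    ids = ids.clone()
    for _ in range(steps):
        logits = model(ids)                        # assumed (seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)             # per-position confidence, argmax

        # 1) Remask: reset low-confidence *committed* tokens back to [MASK]
        committed = ids != mask_id
        ids[committed & (conf < remask_thresh)] = mask_id

        # 2) Commit: fill the highest-confidence masked positions this step
        masked = ids == mask_id
        if masked.any():
            k = max(1, int(commit_frac * int(masked.sum())))
            cand = torch.where(masked, conf, torch.full_like(conf, -1.0))
            top = cand.topk(k).indices
            ids[top] = pred[top]
    return ids
```

The point of the sketch is the design choice in the title: a doubtful position goes back to [MASK] so the model can re-predict it with full bidirectional context later, rather than being overwritten with another hard guess immediately.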
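For the validity-calibrated distillation entry, one way "modulating updates by relative local validity" could look as a loss is a per-step weighted cross-entropy instead of uniform imitation of the whole trace. The tensor layout and normalization below are assumptions for illustration, not the paper's formulation.

```python
# Sketch only: per-step distillation losses scaled by relative validity weights.
import torch
import torch.nn.functional as F

def validity_weighted_loss(step_logits, step_targets, step_validity):
    """step_logits:   list of (len_i, vocab) tensors, one per reasoning step
       step_targets:  list of (len_i,) token-id tensors
       step_validity: (num_steps,) scores in [0, 1], one per next step"""
    # Normalize so the weights express *relative* local validity across steps.
    weights = step_validity / (step_validity.sum() + 1e-8)
    total = step_logits[0].new_zeros(())
    for logits, targets, w in zip(step_logits, step_targets, weights):
        total = total + w * F.cross_entropy(logits, targets)
    return total
```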
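For the LLaDA2.0 entry, WSD refers to a warmup-stable-decay learning-rate schedule; a minimal 3-phase sketch follows. The phase lengths, peak rate, and linear decay shape are illustrative assumptions, not the paper's reported settings.

```python
# Sketch only: 3-phase WSD (warmup -> stable -> decay) learning-rate schedule.
def wsd_lr(step, peak_lr=3e-4, warmup=2_000, stable=80_000, decay=18_000, min_lr=3e-5):
    if step < warmup:                              # phase 1: linear warmup to peak
        return peak_lr * step / max(1, warmup)
    if step < warmup + stable:                     # phase 2: hold at peak
        return peak_lr
    t = min(1.0, (step - warmup - stable) / max(1, decay))
    return peak_lr + t * (min_lr - peak_lr)        # phase 3: linear decay to floor
```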