CMATH: Can your language model pass Chinese elementary school math test?,

· 2023 · arXiv 2306.16636

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.

Validity-Calibrated Reasoning Distillation

cs.LG · 2026-04-14 · unverdicted · novelty 7.0 · 2 refs

Validity-calibrated reasoning distillation improves transfer of reasoning skills by modulating updates based on relative local validity of next steps instead of enforcing full trajectory imitation.

Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

cs.LG · 2026-03-09 · unverdicted · novelty 6.0

CAMEL is a scaling law capturing nonlinear model-size and mixture interactions to extrapolate optimal data mixtures for large LLMs from small-model experiments, reducing optimization cost by 50% and improving benchmarks by up to 3%.

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

cs.LG · 2025-12-10 · conditional · novelty 6.0

LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme and releases open-source 16B and 100B MoE variants.

Decoding in Order-Agnostic Language Models: Chain-Rule Deviation and Uniform Spreading

cs.CL · 2026-05-31 · unverdicted · novelty 5.0

OALMs exhibit order-dependent likelihoods up to 0.49 nats/token and a uniform confidence spread maximizes recoverability, motivating Var(log q_t) as a decoding diagnostic.

Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling

cs.LG · 2026-05-26 · unverdicted · novelty 5.0

Dense2MoE unifies pruning of attention modules with upcycling of MLPs into MoE experts to produce on-device LLMs that improve the latency-accuracy Pareto frontier.

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

AGPO adaptively sets trust-region size and exploration temperature from group reward dispersion, entropy, and KL drift, yielding higher scores than PPO and GRPO on nine math benchmarks under fixed token budget.

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

cs.CL · 2026-01-22 · unverdicted · novelty 5.0

MDLMs lag autoregressive models in performance because parallel modeling weakens inter-token dependencies, yet they adapt generation order to task demands and show promise in a generate-then-edit paradigm.

Kimi K2: Open Agentic Intelligence

cs.LG · 2025-07-28 · unverdicted · novelty 5.0

Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

cs.CL · 2026-06-13 · unverdicted · novelty 4.0

Technical report announcing Ling-2.6 and Ring-2.6 models with hybrid linear attention, evolutionary CoT, and KPop RL for efficient agentic intelligence at scale.

PowLU: An Activation Function for Stable Pre-Training of LLMs

cs.CL · 2026-05-25 · unverdicted · novelty 4.0

PowLU replaces SwiGLU with a rational-power activation to reduce outlier amplification and numerical instability during large-scale LLM pre-training while matching performance.

Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

cs.AI · 2026-06-07 · unverdicted · novelty 3.0

An integrated survey organizing AI mathematical reasoning into informal, formal, discovery, and technique axes while cataloging benchmarks and assessing failure modes.

citing papers explorer

Showing 2 of 2 citing papers after filters.

LLaDA2.0: Scaling Up Diffusion Language Models to 100B cs.LG · 2025-12-10 · conditional · none · ref 36
LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme and releases open-source 16B and 100B MoE variants.
Kimi K2: Open Agentic Intelligence cs.LG · 2025-07-28 · unverdicted · none · ref 84
Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

CMATH: Can your language model pass Chinese elementary school math test?,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer