Reasoning over mathematical objects: On-policy reward modeling and test time aggregation, 2026

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li, Tianjian Li, Bo Liu, Graham Neubig, Anaelia Ovalle, Swarnadeep Saha, Sainbayar Sukhbaatar, Sean Welleck, Jason Weston, Chenxi Whitehouse, Adina Wi · 2026

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

HRM-Text: Efficient Pretraining Beyond Scaling

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.

citing papers explorer


HRM-Text: Efficient Pretraining Beyond Scaling · cs.CL · 2026-05-20 · unverdicted · none · ref 42
