Integrity report for Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

A machine-verified record of the checks Pith has run against this paper: detector runs, findings, signed bundle events, and canonical identifiers.

arXiv:2605.12483 · pith:2026:URSNDDDQZODJVNCAIOQO6TBQP7

0 Critical

0 Advisory

9 Detectors run

2026-05-26 Last checked


Detector runs

ai_meta_artifact completed v1.0.0 · findings 0 · 2026-05-26 20:46:07.546555+00:00

doi_compliance completed v1.0.0 · findings 0 · 2026-05-21 03:58:45.907817+00:00

doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-21 01:31:33.687824+00:00

doi_compliance completed v1.0.0 · findings 0 · 2026-05-21 01:08:28.277099+00:00

doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-21 01:01:33.184708+00:00

claim_evidence completed v1.0.0 · findings 0 · 2026-05-19 22:21:57.804122+00:00

ai_meta_artifact completed v1.0.0 · findings 0 · 2026-05-19 10:34:39.758162+00:00

doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-19 08:01:17.948604+00:00

doi_compliance completed v1.0.0 · findings 0 · 2026-05-19 07:26:27.872311+00:00
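The run lines above follow a regular pattern (detector name, status, version, finding count, timestamp), so they can be summarized mechanically. The following is a minimal sketch, assuming that pattern holds for every line. The helper names `parse_run` and `latest_per_detector` are hypothetical and not part of any Pith tooling.

```python
from datetime import datetime

# The nine detector-run lines, copied verbatim from the report.
RUN_LINES = """\
ai_meta_artifact completed v1.0.0 · findings 0 · 2026-05-26 20:46:07.546555+00:00
doi_compliance completed v1.0.0 · findings 0 · 2026-05-21 03:58:45.907817+00:00
doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-21 01:31:33.687824+00:00
doi_compliance completed v1.0.0 · findings 0 · 2026-05-21 01:08:28.277099+00:00
doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-21 01:01:33.184708+00:00
claim_evidence completed v1.0.0 · findings 0 · 2026-05-19 22:21:57.804122+00:00
ai_meta_artifact completed v1.0.0 · findings 0 · 2026-05-19 10:34:39.758162+00:00
doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-19 08:01:17.948604+00:00
doi_compliance completed v1.0.0 · findings 0 · 2026-05-19 07:26:27.872311+00:00
""".splitlines()


def parse_run(line):
    """Split one run line into its fields (hypothetical helper)."""
    # Assumed layout: "<detector> <status> <version> · findings <n> · <timestamp>"
    head, findings, stamp = [part.strip() for part in line.split("·")]
    detector, status, version = head.split()
    return {
        "detector": detector,
        "status": status,
        "version": version,
        "findings": int(findings.split()[1]),
        "checked_at": datetime.fromisoformat(stamp),
    }


def latest_per_detector(runs):
    """Keep only the most recent run for each detector name."""
    latest = {}
    for run in runs:
        prev = latest.get(run["detector"])
        if prev is None or run["checked_at"] > prev["checked_at"]:
            latest[run["detector"]] = run
    return latest


runs = [parse_run(line) for line in RUN_LINES if line.strip()]
latest = latest_per_detector(runs)
total_findings = sum(run["findings"] for run in runs)

for name, run in sorted(latest.items()):
    print(name, run["status"], run["findings"], run["checked_at"].isoformat())
```

Applied to the lines in this report, the nine runs collapse to four distinct detectors (ai_meta_artifact, claim_evidence, doi_compliance, doi_title_agreement), each with zero findings in its most recent run.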

Findings

No public integrity findings for this paper.

Signed record

The machine-readable record for this paper lives at /pith/URSNDDDQ/integrity.json. Where a Pith Number exists, its bundle also includes signed pith.integrity.v1 events.