{"paper":{"title":"Compute Aligned Training: Optimizing for Test Time Inference","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Training LLMs with losses derived from test-time inference operators improves scaling over standard SFT and RL.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Adam Ousherovitch, Ambuj Tewari","submitted_at":"2026-04-27T19:52:38Z","abstract_excerpt":"Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a base policy, creating a misalignment with test time procedures that rely on aggregated or filtered outputs. In this work, we propose Compute Aligned Training, which aligns training objectives with test-time strategies. By conceptualizing inference strategies as operators on the base policy, we derive new loss functions that"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we provide empirical evidence that this training method substantially improves test time scaling over standard training.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That test-time inference strategies can be accurately modeled as operators on the base policy such that the derived losses produce stable and generalizable improvements without introducing new optimization pathologies.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Compute Aligned Training derives new loss functions by modeling test-time strategies as operators on the base policy, yielding empirical gains in test-time compute scaling over standard SFT and RL.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Training LLMs with losses derived from test-time inference operators improves scaling over standard SFT and RL.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8424ac93aa5bdbf66202b27db695a2a594ccab08874785978ebdffb55e4acb7f"},"source":{"id":"2604.24957","kind":"arxiv","version":2},"verdict":{"id":"d40d5277-5bf8-470d-b792-3f8d5517deca","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T04:09:47.504615Z","strongest_claim":"we provide empirical evidence that this training method substantially improves test time scaling over standard training.","one_line_summary":"Compute Aligned Training derives new loss functions by modeling test-time strategies as operators on the base policy, yielding empirical gains in test-time compute scaling over standard SFT and RL.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That test-time inference strategies can be accurately modeled as operators on the base policy such that the derived losses produce stable and generalizable improvements without introducing new optimization pathologies.","pith_extraction_headline":"Training LLMs with losses derived from test-time inference operators improves scaling over standard SFT and RL."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.24957/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_compliance","ran_at":"2026-05-19T21:35:19.056870Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"ce7eb09b67e461d6ad47a1db7ad8d61cf0464e2b14cb9a1ea6d9020b72d4067f"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}