{"paper":{"title":"TMAS: Scaling Test-Time Compute via Multi-Agent Synergy","license":"http://creativecommons.org/licenses/by/4.0/","headline":"TMAS scales test-time compute for LLMs by organizing multi-agent collaboration with hierarchical memories and hybrid rewards.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Bryan Dai, Chuan Hao, Feng Chang, George Wu, Jian Yang, Ming Yang, Nan Jing, Qing Yi, Ran Tao, Yuan Wei","submitted_at":"2026-05-11T10:44:10Z","abstract_excerpt":"Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rounds, and verification-based feedback. However, existing structured test-time scaling methods either weakly coordinate parallel reasoning trajectories or rely on noisy historical information without explicitly deciding what should be retained and reused, limiting their ability to balance exploration "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"TMAS achieves stronger iterative scaling than existing test-time scaling baselines, while hybrid reward training further improves scaling effectiveness and stability across iterations.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the proposed hierarchical memories and hybrid reward reinforcement learning scheme will reliably balance exploration and exploitation in practice and that the experimental benchmarks and baselines used are representative of broader reasoning performance.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TMAS scales test-time compute in LLMs via multi-agent collaboration with experience banks, guideline banks, and hybrid reward training to achieve stronger iterative scaling on reasoning benchmarks than prior methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"TMAS scales test-time compute for LLMs by organizing multi-agent collaboration with hierarchical memories and hybrid rewards.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"5d5533c002704b7ffc342599473be73b1d5eeae1a45eac7f4279c3fe0da55ea5"},"source":{"id":"2605.10344","kind":"arxiv","version":2},"verdict":{"id":"eca4e013-d63d-4c21-aa5d-b0d00036bd99","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T04:30:54.484353Z","strongest_claim":"TMAS achieves stronger iterative scaling than existing test-time scaling baselines, while hybrid reward training further improves scaling effectiveness and stability across iterations.","one_line_summary":"TMAS scales test-time compute in LLMs via multi-agent collaboration with experience banks, guideline banks, and hybrid reward training to achieve stronger iterative scaling on reasoning benchmarks than prior methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the proposed hierarchical memories and hybrid reward reinforcement learning scheme will reliably balance exploration and exploitation in practice and that the experimental benchmarks and baselines used are representative of broader reasoning performance.","pith_extraction_headline":"TMAS scales test-time compute for LLMs by organizing multi-agent collaboration with hierarchical memories and hybrid rewards."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.10344/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-19T15:35:41.178796Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T11:31:18.719914Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T09:26:50.234064Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"308b12da4ae65695a90fbeb3b76c5bd78a73a9406c4c3909e372ac2be9a54859"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"6d8382de12a9f657cd4fc19a1dcc9cc5e15bc8b2aba945be13d46b7b9172cd27"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}