{"paper":{"title":"Reasoning-Intensive Regression","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MENTAT improves reasoning-intensive regression by up to 65 percent over frozen LLM prompting and encoder fine-tuning.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Diane Tchuindjo, Omar Khattab","submitted_at":"2025-08-29T16:37:42Z","abstract_excerpt":"AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlike standard language regression tasks such as sentiment or similarity analysis, RiR often appears instead in ad-hoc applications such as rubric-based scoring, modeling dense rewards in complex environments, or domain-specific retrieval, where much deeper analysis of context is required while only limited task-specific training data and computation are available. We cast four realistic problems as RiR tasks t"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"MENTAT achieves up to 65% improvement over both prompting frozen LLMs and fine-tuning Transformer encoders on four realistic reasoning-intensive regression tasks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The four problems cast as RiR tasks are representative of the broader class of reasoning-intensive regression problems and the reported gains are not specific to the chosen benchmarks or evaluation protocol.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MENTAT improves performance on reasoning-intensive regression tasks by up to 65% over standard LLM prompting and encoder fine-tuning by combining batch prompt optimization with neural ensembles.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MENTAT improves reasoning-intensive regression by up to 65 percent over frozen LLM prompting and encoder fine-tuning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"2aca125d6226a21a82968dec25e5fe8378af88398fb81ec8472001fe54827726"},"source":{"id":"2508.21762","kind":"arxiv","version":4},"verdict":{"id":"3b2342f4-6234-4e50-9595-0dd74919429f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-18T20:12:44.900780Z","strongest_claim":"MENTAT achieves up to 65% improvement over both prompting frozen LLMs and fine-tuning Transformer encoders on four realistic reasoning-intensive regression tasks.","one_line_summary":"MENTAT improves performance on reasoning-intensive regression tasks by up to 65% over standard LLM prompting and encoder fine-tuning by combining batch prompt optimization with neural ensembles.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The four problems cast as RiR tasks are representative of the broader class of reasoning-intensive regression problems and the reported gains are not specific to the chosen benchmarks or evaluation protocol.","pith_extraction_headline":"MENTAT improves reasoning-intensive regression by up to 65 percent over frozen LLM prompting and encoder fine-tuning."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2508.21762/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}