{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:7ZUQ5KB5VEW526FSQILW53V57T","short_pith_number":"pith:7ZUQ5KB5","schema_version":"1.0","canonical_sha256":"fe690ea83da92ddd78b282176eeebdfceeb2934820d5c20d88f090c454aebc12","source":{"kind":"arxiv","id":"2605.13414","version":1},"attestation_state":"computed","paper":{"title":"TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Language models lack the ability to prospectively plan task selection and compute allocation under fixed token budgets.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Shubhashis Roy Dipta, Zabir Al Nazi","submitted_at":"2026-05-13T12:10:05Z","abstract_excerpt":"Deploying language models as autonomous agents requires more than per-task accuracy: when an agent faces a queue of problems under a finite token budget, it must decide which to attempt, in what order, and how much compute to commit to each, all before any execution feedback is available. This is the prospective form of metacognitive control studied for decades in human cognition, yet whether language models possess it remains untested. We introduce TRIAGE, an evaluation framework in which a model receives a task pool and a token budget calibrated to its own baseline cost, and commits to a sin"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.13414","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.AI","submitted_at":"2026-05-13T12:10:05Z","cross_cats_sorted":[],"title_canon_sha256":"a375e050a58cd644c5cef07bb34107fe6d03402acdea9d48dec3a0033e003c02","abstract_canon_sha256":"44d0f49b9c04793a66614f6aea207565aa160c2cc22fddedee1562109460ff92"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:44:47.395401Z","signature_b64":"iryIO2xr7Ka+4mIxxsqmUfkbe2cgssX99DckRjiOHzaHUZ8EuqmAv9kZQTKChkeItAlx7bNGFMiW4PaPWs6XAg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"fe690ea83da92ddd78b282176eeebdfceeb2934820d5c20d88f090c454aebc12","last_reissued_at":"2026-05-18T02:44:47.394932Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:44:47.394932Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Language models lack the ability to prospectively plan task selection and compute allocation under fixed token budgets.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Shubhashis Roy Dipta, Zabir Al Nazi","submitted_at":"2026-05-13T12:10:05Z","abstract_excerpt":"Deploying language models as autonomous agents requires more than per-task accuracy: when an agent faces a queue of problems under a finite token budget, it must decide which to attempt, in what order, and how much compute to commit to each, all before any execution feedback is available. This is the prospective form of metacognitive control studied for decades in human cognition, yet whether language models possess it remains untested. We introduce TRIAGE, an evaluation framework in which a model receives a task pool and a token budget calibrated to its own baseline cost, and commits to a sin"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"current language models exhibit substantial gaps in prospective metacognitive control, revealing a previously unmeasured capability dimension with direct implications for resource-efficient agent deployment.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That an oracle with full knowledge of each problem's solvability and cost for the model provides a valid and unbiased benchmark for measuring prospective control, and that calibrating the token budget to the model's baseline cost does not introduce hindsight or selection effects.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TRIAGE evaluates LLMs on prospective metacognitive control by requiring a single plan for task selection, sequencing, and token allocation under a calibrated budget, revealing substantial gaps in current models across math, science, code, and knowledge tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Language models lack the ability to prospectively plan task selection and compute allocation under fixed token budgets.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"170b6ac8554c06c09e981990a89ba24b1da55d4aa8f808b43e8c8f2f7999cfc5"},"source":{"id":"2605.13414","kind":"arxiv","version":1},"verdict":{"id":"0223dde0-07bd-48ce-a43f-067655055605","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:24:34.695836Z","strongest_claim":"current language models exhibit substantial gaps in prospective metacognitive control, revealing a previously unmeasured capability dimension with direct implications for resource-efficient agent deployment.","one_line_summary":"TRIAGE evaluates LLMs on prospective metacognitive control by requiring a single plan for task selection, sequencing, and token allocation under a calibrated budget, revealing substantial gaps in current models across math, science, code, and knowledge tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That an oracle with full knowledge of each problem's solvability and cost for the model provides a valid and unbiased benchmark for measuring prospective control, and that calibrating the token budget to the model's baseline cost does not introduce hindsight or selection effects.","pith_extraction_headline":"Language models lack the ability to prospectively plan task selection and compute allocation under fixed token budgets."},"references":{"count":64,"sample":[{"doi":"","year":2025,"title":"Snell, Charlie and Lee, Jaehoon and Xu, Kelvin and Kumar, Aviral , booktitle =. Scaling. 2025 , note =","work_id":"0cb91b1e-bec9-4122-84b8-e9426161b424","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in","work_id":"60a87b92-c39d-4881-9add-d41add70e6ea","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Chen, Xingyu and Xu, Jiahao and Liang, Tian and He, Zhiwei and Pang, Jianhui and Yu, Dian and Song, Linfeng and Liu, Qiuzhi and Zhou, Mengfei and Zhang, Zhuosheng and Wang, Rui and Tu, Zhaopeng and Mi","work_id":"718dbaa9-b70e-463a-85cd-fb0ae1991365","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Thoughts Are All Over the Place: On the Underthinking of o1-Like","work_id":"596b64fb-852a-4893-8378-6eb8a3e1312d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks","work_id":"4600dc6b-7c39-4ddb-9872-e03cb8a7efae","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":64,"snapshot_sha256":"1e1ae142fa53662ef43c6e5b467dba593b934066fed9d3acf642760da632ed9c","internal_anchors":4},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.13414","created_at":"2026-05-18T02:44:47.395002+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.13414v1","created_at":"2026-05-18T02:44:47.395002+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.13414","created_at":"2026-05-18T02:44:47.395002+00:00"},{"alias_kind":"pith_short_12","alias_value":"7ZUQ5KB5VEW5","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"7ZUQ5KB5VEW526FS","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"7ZUQ5KB5","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T","json":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T.json","graph_json":"https://pith.science/api/pith-number/7ZUQ5KB5VEW526FSQILW53V57T/graph.json","events_json":"https://pith.science/api/pith-number/7ZUQ5KB5VEW526FSQILW53V57T/events.json","paper":"https://pith.science/paper/7ZUQ5KB5"},"agent_actions":{"view_html":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T","download_json":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T.json","view_paper":"https://pith.science/paper/7ZUQ5KB5","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.13414&json=true","fetch_graph":"https://pith.science/api/pith-number/7ZUQ5KB5VEW526FSQILW53V57T/graph.json","fetch_events":"https://pith.science/api/pith-number/7ZUQ5KB5VEW526FSQILW53V57T/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T/action/timestamp_anchor","attest_storage":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T/action/storage_attestation","attest_author":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T/action/author_attestation","sign_citation":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T/action/citation_signature","submit_replication":"https://pith.science/pith/7ZUQ5KB5VEW526FSQILW53V57T/action/replication_record"}},"created_at":"2026-05-18T02:44:47.395002+00:00","updated_at":"2026-05-18T02:44:47.395002+00:00"}