{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:WM7MTQMPHRMRSQMXUN5KL77VAK","short_pith_number":"pith:WM7MTQMP","schema_version":"1.0","canonical_sha256":"b33ec9c18f3c59194197a37aa5fff502b2dbe2336ead81683e2a36d1d3f7f89e","source":{"kind":"arxiv","id":"2605.14261","version":1},"attestation_state":"computed","paper":{"title":"Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Fix the heuristic value function before seeing evaluation data to avoid setting AIVAT sample variance pathologically low or enabling p-hacking via gradient descent on the test statistic.","cross_cats":["cs.GT"],"primary_cat":"cs.AI","authors_text":"Juho Kim, Tuomas Sandholm","submitted_at":"2026-05-14T02:04:26Z","abstract_excerpt":"How should an agent's performance in a multiagent environment be evaluated when there is a limited sample size or a high cost of running a trial? The AIVAT family of variance reduction techniques was proposed to address this challenge by introducing unbiased low-variance estimators of agents' expected payoffs. An important component of AIVAT is a heuristic value function that discriminates between potentially low- and high-value counterfactual histories. A notable gap in the literature is that there is little to no constraint or guideline on how the heuristic value function should be chosen or"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.14261","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2026-05-14T02:04:26Z","cross_cats_sorted":["cs.GT"],"title_canon_sha256":"ab6dfbb22e7ab7e2bdfe20d0f4886f2f5c33c566f624112e167eecd5912d6471","abstract_canon_sha256":"c43d33b7d07dc3ab06000cedcc56c348c5204a29ce2b93b11efa12dc815c454f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:10.481647Z","signature_b64":"OWHCKeVghhryvu2P7FYN3dcu6rAro7eTpzl5a8OYvVe28rQr3z7vRY/bcWO4OJPY6YhFN8eOxaitOUKGg+QzAA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"b33ec9c18f3c59194197a37aa5fff502b2dbe2336ead81683e2a36d1d3f7f89e","last_reissued_at":"2026-05-17T23:39:10.481164Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:10.481164Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Fix the heuristic value function before seeing evaluation data to avoid setting AIVAT sample variance pathologically low or enabling p-hacking via gradient descent on the test statistic.","cross_cats":["cs.GT"],"primary_cat":"cs.AI","authors_text":"Juho Kim, Tuomas Sandholm","submitted_at":"2026-05-14T02:04:26Z","abstract_excerpt":"How should an agent's performance in a multiagent environment be evaluated when there is a limited sample size or a high cost of running a trial? The AIVAT family of variance reduction techniques was proposed to address this challenge by introducing unbiased low-variance estimators of agents' expected payoffs. An important component of AIVAT is a heuristic value function that discriminates between potentially low- and high-value counterfactual histories. A notable gap in the literature is that there is little to no constraint or guideline on how the heuristic value function should be chosen or"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The heuristic value function should be fixed prior to observing the evaluation data to prevent setting sample variance pathologically low or p-hacking via gradient descent; uncertainty propagation then enables further variance reduction via inverse-variance weighted averaging, yielding a 43.0% reduction in samples needed on 10,000 poker hands.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the heuristic uncertainty can be quantified and propagated in a way that produces meaningful further variance reduction without introducing biases or errors that invalidate the overall estimator, and that the poker dataset and parameterization choices generalize beyond the specific experiments.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"AIVAT heuristics can be gamed for pathological low variance or p-hacking unless fixed before data observation, and uncertainty propagation yields additional variance reduction at possible cost to unbiasedness.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Fix the heuristic value function before seeing evaluation data to avoid setting AIVAT sample variance pathologically low or enabling p-hacking via gradient descent on the test statistic.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"0eb2c595a219774f41dbaf587879ac75238a7c8c4b24e2cd014d281c676440aa"},"source":{"id":"2605.14261","kind":"arxiv","version":1},"verdict":{"id":"31915378-a19d-4330-bcb5-42870937a410","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:25:12.937142Z","strongest_claim":"The heuristic value function should be fixed prior to observing the evaluation data to prevent setting sample variance pathologically low or p-hacking via gradient descent; uncertainty propagation then enables further variance reduction via inverse-variance weighted averaging, yielding a 43.0% reduction in samples needed on 10,000 poker hands.","one_line_summary":"AIVAT heuristics can be gamed for pathological low variance or p-hacking unless fixed before data observation, and uncertainty propagation yields additional variance reduction at possible cost to unbiasedness.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the heuristic uncertainty can be quantified and propagated in a way that produces meaningful further variance reduction without introducing biases or errors that invalidate the overall estimator, and that the poker dataset and parameterization choices generalize beyond the specific experiments.","pith_extraction_headline":"Fix the heuristic value function before seeing evaluation data to avoid setting AIVAT sample variance pathologically low or enabling p-hacking via gradient descent on the test statistic."},"references":{"count":15,"sample":[{"doi":"","year":2013,"title":"N. Bard, J. Hawkin, J. Rubin, and M. Zinkevich. The annual computer poker competition.AI Magazine, 34(2):112–114, 2013","work_id":"5aaa6f67-ee08-45eb-b004-b264fe5e3c40","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2006,"title":"D. Billings and M. Kan. A tool for the direct assessment of poker decisions.ICGA Journal, 29 (3):119–142, 2006","work_id":"a228e639-4ad0-487f-be26-1b5fa900237b","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2008,"title":"M. Bowling, M. Johanson, N. Burch, and D. Szafron. Strategy evaluation in extensive games with importance sampling. InProceedings of the International Conference on Machine Learning (ICML), 2008","work_id":"47b9e524-79ae-4bdf-8263-e69204421d12","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.Science, 359(6374):418–424, 2018","work_id":"ef7b56a3-56e6-44b7-91f5-2599930721da","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"N. Brown and T. Sandholm. Superhuman AI for multiplayer poker.Science, 365(6456):885–890, 2019","work_id":"deee810e-0e04-44e2-a5ee-100c7117dc2d","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":15,"snapshot_sha256":"0983a5c845ad995a549d0df9964cc8955e22861b0059fc30ff04cfb60e23a397","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"061b76ccae3d6150967142fc882c726f96a2c584a9135cfb9ff84827a2f5100b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14261","created_at":"2026-05-17T23:39:10.481239+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14261v1","created_at":"2026-05-17T23:39:10.481239+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14261","created_at":"2026-05-17T23:39:10.481239+00:00"},{"alias_kind":"pith_short_12","alias_value":"WM7MTQMPHRMR","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"WM7MTQMPHRMRSQMX","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"WM7MTQMP","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK","json":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK.json","graph_json":"https://pith.science/api/pith-number/WM7MTQMPHRMRSQMXUN5KL77VAK/graph.json","events_json":"https://pith.science/api/pith-number/WM7MTQMPHRMRSQMXUN5KL77VAK/events.json","paper":"https://pith.science/paper/WM7MTQMP"},"agent_actions":{"view_html":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK","download_json":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK.json","view_paper":"https://pith.science/paper/WM7MTQMP","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14261&json=true","fetch_graph":"https://pith.science/api/pith-number/WM7MTQMPHRMRSQMXUN5KL77VAK/graph.json","fetch_events":"https://pith.science/api/pith-number/WM7MTQMPHRMRSQMXUN5KL77VAK/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK/action/timestamp_anchor","attest_storage":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK/action/storage_attestation","attest_author":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK/action/author_attestation","sign_citation":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK/action/citation_signature","submit_replication":"https://pith.science/pith/WM7MTQMPHRMRSQMXUN5KL77VAK/action/replication_record"}},"created_at":"2026-05-17T23:39:10.481239+00:00","updated_at":"2026-05-17T23:39:10.481239+00:00"}