{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:K7DK3EPKEU7T4AWJ3RO5UZJG7D","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"fb4e4db1b648018d451418e33b7ec4fc73a9d128389207f70fab1779c6f0e4f3","cross_cats_sorted":["stat.ML"],"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-06T16:38:30Z","title_canon_sha256":"40eecbc0876d460c72284218363e9d6bb5ee69f1a5e5182b860ec739b153c06e"},"schema_version":"1.0","source":{"id":"2605.05102","kind":"arxiv","version":3}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2605.05102","created_at":"2026-06-23T01:13:05Z"},{"alias_kind":"arxiv_version","alias_value":"2605.05102v3","created_at":"2026-06-23T01:13:05Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.05102","created_at":"2026-06-23T01:13:05Z"},{"alias_kind":"pith_short_12","alias_value":"K7DK3EPKEU7T","created_at":"2026-06-23T01:13:05Z"},{"alias_kind":"pith_short_16","alias_value":"K7DK3EPKEU7T4AWJ","created_at":"2026-06-23T01:13:05Z"},{"alias_kind":"pith_short_8","alias_value":"K7DK3EPK","created_at":"2026-06-23T01:13:05Z"}],"graph_snapshots":[{"event_id":"sha256:e766ecf723dc11b0eb8c79b989415b6e5a0fa4b2ca943d613b0c691ea9a9dc84","target":"graph","created_at":"2026-06-23T01:13:05Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"As a special case, for multi-armed bandits with A arms and horizon T, we obtain a distributional regret bound of order O(√(AT log(1/δ))), confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"The derivation assumes that the exploration bonus parameters can be chosen arbitrarily while still yielding the stated bounds, and relies on the standard stochastic assumptions for rewards and transitions without specifying potential violations or edge cases."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"Presents a UCBVI-style algorithm achieving optimal distributional regret bounds O(sqrt(AT log(1/δ))) in multi-armed bandits, confirming a 2020 conjecture."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"A simple algorithm with tunable exploration bonuses yields distributional regret bounds for multi-armed bandits and episodic reinforcement learning."}],"snapshot_sha256":"dd9fb2d175986d6243337c5589e8921005c51f323746732c197c5785c7541e5c"},"formal_canon":{"evidence_count":2,"snapshot_sha256":"8364fc6e2186767889db7ef018028881f1dee5aa76110f6132b0b2cd0aba0765"},"integrity":{"available":true,"clean":true,"detectors_run":[{"findings_count":0,"name":"ai_meta_artifact","ran_at":"2026-05-20T10:37:39.798376Z","status":"completed","version":"1.0.0"},{"findings_count":0,"name":"doi_title_agreement","ran_at":"2026-05-19T21:31:19.591896Z","status":"completed","version":"1.0.0"},{"findings_count":0,"name":"doi_compliance","ran_at":"2026-05-19T13:49:46.026025Z","status":"completed","version":"1.0.0"}],"endpoint":"/pith/2605.05102/integrity.json","findings":[],"snapshot_sha256":"09f4102a090b8e933776367b57459590c28bbc8df646e030ef7d6ec3b53e6231","summary":{"advisory":0,"by_detector":{},"critical":0,"informational":0}},"paper":{"abstract_excerpt":"We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $\\delta \\in (0,1]$, thereby characterizing the regret distribution across the full range of $\\delta$. We present a simple UCBVI-style algorithm with exploration bonus $\\min\\{c_{1,k}/N, c_{2,k}/\\sqrt{N}\\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive gener","authors_text":"Harin Lee, Min-hwan Oh","cross_cats":["stat.ML"],"headline":"A simple algorithm with tunable exploration bonuses yields distributional regret bounds for multi-armed bandits and episodic reinforcement learning.","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-06T16:38:30Z","title":"Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning"},"references":{"count":0,"internal_anchors":0,"resolved_work":0,"sample":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2605.05102","kind":"arxiv","version":3},"verdict":{"created_at":"2026-05-08T17:59:44.111780Z","id":"a4c6f088-d70c-46eb-a615-3ea7323a39b7","model_set":{"reader":"grok-4.3"},"one_line_summary":"Presents a UCBVI-style algorithm achieving optimal distributional regret bounds O(sqrt(AT log(1/δ))) in multi-armed bandits, confirming a 2020 conjecture.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"A simple algorithm with tunable exploration bonuses yields distributional regret bounds for multi-armed bandits and episodic reinforcement learning.","strongest_claim":"As a special case, for multi-armed bandits with A arms and horizon T, we obtain a distributional regret bound of order O(√(AT log(1/δ))), confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.","weakest_assumption":"The derivation assumes that the exploration bonus parameters can be chosen arbitrarily while still yielding the stated bounds, and relies on the standard stochastic assumptions for rewards and transitions without specifying potential violations or edge cases."}},"verdict_id":"a4c6f088-d70c-46eb-a615-3ea7323a39b7"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:3c3812b53fa84a312f1b56e68b471b79682c7c37132ddb08d2e85de6eabbf0f5","target":"record","created_at":"2026-06-23T01:13:05Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"fb4e4db1b648018d451418e33b7ec4fc73a9d128389207f70fab1779c6f0e4f3","cross_cats_sorted":["stat.ML"],"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-06T16:38:30Z","title_canon_sha256":"40eecbc0876d460c72284218363e9d6bb5ee69f1a5e5182b860ec739b153c06e"},"schema_version":"1.0","source":{"id":"2605.05102","kind":"arxiv","version":3}},"canonical_sha256":"57c6ad91ea253f3e02c9dc5dda6526f8ea6ceab9e1bb0f5bd987335be5e419f3","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"57c6ad91ea253f3e02c9dc5dda6526f8ea6ceab9e1bb0f5bd987335be5e419f3","first_computed_at":"2026-06-23T01:13:05.443871Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-06-23T01:13:05.443871Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"bZyeuBzuwV+jJMCyri3LHGuJZuBEGGZzM+FGzoUNLO7bV7IVOhR5LfcC8WJB0aLFZ8W5lEYU4Rl3vocTnu9PAw==","signature_status":"signed_v1","signed_at":"2026-06-23T01:13:05.444570Z","signed_message":"canonical_sha256_bytes"},"source_id":"2605.05102","source_kind":"arxiv","source_version":3}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:3c3812b53fa84a312f1b56e68b471b79682c7c37132ddb08d2e85de6eabbf0f5","sha256:e766ecf723dc11b0eb8c79b989415b6e5a0fa4b2ca943d613b0c691ea9a9dc84"],"state_sha256":"eeb69f8be4bd7e3b9ac541bb1232ac5e0be0b10eb0680d60fc01cd6553444c3a"}