pith:K7DK3EPK
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
A simple algorithm with tunable exploration bonuses yields distributional regret bounds for multi-armed bandits and episodic reinforcement learning.
arxiv:2605.05102 v3 · 2026-05-06 · cs.LG · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{K7DK3EPKEU7T4AWJ3RO5UZJG7D}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
As a special case, for multi-armed bandits with A arms and horizon T, we obtain a distributional regret bound of order O(√(AT log(1/δ))), confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.
The derivation assumes that the exploration bonus parameters can be chosen arbitrarily while still yielding the stated bounds, and relies on the standard stochastic assumptions for rewards and transitions without specifying potential violations or edge cases.
Presents a UCBVI-style algorithm achieving the optimal distributional regret bound O(√(AT log(1/δ))) in multi-armed bandits, confirming a 2020 conjecture.
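To get a feel for the scale of the bandit bound claimed above, the expression √(AT log(1/δ)) can be evaluated numerically. The sketch below is illustrative only: the helper name `regret_bound_scale` and the sample values are assumptions, not taken from the paper, and the constant factors hidden by the O(·) notation are omitted.

```python
import math


def regret_bound_scale(num_arms: int, horizon: int, delta: float) -> float:
    """Evaluate sqrt(A * T * log(1/delta)), ignoring constants hidden by O(.).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if num_arms < 1 or horizon < 1:
        raise ValueError("num_arms and horizon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(num_arms * horizon * math.log(1.0 / delta))


if __name__ == "__main__":
    # Sample values chosen for illustration only (A = 10 arms, delta = 0.05).
    for horizon in (1_000, 4_000, 16_000):
        print(horizon, round(regret_bound_scale(10, horizon, 0.05), 1))
```

Because the expression grows with the square root of T, quadrupling the horizon doubles its value, while a smaller δ enters only through the logarithm.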
Receipt and verification
| First computed | 2026-06-23T01:13:05.443871Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
57c6ad91ea253f3e02c9dc5dda6526f8ea6ceab9e1bb0f5bd987335be5e419f3
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/K7DK3EPKEU7T4AWJ3RO5UZJG7D \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 57c6ad91ea253f3e02c9dc5dda6526f8ea6ceab9e1bb0f5bd987335be5e419f3
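The same check can be sketched in pure Python for environments without `curl` or `jq`. This is a minimal sketch that mirrors the one-liner above under stated assumptions: the function names `canonical_sha256`, `fetch_canonical_record`, and `main` are illustrative, the response is assumed to be a JSON object with a `canonical_record` field (as the `jq` filter implies), and, unlike `curl -s`, `urllib` follows redirects by default.

```python
import hashlib
import json
import urllib.request

# URL and expected digest are copied from the shell one-liner above.
PITH_URL = "https://pith.science/pith/K7DK3EPKEU7T4AWJ3RO5UZJG7D"
EXPECTED_SHA256 = "57c6ad91ea253f3e02c9dc5dda6526f8ea6ceab9e1bb0f5bd987335be5e419f3"


def canonical_sha256(record) -> str:
    """Hash a record as the one-liner does: sorted keys, compact separators, UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def fetch_canonical_record(url: str = PITH_URL):
    """Request the JSON-LD document and return its `canonical_record` field."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        document = json.load(response)
    return document["canonical_record"]


def main() -> None:
    """Fetch, hash, and compare against the digest stated on the page."""
    digest = canonical_sha256(fetch_canonical_record())
    print(digest)
    print("match" if digest == EXPECTED_SHA256 else "MISMATCH")
```

Calling `main()` performs the network request; `canonical_sha256` can also be applied offline to a record that has already been saved and parsed.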
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "fb4e4db1b648018d451418e33b7ec4fc73a9d128389207f70fab1779c6f0e4f3",
"cross_cats_sorted": [
"stat.ML"
],
"license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-06T16:38:30Z",
"title_canon_sha256": "40eecbc0876d460c72284218363e9d6bb5ee69f1a5e5182b860ec739b153c06e"
},
"schema_version": "1.0",
"source": {
"id": "2605.05102",
"kind": "arxiv",
"version": 3
}
}
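As a small illustration of how the `source` object in the record above maps back to the paper, the sketch below joins its fields into a versioned identifier. The helper name `versioned_identifier` and the `kind:idvN` layout are assumptions made for this example; the field values are copied from the record.

```python
import json

# Fields copied from the "source" object of the canonical record above.
SOURCE_JSON = '{"id": "2605.05102", "kind": "arxiv", "version": 3}'


def versioned_identifier(source: dict) -> str:
    """Join kind, id, and version into one string (layout is an assumption)."""
    return f"{source['kind']}:{source['id']}v{source['version']}"


source = json.loads(SOURCE_JSON)
print(versioned_identifier(source))  # prints "arxiv:2605.05102v3"
```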