{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2018:7WXAL45SI437SKSLUAJFVQBE3E","short_pith_number":"pith:7WXAL45S","schema_version":"1.0","canonical_sha256":"fdae05f3b24737f92a4ba0125ac024d922c879d667f47b8837bdc7c5263bce67","source":{"kind":"arxiv","id":"1803.06971","version":1},"attestation_state":"computed","paper":{"title":"What Doubling Tricks Can and Can't Do for Multi-Armed Bandits","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG","math.ST","stat.TH"],"primary_cat":"stat.ML","authors_text":"CNRS), Emilie Kaufmann (SEQUEL, Lilian Besson (IETR)","submitted_at":"2018-03-19T15:02:15Z","abstract_excerpt":"An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the \"Doubling Trick\". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a ge"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"1803.06971","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"stat.ML","submitted_at":"2018-03-19T15:02:15Z","cross_cats_sorted":["cs.LG","math.ST","stat.TH"],"title_canon_sha256":"190644d1069c14da178d69355529b36cb56e104800d43032c1ec2cfa4408c36d","abstract_canon_sha256":"82bb2d2cb294f390c8c9ab5cf38bdd9a1654d80c9a6ce112b4d1ecfc4ca4b7c9"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T00:20:40.709562Z","signature_b64":"8THXyDJR7o1xEPhHHejuTC21FcTzClZDKa0ac9xL3js9SmDM6ITwZomQNYikPgjm+/mk/96iNCilyRKkTGXWCQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"fdae05f3b24737f92a4ba0125ac024d922c879d667f47b8837bdc7c5263bce67","last_reissued_at":"2026-05-18T00:20:40.708875Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T00:20:40.708875Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"What Doubling Tricks Can and Can't Do for Multi-Armed Bandits","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG","math.ST","stat.TH"],"primary_cat":"stat.ML","authors_text":"CNRS), Emilie Kaufmann (SEQUEL, Lilian Besson (IETR)","submitted_at":"2018-03-19T15:02:15Z","abstract_excerpt":"An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the \"Doubling Trick\". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a ge"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1803.06971","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1803.06971","created_at":"2026-05-18T00:20:40.708984+00:00"},{"alias_kind":"arxiv_version","alias_value":"1803.06971v1","created_at":"2026-05-18T00:20:40.708984+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1803.06971","created_at":"2026-05-18T00:20:40.708984+00:00"},{"alias_kind":"pith_short_12","alias_value":"7WXAL45SI437","created_at":"2026-05-18T12:32:11.075285+00:00"},{"alias_kind":"pith_short_16","alias_value":"7WXAL45SI437SKSL","created_at":"2026-05-18T12:32:11.075285+00:00"},{"alias_kind":"pith_short_8","alias_value":"7WXAL45S","created_at":"2026-05-18T12:32:11.075285+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":8,"internal_anchor_count":5,"sample":[{"citing_arxiv_id":"2603.22348","citing_title":"Learning Safely Without Knowing the World:COMPASS-Hedge","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23351","citing_title":"Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21107","citing_title":"Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction","ref_index":288,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19584","citing_title":"Online Market Making and the Value of Observing the Order Book","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2602.00417","citing_title":"Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06190","citing_title":"Constrained Contextual Bandits with Adversarial Contexts","ref_index":278,"is_internal_anchor":false},{"citing_arxiv_id":"2605.04207","citing_title":"Optimal Semiparametric Dynamic Pricing with Feature Diversity","ref_index":76,"is_internal_anchor":false},{"citing_arxiv_id":"2604.21432","citing_title":"A single algorithm for both restless and rested rotting bandits","ref_index":8,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E","json":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E.json","graph_json":"https://pith.science/api/pith-number/7WXAL45SI437SKSLUAJFVQBE3E/graph.json","events_json":"https://pith.science/api/pith-number/7WXAL45SI437SKSLUAJFVQBE3E/events.json","paper":"https://pith.science/paper/7WXAL45S"},"agent_actions":{"view_html":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E","download_json":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E.json","view_paper":"https://pith.science/paper/7WXAL45S","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1803.06971&json=true","fetch_graph":"https://pith.science/api/pith-number/7WXAL45SI437SKSLUAJFVQBE3E/graph.json","fetch_events":"https://pith.science/api/pith-number/7WXAL45SI437SKSLUAJFVQBE3E/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E/action/timestamp_anchor","attest_storage":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E/action/storage_attestation","attest_author":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E/action/author_attestation","sign_citation":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E/action/citation_signature","submit_replication":"https://pith.science/pith/7WXAL45SI437SKSLUAJFVQBE3E/action/replication_record"}},"created_at":"2026-05-18T00:20:40.708984+00:00","updated_at":"2026-05-18T00:20:40.708984+00:00"}