Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256

Peter Auer, Nicolo Cesa-Bianchi, Paul Fischer · 2002

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Not all uncertainty is alike: volatility, stochasticity, and exploration

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

cs.AI · 2026-05-07 · accept · novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that improves multi-hop reasoning by up to 11.4% and an exhaustion gate that cuts token use by up to 39% with no performance loss.

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conformal survival methods.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Develops the COF algorithm for MAB-CS, which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.

Learning Safely Without Knowing the World: COMPASS-Hedge

cs.LG · 2026-03-22

citing papers explorer

Showing 2 of 2 citing papers after filters.

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

cs.LG · 2026-05-07 · unverdicted · none · ref 61

DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conformal survival methods.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · none · ref 4

Develops the COF algorithm for MAB-CS, which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.
