Foundations and Trends

Regret analysis of stochastic and nonstochastic multi-armed bandit problems · 2012

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

math.PR · 2026-05-20 · unverdicted · novelty 7.0

Establishes maximal concentration bounds for stochastic approximation under heavy-tailed Markovian noise, with tails ranging from sub-Gaussian to heavier than Weibull depending on step sizes and contractivity properties, plus a truncation argument for unbounded noise.

Active Context Selection Improves Simple Regret in Contextual Bandits

cs.LG · 2026-05-19 · accept · novelty 7.0

Active sampling with allocation q_j proportional to p_j to the 2/3 achieves tight regret sqrt(n/T) times norm of p to the 2/3 for known context distribution p, with improvement up to Theta(k to the 1/4) over passive sampling.

Boundedly Rational Meta-Learning in Sequential Consumer Choice

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Consumers transfer brand-level regularities across contexts using low-D boundedly rational meta-learning approximations that fit choice data better than no-transfer or fully integrated Bayesian benchmarks.

Constrained Contextual Bandits with Adversarial Contexts

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.

Common-agency Games for Multi-Objective Test-Time Alignment

cs.GT · 2026-05-08 · unverdicted · novelty 6.0

CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

citing papers explorer

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise · math.PR · 2026-05-20 · unverdicted · none · ref 93
Active Context Selection Improves Simple Regret in Contextual Bandits · cs.LG · 2026-05-19 · accept · none · ref 2
Boundedly Rational Meta-Learning in Sequential Consumer Choice · cs.LG · 2026-05-15 · unverdicted · none · ref 129
Constrained Contextual Bandits with Adversarial Contexts · cs.LG · 2026-05-07 · unverdicted · none · ref 85
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction · cs.LG · 2026-05-20 · unverdicted · none · ref 96
Common-agency Games for Multi-Objective Test-Time Alignment · cs.GT · 2026-05-08 · unverdicted · none · ref 31
