pith. sign in

Cambridge University Press

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 4 2025 2

roles

background 2

polarities

background 2

representative citing papers

Offline Local Search for Online Stochastic Bandits

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.

Delightful Exploration

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Delight-gated exploration spends actions only when expected improvement times surprisal exceeds a gate price, recovers Pandora's reservation rule, and shows weaker regret growth than Thompson sampling or epsilon-greedy across bandits and MDPs with transferable hyperparameters.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Develops COF algorithm for MAB-CS that intelligently checks cheap arm feasibility by pooling samples, with generalized instance-dependent lower bounds and matching upper bounds on cumulative cost and quality regret.

citing papers explorer

Showing 6 of 6 citing papers.

  • Offline Local Search for Online Stochastic Bandits cs.LG · 2026-04-10 · unverdicted · none · ref 37

    A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.

  • A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies cs.LG · 2025-10-17 · unverdicted · none · ref 61

    Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.

  • Delightful Exploration cs.LG · 2026-05-13 · unverdicted · none · ref 9

    Delight-gated exploration spends actions only when expected improvement times surprisal exceeds a gate price, recovers Pandora's reservation rule, and shows weaker regret growth than Thompson sampling or epsilon-greedy across bandits and MDPs with transferable hyperparameters.

  • Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy cs.LG · 2026-05-08 · unverdicted · none · ref 20

    Develops COF algorithm for MAB-CS that intelligently checks cheap arm feasibility by pooling samples, with generalized instance-dependent lower bounds and matching upper bounds on cumulative cost and quality regret.

  • An Efficient Algorithm for Minimizing Ordered Norms in Fractional Load Balancing cs.DS · 2025-11-14 · conditional · none · ref 56

    A randomized (1+ε)-approximation algorithm for ordered-norm load balancing uses O((n+d)(ε^{-2} + log log d) log(n+d)) linear-oracle calls via follow-the-regularized-leader prices and martingale progress analysis.

  • RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits stat.ML · 2026-03-11 · unverdicted · none · ref 17

    RIE-Greedy uses stochasticity from cross-validation regularization to induce Thompson Sampling-like exploration, claimed equivalent in the two-armed case and empirically competitive in large-scale settings.