Proceedings of the 19th international conference on World wide web

A contextual-bandit approach to personalized news article recommendation

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 2

citation-polarity summary

unclear 2

representative citing papers

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.

Active Context Selection Improves Simple Regret in Contextual Bandits

cs.LG · 2026-05-19 · accept · novelty 7.0

Active sampling with allocation q_j proportional to p_j to the 2/3 achieves tight regret sqrt(n/T) times norm of p to the 2/3 for known context distribution p, with improvement up to Theta(k to the 1/4) over passive sampling.

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Anchor-TS defines arm indices as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean to correct distribution-shift bias and safely accelerate online learning with offline data.

Optimality of Sub-network Laplace Approximations: New Results and Methods

stat.ML · 2026-05-09 · conditional · novelty 7.0

Sub-network Laplace approximations always underestimate full-model predictive variance, and two new gradient-based and greedy selection rules provide theoretically grounded improvements.

Constrained Contextual Bandits with Adversarial Contexts

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

The (Marginal) Value of a Search Ad: An Online Causal Framework for Repeated Second-price Auctions

cs.GT · 2026-05-03 · unverdicted · novelty 7.0

Online learning algorithms for bidding in repeated second-price auctions achieve rate-optimal regret by modeling ad value as a causal treatment effect and exploiting second-price payment information.

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.

Adaptive Instruction Composition for Automated LLM Red-Teaming

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

Adaptive Instruction Composition uses a neural contextual bandit with RL to adaptively combine crowdsourced texts, generating more effective and diverse LLM jailbreaks than random or prior adaptive methods on HarmBench.

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.

When Determinants Are Not Enough: Private Rare Switching

cs.LG · 2026-05-22 · unverdicted · novelty 5.0

Replaces determinant growth with generalized Rayleigh quotient for rare switching in private linear bandits to control worst-direction volume despite non-monotonic design matrices from noise.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

cs.LG · 2020-05-04 · unverdicted · novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · 2 refs

citing papers explorer

Showing 12 of 12 citing papers.

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time
cs.LG · 2026-05-08 · unverdicted · none · ref 70

Active Context Selection Improves Simple Regret in Contextual Bandits
cs.LG · 2026-05-19 · accept · none · ref 10

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
cs.LG · 2026-05-11 · unverdicted · none · ref 11 · 2 links

Optimality of Sub-network Laplace Approximations: New Results and Methods
stat.ML · 2026-05-09 · conditional · none · ref 19

Constrained Contextual Bandits with Adversarial Contexts
cs.LG · 2026-05-07 · unverdicted · none · ref 286

The (Marginal) Value of a Search Ad: An Online Causal Framework for Repeated Second-price Auctions
cs.GT · 2026-05-03 · unverdicted · none · ref 23

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
cs.LG · 2026-05-01 · unverdicted · none · ref 45

Adaptive Instruction Composition for Automated LLM Red-Teaming
cs.CR · 2026-04-22 · unverdicted · none · ref 65

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
cs.LG · 2026-05-20 · unverdicted · none · ref 296

When Determinants Are Not Enough: Private Rare Switching
cs.LG · 2026-05-22 · unverdicted · none · ref 86

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG · 2020-05-04 · unverdicted · none · ref 185

Online Learning-to-Defer with Varying Experts
stat.ML · 2026-05-12 · unreviewed · ref 146 · 2 links
