arXiv preprint arXiv:2003.10113
4 Pith papers cite this work.
Citing papers by year: 2026 (4).
- Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent
  DOMD-GLB is the first algorithm for nonstationary generalized linear bandits (GLBs) with O(1) per-round cost, achieving dynamic regret bounds of order Õ(c_μ^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}) in drifting environments and Õ(c_μ^{-1/3} d^{2/3} Γ_T^{1/3} T^{2/3}) in piecewise-stationary environments.
- Equilibrium and Pricing in Consumer Networks with Nonlinear Utilities: An Online Shape-Constrained Learning Approach
  The paper establishes existence and uniqueness of equilibrium in consumer networks with nonlinear utilities under contraction conditions, and proposes a shape-constrained isotonic regression approach with strict no-regret convergence for learning utilities in targeted monopoly pricing.
- DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees
  DARLING augments RL with change detection to match minimax lower bounds on dynamic regret for piecewise-stationary tabular and linear MDPs under separability and reachability conditions.
- Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts
  Dri-MED achieves Õ(κ d² log T / Δ̃) regret and Õ(d) constraint violations for drifting contextual bandits with personalized preferences and baseline constraints under practitioner-friendly assumptions.
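Dividing the DOMD-GLB dynamic regret bounds by T makes their scaling explicit. The short derivation below writes R_T for the cumulative dynamic regret (our notation, not taken from the summary), treats c_μ and d as fixed, and ignores the logarithmic factors hidden in Õ. It also assumes the usual reading of the symbols, which the summary does not define: P_T as the total drift (path length) of the parameter and Γ_T as the number of change points.

$$
\frac{R_T}{T} = \tilde{O}\!\left(c_\mu^{-1/2} d^{3/4} \left(\frac{P_T}{T}\right)^{1/4}\right)
\quad\text{(drifting)},
\qquad
\frac{R_T}{T} = \tilde{O}\!\left(c_\mu^{-1/3} d^{2/3} \left(\frac{\Gamma_T}{T}\right)^{1/3}\right)
\quad\text{(piecewise-stationary)}.
$$

For example, if P_T = O(T^a) or Γ_T = O(T^b) with a, b < 1, the cumulative bounds become Õ(T^{(3+a)/4}) and Õ(T^{(2+b)/3}), both sublinear in T; a constant amount of drift or a constant number of changes gives Õ(T^{3/4}) and Õ(T^{2/3}) respectively.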