Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
Advances in Neural Information Processing Systems
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (2)
polarities: background (2)
citing papers explorer
- Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
  Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
- Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning
  ORBIT learns the (β-1)-smooth oracle price map via local polynomial approximation and bandit convex optimization in a semiparametric contextual pricing model, achieving regret Õ(T^{(2β-1)/(4β-3)} + √(dT)) with a matching lower bound for fixed d.
- On Characterizing Learnability for Adversarial Noisy Bandits
  Learnability of adversarial noisy bandits is characterized by the convexified generalized maximin volume for oblivious adversaries and for adaptive adversaries when the arm space is countable.
- Interpreting Reinforcement Learning Agents with Susceptibilities
  Susceptibilities applied to regret in deep RL agents reveal stagewise internal development in parameter space of a gridworld model that policy inspection alone cannot detect, validated via activation steering.
- Optimal Online and Offline Algorithms for Contextual MNL with Applications to Assortment and Pricing
  New algorithms for joint contextual MNL assortment and pricing deliver improved online regret bounds of order W sqrt(d T log N)/L0 and local suboptimality guarantees offline.