Recognition: unknown
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
read the original abstract
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
This paper has not been read by Pith yet.
Forward citations
Cited by 11 Pith papers
-
Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees
RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.
-
Active Learning for Gaussian Process Regression Under Self-Induced Boltzmann Weights
AB-SID-iVAR enables Gaussian process active learning for self-induced Boltzmann distributions by closed-form approximation of the target, with high-probability error vanishing guarantees and empirical gains on PES and...
-
FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.
-
Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments
Probability-of-Hit acquisition function ranks perturbation candidates by posterior probability of threshold exceedance, with asymptotic optimality proof and up to 6.4% gains on real immunology data.
-
Learning myopic mixed-integer nonlinear model predictive control from expert demonstrations
A myopic MINMPC framework learns a value function offline via inverse optimization from expert data, allowing short horizons with near-optimal performance and strict integer feasibility online for hybrid systems.
-
Spectral bandits
Spectral bandits achieve scalable regret in graph-structured recommendation by using an effective dimension to learn good policies from few node evaluations.
-
ADKO: Agentic Decentralized Knowledge Optimization
ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
-
Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
-
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits
Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.
-
Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization
MAPFZ extends classical MAPF to non-unit integer costs on graphs with finite states, solved efficiently by CBS-NIC and Bayesian-optimized discretization, outperforming prior methods on benchmarks.
-
Bayesian Optimization of Crossbar-Based Compute-In-Memory System Design for Efficient DNN Inference
A multi-objective Bayesian optimization framework co-optimizes CIM crossbar hardware and DNN parameters for VGG8/CIFAR-10 and VGG16/Tiny-ImageNet, achieving comparable accuracy with up to 65% smaller area and 52% lowe...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.