Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods
Pith reviewed 2026-05-09 23:55 UTC · model grok-4.3
The pith
Framing subset selection for stochastic average model methods as a contextual bandit problem lets the Exp4 algorithm use side information to create better sampling distributions than uniform randomization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that generating sampling distributions for SAM methods can be posed as a contextual bandit problem, and that applying the Exp4 algorithm to this formulation allows side information to be incorporated so that the resulting distributions improve upon uniform random selection, as shown in preliminary numerical results on synthetic problems.
What carries the argument
The Exp4 algorithm (Exponential weights for Exploration and Exploitation with Experts), which maintains and updates weights over expert policies to select context-dependent sampling distributions for subset choice in each SAM iteration.
Load-bearing premise
Side information such as lower-fidelity simulations or expert knowledge can be encoded as context vectors that let the bandit algorithm learn sampling distributions meaningfully superior to uniform randomization.
What would settle it
A controlled experiment on the same synthetic problems showing no statistically significant reduction in optimization error or iterations to convergence when using Exp4-guided sampling versus uniform random sampling would falsify the practical improvement claim.
Figures
read the original abstract
In computational science workflows, it is often the case that 1) objective functions for optimization involve multiple simulation outputs, and 2) those simulations can be performed (at least partially) in parallel. In this work, we reexamine past work on a class of randomized algorithms, stochastic average model (SAM) methods. SAM methods are conceptually similar to stochastic average gradient methods, and effectively require that only randomized subsets of simulation outputs be locally modeled in each iteration of a model-based optimization method. This work focuses on the question of how best to perform this randomization of subset selection, especially in settings where there exists useful side information such as alternative lower-fidelity simulations, pre-trained emulators or domain expertise from humans or AI models. In particular, we consider the problem of generating sampling distributions for SAM methods as a contextual bandit problem and we apply the Exponential weights algorithm for Exploration and Exploitation with Experts (Exp4). We provide some preliminary numerical results on synthetic problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reexamines Stochastic Average Model (SAM) methods for expensive finite-sum optimization problems involving multiple simulation outputs that can be partially parallelized. It recasts the problem of generating sampling distributions for randomized subset selection as a contextual bandit problem and applies the Exp4 algorithm to incorporate side information such as lower-fidelity simulations, pre-trained emulators, or domain expertise. Preliminary numerical results on synthetic problems are reported.
Significance. If the central claim holds, the work could improve efficiency in simulation-based optimization by adaptively using contextual side information to reduce variance in SAM subset selection. The choice of Exp4 is a natural fit for the multi-expert bandit setting, and the emphasis on practical side information addresses a real need in computational workflows. However, the current evidence is limited to synthetic cases, so broader significance depends on further validation.
major comments (2)
- [Numerical experiments] The numerical experiments section provides only preliminary results on synthetic problems but omits all details on experimental design: no description of baselines (uniform sampling, static importance sampling, or other bandit variants), performance metrics (convergence iterations, variance of gradient estimates, or wall-clock time), number of independent runs, or statistical tests. This leaves the claimed improvement over standard randomized selection unsupported.
- [Method] The method section does not define the instantaneous reward signal (e.g., whether it is a function of local model error, gradient contribution, or simulation output) nor how context vectors are constructed from the mentioned side information. Without these definitions it is impossible to verify whether Exp4 can exploit the context to produce non-uniform distributions that meaningfully outperform uniform or static sampling.
minor comments (1)
- [Abstract] The abstract could more explicitly state the precise form of the reward and context used in the Exp4 application.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We agree that more details are required in both the method and numerical experiments sections. We address each major comment below and will make the necessary revisions to the manuscript.
read point-by-point responses
-
Referee: [Numerical experiments] The numerical experiments section provides only preliminary results on synthetic problems but omits all details on experimental design: no description of baselines (uniform sampling, static importance sampling, or other bandit variants), performance metrics (convergence iterations, variance of gradient estimates, or wall-clock time), number of independent runs, or statistical tests. This leaves the claimed improvement over standard randomized selection unsupported.
Authors: We acknowledge the validity of this observation. The presented results are indeed preliminary and lack the full experimental details. In the revised version, we will provide a complete description of the experimental design, including the baselines considered (uniform sampling, static importance sampling, and alternative bandit algorithms), the specific performance metrics used (convergence in terms of iterations, variance of the gradient estimates, and wall-clock time), the number of independent runs, and appropriate statistical tests to support the improvements over standard randomized selection. revision: yes
-
Referee: [Method] The method section does not define the instantaneous reward signal (e.g., whether it is a function of local model error, gradient contribution, or simulation output) nor how context vectors are constructed from the mentioned side information. Without these definitions it is impossible to verify whether Exp4 can exploit the context to produce non-uniform distributions that meaningfully outperform uniform or static sampling.
Authors: We agree that these definitions are essential for clarity and reproducibility. The revised manuscript will include explicit definitions of the instantaneous reward signal, detailing its computation based on local model error, gradient contributions, or simulation outputs. We will also specify the construction of context vectors from the available side information, such as lower-fidelity simulations, pre-trained emulators, or domain expertise. This will illustrate how the Exp4 algorithm can leverage contextual information to generate adaptive, non-uniform sampling distributions that improve upon uniform or static methods. revision: yes
Circularity Check
No significant circularity; direct application of existing Exp4 algorithm
full rationale
The paper recasts subset selection in SAM methods as a contextual bandit problem and directly applies the known Exp4 algorithm, providing preliminary synthetic results. No equations, definitions, or steps reduce by construction to fitted inputs or prior self-citations. The core proposal is the modeling choice and algorithm application itself, which is independent of the target outcomes and does not rely on self-referential fitting or uniqueness claims imported from the authors' prior work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The theoretical assumptions of the Exp4 algorithm hold when applied to generating sampling distributions in SAM methods
Reference graph
Works this paper leans on
-
[1]
https://github.com/POptUS/BenDFO, 2022
BenDFO: Code for benchmarking derivative-free optimization alg orithms. https://github.com/POptUS/BenDFO, 2022
work page 2022
-
[2]
https://github.com/POptUS/IBCDFO, 2024
IBCDFO: Interpolation-based composite derivative-free optim ization. https://github.com/POptUS/IBCDFO, 2024
work page 2024
-
[3]
Competin g in the dark: An efficient algorithm for bandit linear optimization
Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. Competin g in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory , volume 110, 2009
work page 2009
- [4]
-
[5]
Finite-time a nalysis of the multiarmed bandit problem
Peter Auer, Nicol` o Cesa-Bianchi, and Paul Fischer. Finite-time a nalysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002
work page 2002
-
[6]
The nonstochastic multiarmed bandit problem
Peter Auer, Nicol` o Cesa-Bianchi, Yoav Freund, and Robert E Sc hapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing , 32(1):48–77, 2002
work page 2002
-
[7]
Convergence rate analysis of a stochastic trust-region method via supermartingales
Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinb erg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization , 1(2):92–119, 2019
work page 2019
-
[8]
Raghu Bollapragada, Matt Menickelly, Witold Nazarewicz, Jared O’N eal, Paul-Gerhard Reinhard, and Stefan M. Wild. Optimization and supervised machine learning methods for fitting numerical physics models without derivatives. Journal of Physics G: Nuclear and Particle Physics , 48(2):024001, 2021
work page 2021
-
[9]
Zalan Borsos, Andreas Krause, and Kfir Y. Levy. Online variance reduction for stochastic optimization. In S´ ebastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings of the 31st Conference On Learning Theory , volume 75 of Proceedings of Machine Learning Research , pages 324–357. PMLR, 06–09 Jul 2018
work page 2018
-
[10]
Regret analysis ofstochastic and nonstochastic multi-armed bandit problems
S´ ebastien Bubeck and Nicol` o Cesa-Bianchi. Regret analysis ofstochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5(1):1–122, 2012
work page 2012
-
[11]
The best of both wo rlds: stochastic and adversarial bandits
S´ ebastien Bubeck and Aleksandrs Slivkins. The best of both wo rlds: stochastic and adversarial bandits. In Conference on Learning Theory (COLT) , 2012
work page 2012
-
[12]
Stochas tic optimization using a trust-region method and random models
Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochas tic optimization using a trust-region method and random models. Mathematical Programming, 169(2):447–487, 2018. 20
work page 2018
-
[13]
Conn, Katya Scheinberg, and Lu ´ ıs N
Andrew R. Conn, Katya Scheinberg, and Lu ´ ıs N. Vicente.Introduction to Derivative-Free Optimization. SIAM, 2009
work page 2009
-
[14]
Importance sampling for minibatches
Dominik Csiba and Peter Richt´ arik. Importance sampling for minibatches. Journal of Machine Learning Research, 19(27):1–21, 2018
work page 2018
-
[15]
Bach, and Simon Lacoste-Julien
Aaron Defazio, Francis R. Bach, and Simon Lacoste-Julien. SAG A: A fast incremental gradient method with support for non-strongly convex composite objectives. In Z oubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, editors , Advances in Neural Information Processing Systems 27 , pages 1646–1654, 2014
work page 2014
-
[16]
Adaptive importance sam pling for finite-sum optimization and sampling with decreasing step-sizes
Ayoub El Hanchi and David Stephens. Adaptive importance sam pling for finite-sum optimization and sampling with decreasing step-sizes. Advances in Neural Information Processing Systems , 33:15702– 15713, 2020
work page 2020
-
[17]
Nonconvex variance re duced optimization with arbitrary sam- pling
Samuel Horv´ ath and Peter Richtarik. Nonconvex variance re duced optimization with arbitrary sam- pling. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97, pages 2781–2789. PMLR, 2019
work page 2019
-
[18]
Adam: A Method for Stochastic Optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic op timization. arXiv preprint arXiv:1412.6980, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[19]
Asymptotically efficient ada ptive allocation rules
Tze Leung Lai and Herbert Robbins. Asymptotically efficient ada ptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985
work page 1985
-
[20]
The epoch-greedy algorithm f or contextual multi-armed bandits
John Langford and Tong Zhang. The epoch-greedy algorithm f or contextual multi-armed bandits. In Advances in Neural Information Processing Systems (NeurIP S), 2008
work page 2008
-
[21]
Structure-aware method s for expensive derivative-free nonsmooth composite optimization
Jeffrey Larson and Matt Menickelly. Structure-aware method s for expensive derivative-free nonsmooth composite optimization. Mathematical Programming Computation , 16(1):1–36, 2024
work page 2024
-
[22]
Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative- free optimization methods. Acta Numerica, 28:287–404, 2019
work page 2019
- [23]
-
[24]
Adam with bandit samplin g for deep learning
Rui Liu, Tianyi Wu, and Barzan Mozafari. Adam with bandit samplin g for deep learning. Advances in Neural Information Processing Systems , 33:5393–5404, 2020
work page 2020
-
[25]
P. Marevi´ c, N. Schunck, E. M. Ney, R. Navarro P´ erez, M. V erriere, and J. O’Neal. Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) HFBTHO (v4.0): A new version of the program. Comput. Phys. Commun. , 276:108367, 2022
work page 2022
-
[26]
Matt Menickelly and Stefan M. Wild. Stochastic average model me thods. Computational Optimization and Applications, 88(2):405–442, 2024
work page 2024
-
[27]
Jorge J. Mor´ e and Stefan M. Wild. Benchmarking derivative-fr ee optimization algorithms. SIAM Journal on Optimization , 20(1):172–191, 2009
work page 2009
-
[28]
Adaptive sampling probabili- ties for non-smooth optimization
Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C Duchi. Adaptive sampling probabili- ties for non-smooth optimization. In International Conference on Machine Learning , pages 2574–2583. PMLR, 2017
work page 2017
-
[29]
Some aspects of the sequential design of ex periments
Herbert Robbins. Some aspects of the sequential design of ex periments. Bulletin of the American Mathematical Society, 58(5):527–535, 1952
work page 1952
-
[30]
Stochastic optim ization with bandit sampling
Farnood Salehi, L Elisa Celis, and Patrick Thiran. Stochastic optim ization with bandit sampling. In Advances in Neural Information Processing Systems (NeurIP S), 2017. 21
work page 2017
-
[31]
Minimizing finite s ums with the stochastic average gradient
Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite s ums with the stochastic average gradient. Mathematical Programming, 162(1-2):83–112, 2017
work page 2017
-
[32]
Stefan M. Wild. MNH: A derivative-free optimization algorithm usin g minimal norm Hessians. In Tenth Copper Mountain Conference on Iterative Methods , 2008
work page 2008
-
[33]
Stefan M. Wild. Solving derivative-free nonlinear least squares p roblems with POUNDERS. In Tamas Terlaky, Miguel F. Anjos, and Shabbir Ahmed, editors, Advances and Trends in Optimization with Engineering Applications, pages 529–540. SIAM, 2017
work page 2017
-
[34]
Adaptive client sampling in federated learning via online learning with ban dit feedback
Boxin Zhao, Lingxiao Wang, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Ch aochao Chen, and Mladen Kolar. Adaptive client sampling in federated learning via online learning with ban dit feedback. Journal of Machine Learning Research, 26(8):1–67, 2025. A Proof of Theorem 2 Proof. For notational convenience, define qk,n := wk,n /W k and define yk,n := ek,n ⊤ ˆdk for a...
work page 2025
-
[35]
Sensitivity to Amplitude ( SA): SA(t, Pinc ) = |∂y/∂A |(t,Pinc ) = exp( − dinc t)| cos(winc t + pinc )|
-
[36]
Sensitivity to Damping Coefficient ( Sd ): Sd(t, Pinc ) = |∂y/∂d |(t,Pinc ) = Ainc t exp(− dinc t)| cos(winc t + pinc)|
-
[37]
Sensitivity to Angular Frequency ( Sw ): Sw (t, Pinc ) = |∂y/∂w |(t,Pinc ) = Ainc t exp(− dinc t)| sin(winc t + pinc)|
-
[38]
Sensitivity to Phase Offset ( Sp ): Sp (t, Pinc ) = |∂y/∂p |(t,Pinc ) = Ainc exp(− dinc t)| sin(winc t + pinc)| The score for each time point i is then calculated as: Scorei = SA(ti , Pinc )|Ai − Ainc | + Sd (ti , Pinc )|di − dinc| + Sw (ti , Pinc )|wi − winc | + Sp(ti , Pinc )|pi − pinc| Rationale for this scoring method: • Distance to Incumbent: The ter...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.