pith. sign in

arxiv: 2604.20657 · v1 · submitted 2026-04-22 · 🧮 math.OC

Importance Sampling in Expensive Finite-Sum Optimization via Contextual Bandit Methods

Pith reviewed 2026-05-09 23:55 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic average modelcontextual banditsExp4importance samplingfinite-sum optimizationmodel-based optimizationside information
0
0 comments X

The pith

Framing subset selection for stochastic average model methods as a contextual bandit problem lets the Exp4 algorithm use side information to create better sampling distributions than uniform randomization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reconsiders stochastic average model methods for optimization tasks where the objective depends on multiple simulation outputs that can run in parallel. It recasts the choice of which randomized subsets to model locally in each iteration as a contextual bandit problem. The authors apply the Exp4 algorithm to learn sampling distributions that draw on available side information such as lower-fidelity simulations, pre-trained emulators, or domain expertise. Preliminary numerical tests on synthetic problems indicate that this approach can outperform standard randomized selection. Readers would care because many expensive finite-sum optimization workflows stand to benefit from reduced high-fidelity evaluations when smarter, context-aware sampling is possible.

Core claim

The central claim is that generating sampling distributions for SAM methods can be posed as a contextual bandit problem, and that applying the Exp4 algorithm to this formulation allows side information to be incorporated so that the resulting distributions improve upon uniform random selection, as shown in preliminary numerical results on synthetic problems.

What carries the argument

The Exp4 algorithm (Exponential weights for Exploration and Exploitation with Experts), which maintains and updates weights over expert policies to select context-dependent sampling distributions for subset choice in each SAM iteration.

Load-bearing premise

Side information such as lower-fidelity simulations or expert knowledge can be encoded as context vectors that let the bandit algorithm learn sampling distributions meaningfully superior to uniform randomization.

What would settle it

A controlled experiment on the same synthetic problems showing no statistically significant reduction in optimization error or iterations to convergence when using Exp4-guided sampling versus uniform random sampling would falsify the practical improvement claim.

Figures

Figures reproduced from arXiv: 2604.20657 by Matt Menickelly.

Figure 1
Figure 1. Figure 1: SAM-POUNDERS [26] employing quadratic interpolation models applied to a synthetic quadratic problem described in the text. The first variant uses the Lipschitz estimation scheme described in Section 4.1 to generate π k , while the latter updates models uniformly at random. Left: A comparison of a typical realization of the two variants comparing iteration counter to true function value f(x k ) (unknown to … view at source ↗
Figure 7
Figure 7. Figure 7: Reasoning text provided by Gemini Pro 3.1 on a random itera [PITH_FULL_IMAGE:figures/full_fig_p024_7.png] view at source ↗
read the original abstract

In computational science workflows, it is often the case that 1) objective functions for optimization involve multiple simulation outputs, and 2) those simulations can be performed (at least partially) in parallel. In this work, we reexamine past work on a class of randomized algorithms, stochastic average model (SAM) methods. SAM methods are conceptually similar to stochastic average gradient methods, and effectively require that only randomized subsets of simulation outputs be locally modeled in each iteration of a model-based optimization method. This work focuses on the question of how best to perform this randomization of subset selection, especially in settings where there exists useful side information such as alternative lower-fidelity simulations, pre-trained emulators or domain expertise from humans or AI models. In particular, we consider the problem of generating sampling distributions for SAM methods as a contextual bandit problem and we apply the Exponential weights algorithm for Exploration and Exploitation with Experts (Exp4). We provide some preliminary numerical results on synthetic problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript reexamines Stochastic Average Model (SAM) methods for expensive finite-sum optimization problems involving multiple simulation outputs that can be partially parallelized. It recasts the problem of generating sampling distributions for randomized subset selection as a contextual bandit problem and applies the Exp4 algorithm to incorporate side information such as lower-fidelity simulations, pre-trained emulators, or domain expertise. Preliminary numerical results on synthetic problems are reported.

Significance. If the central claim holds, the work could improve efficiency in simulation-based optimization by adaptively using contextual side information to reduce variance in SAM subset selection. The choice of Exp4 is a natural fit for the multi-expert bandit setting, and the emphasis on practical side information addresses a real need in computational workflows. However, the current evidence is limited to synthetic cases, so broader significance depends on further validation.

major comments (2)
  1. [Numerical experiments] The numerical experiments section provides only preliminary results on synthetic problems but omits all details on experimental design: no description of baselines (uniform sampling, static importance sampling, or other bandit variants), performance metrics (convergence iterations, variance of gradient estimates, or wall-clock time), number of independent runs, or statistical tests. This leaves the claimed improvement over standard randomized selection unsupported.
  2. [Method] The method section does not define the instantaneous reward signal (e.g., whether it is a function of local model error, gradient contribution, or simulation output) nor how context vectors are constructed from the mentioned side information. Without these definitions it is impossible to verify whether Exp4 can exploit the context to produce non-uniform distributions that meaningfully outperform uniform or static sampling.
minor comments (1)
  1. [Abstract] The abstract could more explicitly state the precise form of the reward and context used in the Exp4 application.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We agree that more details are required in both the method and numerical experiments sections. We address each major comment below and will make the necessary revisions to the manuscript.

read point-by-point responses
  1. Referee: [Numerical experiments] The numerical experiments section provides only preliminary results on synthetic problems but omits all details on experimental design: no description of baselines (uniform sampling, static importance sampling, or other bandit variants), performance metrics (convergence iterations, variance of gradient estimates, or wall-clock time), number of independent runs, or statistical tests. This leaves the claimed improvement over standard randomized selection unsupported.

    Authors: We acknowledge the validity of this observation. The presented results are indeed preliminary and lack the full experimental details. In the revised version, we will provide a complete description of the experimental design, including the baselines considered (uniform sampling, static importance sampling, and alternative bandit algorithms), the specific performance metrics used (convergence in terms of iterations, variance of the gradient estimates, and wall-clock time), the number of independent runs, and appropriate statistical tests to support the improvements over standard randomized selection. revision: yes

  2. Referee: [Method] The method section does not define the instantaneous reward signal (e.g., whether it is a function of local model error, gradient contribution, or simulation output) nor how context vectors are constructed from the mentioned side information. Without these definitions it is impossible to verify whether Exp4 can exploit the context to produce non-uniform distributions that meaningfully outperform uniform or static sampling.

    Authors: We agree that these definitions are essential for clarity and reproducibility. The revised manuscript will include explicit definitions of the instantaneous reward signal, detailing its computation based on local model error, gradient contributions, or simulation outputs. We will also specify the construction of context vectors from the available side information, such as lower-fidelity simulations, pre-trained emulators, or domain expertise. This will illustrate how the Exp4 algorithm can leverage contextual information to generate adaptive, non-uniform sampling distributions that improve upon uniform or static methods. revision: yes

Circularity Check

0 steps flagged

No significant circularity; direct application of existing Exp4 algorithm

full rationale

The paper recasts subset selection in SAM methods as a contextual bandit problem and directly applies the known Exp4 algorithm, providing preliminary synthetic results. No equations, definitions, or steps reduce by construction to fitted inputs or prior self-citations. The core proposal is the modeling choice and algorithm application itself, which is independent of the target outcomes and does not rely on self-referential fitting or uniqueness claims imported from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the central claim rests on the applicability of contextual bandit methods to SAM sampling without introducing new free parameters or invented entities; it inherits standard assumptions from the Exp4 algorithm and the SAM framework.

axioms (1)
  • domain assumption The theoretical assumptions of the Exp4 algorithm hold when applied to generating sampling distributions in SAM methods
    The paper applies Exp4 without detailing whether regret bounds or exploration requirements translate directly to the optimization setting.

pith-pipeline@v0.9.0 · 5458 in / 1272 out tokens · 59757 ms · 2026-05-09T23:55:15.031167+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1]

    https://github.com/POptUS/BenDFO, 2022

    BenDFO: Code for benchmarking derivative-free optimization alg orithms. https://github.com/POptUS/BenDFO, 2022

  2. [2]

    https://github.com/POptUS/IBCDFO, 2024

    IBCDFO: Interpolation-based composite derivative-free optim ization. https://github.com/POptUS/IBCDFO, 2024

  3. [3]

    Competin g in the dark: An efficient algorithm for bandit linear optimization

    Jacob D Abernethy, Elad Hazan, and Alexander Rakhlin. Competin g in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory , volume 110, 2009

  4. [4]

    Schapire

    Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong L i, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. International Conference on Machine Learning (ICML), 2014

  5. [5]

    Finite-time a nalysis of the multiarmed bandit problem

    Peter Auer, Nicol` o Cesa-Bianchi, and Paul Fischer. Finite-time a nalysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002

  6. [6]

    The nonstochastic multiarmed bandit problem

    Peter Auer, Nicol` o Cesa-Bianchi, Yoav Freund, and Robert E Sc hapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing , 32(1):48–77, 2002

  7. [7]

    Convergence rate analysis of a stochastic trust-region method via supermartingales

    Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinb erg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization , 1(2):92–119, 2019

  8. [8]

    Raghu Bollapragada, Matt Menickelly, Witold Nazarewicz, Jared O’N eal, Paul-Gerhard Reinhard, and Stefan M. Wild. Optimization and supervised machine learning methods for fitting numerical physics models without derivatives. Journal of Physics G: Nuclear and Particle Physics , 48(2):024001, 2021

  9. [9]

    Zalan Borsos, Andreas Krause, and Kfir Y. Levy. Online variance reduction for stochastic optimization. In S´ ebastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings of the 31st Conference On Learning Theory , volume 75 of Proceedings of Machine Learning Research , pages 324–357. PMLR, 06–09 Jul 2018

  10. [10]

    Regret analysis ofstochastic and nonstochastic multi-armed bandit problems

    S´ ebastien Bubeck and Nicol` o Cesa-Bianchi. Regret analysis ofstochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning , 5(1):1–122, 2012

  11. [11]

    The best of both wo rlds: stochastic and adversarial bandits

    S´ ebastien Bubeck and Aleksandrs Slivkins. The best of both wo rlds: stochastic and adversarial bandits. In Conference on Learning Theory (COLT) , 2012

  12. [12]

    Stochas tic optimization using a trust-region method and random models

    Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochas tic optimization using a trust-region method and random models. Mathematical Programming, 169(2):447–487, 2018. 20

  13. [13]

    Conn, Katya Scheinberg, and Lu ´ ıs N

    Andrew R. Conn, Katya Scheinberg, and Lu ´ ıs N. Vicente.Introduction to Derivative-Free Optimization. SIAM, 2009

  14. [14]

    Importance sampling for minibatches

    Dominik Csiba and Peter Richt´ arik. Importance sampling for minibatches. Journal of Machine Learning Research, 19(27):1–21, 2018

  15. [15]

    Bach, and Simon Lacoste-Julien

    Aaron Defazio, Francis R. Bach, and Simon Lacoste-Julien. SAG A: A fast incremental gradient method with support for non-strongly convex composite objectives. In Z oubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, editors , Advances in Neural Information Processing Systems 27 , pages 1646–1654, 2014

  16. [16]

    Adaptive importance sam pling for finite-sum optimization and sampling with decreasing step-sizes

    Ayoub El Hanchi and David Stephens. Adaptive importance sam pling for finite-sum optimization and sampling with decreasing step-sizes. Advances in Neural Information Processing Systems , 33:15702– 15713, 2020

  17. [17]

    Nonconvex variance re duced optimization with arbitrary sam- pling

    Samuel Horv´ ath and Peter Richtarik. Nonconvex variance re duced optimization with arbitrary sam- pling. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97, pages 2781–2789. PMLR, 2019

  18. [18]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic op timization. arXiv preprint arXiv:1412.6980, 2014

  19. [19]

    Asymptotically efficient ada ptive allocation rules

    Tze Leung Lai and Herbert Robbins. Asymptotically efficient ada ptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985

  20. [20]

    The epoch-greedy algorithm f or contextual multi-armed bandits

    John Langford and Tong Zhang. The epoch-greedy algorithm f or contextual multi-armed bandits. In Advances in Neural Information Processing Systems (NeurIP S), 2008

  21. [21]

    Structure-aware method s for expensive derivative-free nonsmooth composite optimization

    Jeffrey Larson and Matt Menickelly. Structure-aware method s for expensive derivative-free nonsmooth composite optimization. Mathematical Programming Computation , 16(1):1–36, 2024

  22. [22]

    Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative- free optimization methods. Acta Numerica, 28:287–404, 2019

  23. [23]

    Schapire

    Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A c ontextual-bandit approach to person- alized news article recommendation. In International World Wide Web Conference (WWW) , 2010

  24. [24]

    Adam with bandit samplin g for deep learning

    Rui Liu, Tianyi Wu, and Barzan Mozafari. Adam with bandit samplin g for deep learning. Advances in Neural Information Processing Systems , 33:5393–5404, 2020

  25. [25]

    Marevi´ c, N

    P. Marevi´ c, N. Schunck, E. M. Ney, R. Navarro P´ erez, M. V erriere, and J. O’Neal. Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) HFBTHO (v4.0): A new version of the program. Comput. Phys. Commun. , 276:108367, 2022

  26. [26]

    Matt Menickelly and Stefan M. Wild. Stochastic average model me thods. Computational Optimization and Applications, 88(2):405–442, 2024

  27. [27]

    Mor´ e and Stefan M

    Jorge J. Mor´ e and Stefan M. Wild. Benchmarking derivative-fr ee optimization algorithms. SIAM Journal on Optimization , 20(1):172–191, 2009

  28. [28]

    Adaptive sampling probabili- ties for non-smooth optimization

    Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C Duchi. Adaptive sampling probabili- ties for non-smooth optimization. In International Conference on Machine Learning , pages 2574–2583. PMLR, 2017

  29. [29]

    Some aspects of the sequential design of ex periments

    Herbert Robbins. Some aspects of the sequential design of ex periments. Bulletin of the American Mathematical Society, 58(5):527–535, 1952

  30. [30]

    Stochastic optim ization with bandit sampling

    Farnood Salehi, L Elisa Celis, and Patrick Thiran. Stochastic optim ization with bandit sampling. In Advances in Neural Information Processing Systems (NeurIP S), 2017. 21

  31. [31]

    Minimizing finite s ums with the stochastic average gradient

    Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite s ums with the stochastic average gradient. Mathematical Programming, 162(1-2):83–112, 2017

  32. [32]

    Stefan M. Wild. MNH: A derivative-free optimization algorithm usin g minimal norm Hessians. In Tenth Copper Mountain Conference on Iterative Methods , 2008

  33. [33]

    Stefan M. Wild. Solving derivative-free nonlinear least squares p roblems with POUNDERS. In Tamas Terlaky, Miguel F. Anjos, and Shabbir Ahmed, editors, Advances and Trends in Optimization with Engineering Applications, pages 529–540. SIAM, 2017

  34. [34]

    Adaptive client sampling in federated learning via online learning with ban dit feedback

    Boxin Zhao, Lingxiao Wang, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Ch aochao Chen, and Mladen Kolar. Adaptive client sampling in federated learning via online learning with ban dit feedback. Journal of Machine Learning Research, 26(8):1–67, 2025. A Proof of Theorem 2 Proof. For notational convenience, define qk,n := wk,n /W k and define yk,n := ek,n ⊤ ˆdk for a...

  35. [35]

    Sensitivity to Amplitude ( SA): SA(t, Pinc ) = |∂y/∂A |(t,Pinc ) = exp( − dinc t)| cos(winc t + pinc )|

  36. [36]

    Sensitivity to Damping Coefficient ( Sd ): Sd(t, Pinc ) = |∂y/∂d |(t,Pinc ) = Ainc t exp(− dinc t)| cos(winc t + pinc)|

  37. [37]

    Sensitivity to Angular Frequency ( Sw ): Sw (t, Pinc ) = |∂y/∂w |(t,Pinc ) = Ainc t exp(− dinc t)| sin(winc t + pinc)|

  38. [38]

    A larger distance implies the current quadrati c approximation is less likely to be accurate at the incumben t point

    Sensitivity to Phase Offset ( Sp ): Sp (t, Pinc ) = |∂y/∂p |(t,Pinc ) = Ainc exp(− dinc t)| sin(winc t + pinc)| The score for each time point i is then calculated as: Scorei = SA(ti , Pinc )|Ai − Ainc | + Sd (ti , Pinc )|di − dinc| + Sw (ti , Pinc )|wi − winc | + Sp(ti , Pinc )|pi − pinc| Rationale for this scoring method: • Distance to Incumbent: The ter...