Unbiased Gradients for a Class of Conditional Stochastic Optimization Problems
Pith reviewed 2026-05-20 23:37 UTC · model grok-4.3
The pith
A method combining Markovian stochastic approximation with unbiased estimators finds optimizers for conditional stochastic problems when joint distributions cannot be sampled directly.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By combining Markovian stochastic approximation, which handles the sampling of the random variables, with unbiased gradient estimation, the optimizer of F(ξ) can be found even when the joint distribution of X and Z cannot be sampled directly, as illustrated by examples from parameter estimation and portfolio selection.
What carries the argument
The integration of Markovian stochastic approximation with unbiased approximation methods to produce gradients for the conditional stochastic objective F(ξ).
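For orientation, below is the generic form of a Markovian stochastic-approximation recursion, in the sense of, e.g., Andrieu, Moulines & Priouret (2005). The notation is placeholder notation assumed for illustration and is not the paper's definitions:

- $W_n$ is the state of the chain.
- $K_\xi$ is a kernel with invariant law $\pi_\xi$.
- $\gamma_n$ are the step sizes.
- $\widehat{H}$ is a randomized estimator.

In this template, the unbiased estimator makes $\widehat{H}$ conditionally unbiased for an integrand $H$ whose $\pi_\xi$-average is $\nabla F(\xi)$. The step-size conditions shown are the typical ones.

```latex
\begin{aligned}
W_{n+1} &\sim K_{\xi_n}(W_n,\cdot), \qquad \pi_{\xi} K_{\xi} = \pi_{\xi},\\
\xi_{n+1} &= \xi_n - \gamma_{n+1}\,\widehat{H}(\xi_n, W_{n+1}),\\
\mathbb{E}\big[\widehat{H}(\xi, W)\,\big|\,W\big] &= H(\xi, W), \qquad \int H(\xi, w)\,\pi_{\xi}(dw) = \nabla F(\xi),\\
\textstyle\sum_{n} \gamma_n &= \infty, \qquad \textstyle\sum_{n} \gamma_n^2 < \infty .
\end{aligned}
```

The update direction is unbiased for $\nabla F(\xi_n)$ only if $W_{n+1}$ is distributed according to $\pi_{\xi_n}$. That is exactly the stationarity question raised in the referee's major comment.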
Load-bearing premise
Suitable conditional or marginal samplers exist, and the Markovian approximation converges to the correct stationary distribution in such a way that any remaining bias can be removed by the unbiased estimator.
What would settle it
Apply the method to a simulated instance of F(ξ) with a known closed-form optimizer and check whether the iterates converge to that known value at the expected rate.
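A minimal version of this check can be run on a toy instance where the optimizer is known in closed form. The sketch below is hypothetical throughout: the problem, the kernel and the tuning constants are illustrative assumptions, not the paper's.

The toy problem takes Z ~ N(μ, σ²), X | Z ~ N(Z, τ²), g(Z, X, ξ) = ξX − 1 and f(Z, u) = u². Its optimizer is ξ* = μ/(μ² + σ²). The sketch works in three parts:

- Z is drawn by random-walk Metropolis-Hastings, to mimic the Markovian setting.
- The inner conditional expectation is debiased by a single-term randomized multilevel estimator.
- ξ is updated by Robbins-Monro steps.

Because the law of Z does not depend on ξ here, the drifting-target issue is deliberately absent. A faithful test of the method would need a ξ-dependent kernel.

```python
import numpy as np

# Hypothetical toy problem: Z ~ N(MU, SIGMA^2), X | Z ~ N(Z, TAU^2),
# g(Z, X, xi) = xi * X - 1, f(Z, u) = u**2, so xi* = MU / (MU**2 + SIGMA**2).
MU, SIGMA, TAU = 1.0, 1.0, 1.0
Q = 2.0 ** -1.5  # level probabilities p_l = (1 - Q) * Q**l; decay rate 1.5 lies in (1, 2)


def phi(m, xi):
    # 2 * E[g | Z] * E[dg/dxi | Z], written as a function of m = E[X | Z]
    return 2.0 * (xi * m - 1.0) * m


def unbiased_grad_given_z(z, xi, rng):
    """Single-term randomized-MLMC estimate of phi(E[X | Z = z], xi)."""
    lev = rng.geometric(1.0 - Q) - 1           # P(lev = l) = (1 - Q) * Q**l
    x = rng.normal(z, TAU, size=2 ** lev)      # 2**lev inner draws of X | Z = z
    delta = phi(x.mean(), xi)
    if lev > 0:                                # antithetic level difference
        h = x.size // 2
        delta -= 0.5 * (phi(x[:h].mean(), xi) + phi(x[h:].mean(), xi))
    return delta / ((1.0 - Q) * Q ** lev)


def log_target(z):
    # unnormalised log-density of Z, only evaluated inside Metropolis-Hastings
    return -0.5 * ((z - MU) / SIGMA) ** 2


def markovian_sa(n_iter, rng, xi0=2.0, z0=0.0, step=0.25, rw_scale=2.4):
    xi, z = xi0, z0
    for n in range(1, n_iter + 1):
        # one random-walk Metropolis-Hastings move leaving N(MU, SIGMA^2) invariant
        prop = z + rw_scale * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(z):
            z = prop
        # Robbins-Monro step with decreasing gain step / (n + 10)
        xi -= step / (n + 10) * unbiased_grad_given_z(z, xi, rng)
    return xi


xi_hat = markovian_sa(100_000, np.random.default_rng(1))
xi_star = MU / (MU ** 2 + SIGMA ** 2)
print(xi_hat, xi_star)
```

The gain constant 0.25 is chosen so that it times the slope of the mean field (2E[Z²] = 4) equals 1. This keeps the usual 1/√n error rate for this toy problem. Comparing `xi_hat` with `xi_star` over several seeds and iteration counts gives the convergence check described above.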
Abstract
In this paper we consider the conditional stochastic optimization (CSO) problem. This consists of optimizing a function that can be written as the expectation of a function which is itself a function of a conditional expectation, i.e., of the type $F(\xi) := \mathbb{E}\left[f\left(Z,\mathbb{E}[g(Z,X,\xi)|Z]\right)\right]$, where precise definitions are given in the main text. We address a particular class of CSO problems where the joint law of the random variables $X,Z$ cannot be sampled exactly; this case has been addressed in Goda & Kitade (2023). We introduce a method that combines Markovian stochastic approximation with unbiased approximation methods, which allows one to find the optimizer of $F(\xi)$ in the context of interest. We illustrate our methodology on two examples, associated with parameter estimation with model averaging and with portfolio selection for high-dimensional full factor multivariate stochastic volatility models.
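The block below spells out, as an illustration, why unbiased gradients are nontrivial here. It assumes regularity that allows differentiation under the integral sign; this is an assumption, not a statement of the paper's conditions.

Write $D_\xi g$ for the Jacobian of $g$ in $\xi$. The chain rule then gives the gradient in the first two lines. Because $\nabla_u f$ is in general nonlinear in the inner conditional expectation, replacing that expectation by a finite inner average $P_\ell$ is biased.

A single-term randomized multilevel estimator with antithetic differences removes this bias. It is in the spirit of Goda & Kitade (2023), and the paper's own construction may differ in detail. In it, $P_\ell^{(a)}$ and $P_\ell^{(b)}$ are the plug-in values computed from the first and second halves of the $2^\ell$ draws.

```latex
\begin{aligned}
G(Z,\xi) &:= \mathbb{E}[g(Z,X,\xi)\mid Z], \qquad F(\xi) = \mathbb{E}\big[f\big(Z, G(Z,\xi)\big)\big],\\
\psi(Z,\xi) &:= \big(\mathbb{E}[D_\xi g(Z,X,\xi)\mid Z]\big)^{\top}\,\nabla_u f\big(Z, G(Z,\xi)\big), \qquad \nabla F(\xi) = \mathbb{E}[\psi(Z,\xi)],\\
P_\ell &:= \psi \text{ with both conditional expectations replaced by averages over } 2^{\ell} \text{ i.i.d. draws of } X \mid Z,\\
\Delta_0 &:= P_0, \qquad \Delta_\ell := P_\ell - \tfrac{1}{2}\big(P_\ell^{(a)} + P_\ell^{(b)}\big), \quad \ell \ge 1,\\
\widehat{\psi} &:= \Delta_L / p_L, \qquad \mathbb{P}(L = \ell) = p_\ell > 0,\\
\mathbb{E}\big[\widehat{\psi}\mid Z\big] &= \sum_{\ell \ge 0} \mathbb{E}[\Delta_\ell \mid Z] = \lim_{\ell \to \infty} \mathbb{E}[P_\ell \mid Z] = \psi(Z,\xi).
\end{aligned}
```

Each half contains $2^{\ell-1}$ draws, so $\mathbb{E}[\Delta_\ell \mid Z] = \mathbb{E}[P_\ell \mid Z] - \mathbb{E}[P_{\ell-1} \mid Z]$ and the sum telescopes. The final equality additionally needs summability of the level differences and convergence of $\mathbb{E}[P_\ell \mid Z]$ to $\psi$. Choosing $p_\ell$ to balance variance against expected cost is what makes the estimator practical.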
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses a class of conditional stochastic optimization (CSO) problems of the form F(ξ) := E[f(Z, E[g(Z,X,ξ)|Z])], where the joint law of (X,Z) cannot be sampled exactly. It proposes a method that combines Markovian stochastic approximation (to sample from the relevant conditional or marginal distributions) with unbiased gradient estimators, and illustrates the approach on two examples: parameter estimation with model averaging and portfolio selection for high-dimensional full-factor multivariate stochastic volatility models.
Significance. If the central construction yields unbiased gradients for the composite estimator despite the drifting target distribution induced by simultaneous ξ updates, the result would extend the scope of unbiased stochastic optimization methods to a practically relevant subclass of CSO problems previously treated only under stronger sampling assumptions (e.g., Goda & Kitade 2023). The two concrete applications demonstrate potential utility in statistics and quantitative finance.
major comments (1)
- [Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”
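The lag mechanism in this comment can be seen in a deliberately simple, hypothetical setting that is not taken from the paper. Take an AR(1) kernel whose invariant law for fixed ξ is N(ξ, 1). Its mean then obeys m_{n+1} = ρ m_n + (1 − ρ) ξ_n exactly, where a smaller ρ means faster mixing.

The sketch tracks the lag E[W_n] − ξ_n while ξ moves in two regimes:

- At a constant speed v, the lag settles at −v/(1 − ρ).
- With Robbins-Monro-style increments c/n, the lag shrinks roughly like c/(n(1 − ρ)).

This illustrates the time-scale dependence the referee describes. It is not a bound for the paper's algorithm.

```python
import numpy as np


def chain_mean_lag(rho, targets):
    """Lag E[W_n] - xi_n for the hypothetical AR(1) kernel
    W_{n+1} = rho * W_n + (1 - rho) * xi_n + sqrt(1 - rho**2) * eps_n,
    whose invariant law for fixed xi is N(xi, 1). Only the mean is tracked,
    because it evolves deterministically."""
    m = targets[0]  # chain started at stationarity for the initial parameter
    lags = np.empty(len(targets) - 1)
    for n in range(1, len(targets)):
        m = rho * m + (1.0 - rho) * targets[n - 1]  # one kernel step under the old parameter
        lags[n - 1] = m - targets[n]                # compare with the updated target
    return lags


# Constant drift v = 0.01: the lag settles at -v / (1 - rho), larger for slower mixing.
const = 0.01 * np.arange(201)
print(chain_mean_lag(0.9, const)[-1])  # close to -0.1
print(chain_mean_lag(0.5, const)[-1])  # close to -0.02

# Increments gamma_n = 1/n: the lag decays roughly like 1 / (n * (1 - rho)).
decaying = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, 10_001))])
print(chain_mean_lag(0.9, decaying)[-1])  # about -1e-3
```

The two regimes mirror the rebuttal's proposed remedy. The lag does not vanish if ξ keeps moving at a fixed speed. It does vanish when the outer steps shrink relative to the chain's mixing rate.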
minor comments (2)
- [Introduction / Problem statement] The notation for the conditional expectation inside f is introduced in the abstract but would benefit from an explicit display equation with all random variables and conditioning clearly labeled in the main text.
- [Numerical experiments] The two numerical examples would be strengthened by reporting both the achieved objective value and a diagnostic for the empirical bias of the gradient estimator (e.g., a Monte Carlo estimate of E[estimator] compared with a high-accuracy reference).
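One way to produce this diagnostic is to compare the Monte Carlo mean of a gradient estimator with a closed-form reference on a toy problem. The toy CSO instance below is hypothetical and is not one of the paper's examples. It takes Z ~ N(μ, σ²), X | Z ~ N(Z, τ²), g(Z, X, ξ) = ξX − 1 and f(Z, u) = u², so that F(ξ) = E[(ξZ − 1)²] and ∇F(ξ) = 2(ξ(μ² + σ²) − μ).

The sketch contrasts two estimators:

- A plug-in nested estimator, whose bias here is exactly 2ξτ²/n for n inner samples.
- A single-term randomized multilevel estimator with antithetic level differences. This is a standard debiasing device; that it matches the paper's estimator is an assumption.

Z is sampled exactly, so only the inner-expectation bias is being tested.

```python
import numpy as np

MU, SIGMA, TAU = 1.0, 1.0, 1.0  # hypothetical toy parameters


def phi(m, xi):
    # gradient integrand 2 * (xi * m - 1) * m as a function of m = E[X | Z]
    return 2.0 * (xi * m - 1.0) * m


def exact_grad(xi):
    # closed form: d/dxi E[(xi * Z - 1)**2] = 2 * (xi * (MU**2 + SIGMA**2) - MU)
    return 2.0 * (xi * (MU ** 2 + SIGMA ** 2) - MU)


def naive_estimates(xi, M, n, rng):
    """Plug-in nested estimator with n inner draws per Z; bias 2 * xi * TAU**2 / n."""
    z = rng.normal(MU, SIGMA, size=M)
    x = rng.normal(z[:, None], TAU, size=(M, n))
    return phi(x.mean(axis=1), xi)


def single_term_estimates(xi, M, rng, r=1.5):
    """Single-term randomized MLMC with antithetic level differences (unbiased)."""
    q = 2.0 ** (-r)
    levels = rng.geometric(1.0 - q, size=M) - 1  # P(level = l) = (1 - q) * q**l
    out = np.empty(M)
    for lev in np.unique(levels):  # vectorise over all draws sharing a level
        idx = np.flatnonzero(levels == lev)
        z = rng.normal(MU, SIGMA, size=idx.size)
        x = rng.normal(z[:, None], TAU, size=(idx.size, 2 ** lev))
        delta = phi(x.mean(axis=1), xi)
        if lev > 0:
            h = 2 ** (lev - 1)
            delta = delta - 0.5 * (phi(x[:, :h].mean(axis=1), xi)
                                   + phi(x[:, h:].mean(axis=1), xi))
        out[idx] = delta / ((1.0 - q) * q ** lev)
    return out


rng = np.random.default_rng(0)
xi, M = 1.0, 200_000
ref = exact_grad(xi)  # 2.0
naive = naive_estimates(xi, M, 1, rng).mean()  # biased upward by 2 * xi * TAU**2
unbiased = single_term_estimates(xi, M, rng).mean()
print(f"reference {ref:.3f}  naive {naive:.3f}  single-term {unbiased:.3f}")
```

The rate r = 1.5 sits between 1 and 2. In this toy problem, that keeps both the expected number of inner draws and the estimator's variance finite. Reporting the estimator's standard error alongside the difference from the reference makes the diagnostic interpretable.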
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. The major comment concerns the lack of analysis for transient bias in the inner Markov chain due to drifting targets as ξ is updated. We address this point directly below.
Referee: [Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”
Authors: We thank the referee for highlighting this important subtlety. The unbiasedness of the gradient estimator is formally established under the assumption that the inner Markov chain has reached stationarity for the current fixed ξ. As the outer stochastic approximation updates ξ at each iteration, the target conditional distribution changes, so standard geometric ergodicity results for time-homogeneous chains do not immediately guarantee that the chain is close to stationarity at the new target. The manuscript does not supply explicit bounds on the resulting transient (lag) bias or a quantitative analysis of how the debiasing/coupling step interacts with the relative time scales of the inner and outer processes. We agree that this constitutes a gap in the current theoretical development. In the revised version we will add a dedicated paragraph in the convergence analysis section that introduces a mixing-time assumption on the inner chain relative to the outer step-size schedule and shows that, under this condition, the residual bias vanishes asymptotically, thereby supporting convergence to the optimizer of F(ξ).
Revision: yes
Circularity Check
No circularity: method combines external unbiased estimators with standard Markovian SA without self-referential reduction
The paper defines the CSO objective F(ξ) explicitly as an outer expectation of a function of a conditional inner expectation, then proposes a composite estimator that applies Markovian stochastic approximation to sample from the conditional law and layers an unbiased gradient estimator on top. This construction cites Goda & Kitade (2023) for the base CSO setting and relies on standard ergodicity results for the inner chain; neither the optimizer nor the gradient estimator is obtained by fitting parameters to the target result itself or by renaming a known quantity. The derivation therefore remains self-contained against external benchmarks and does not reduce any claimed prediction to its own inputs by construction.
Reference graph
Works this paper leans on
[1] Andrieu, C., Jasra, A., Doucet, A., & Del Moral, P. (2011). On non-linear Markov chain Monte Carlo. Bernoulli, 17, 987-1014.
[2] Andrieu, C., Moulines, E., & Priouret, P. (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44, 283-312.
[3] Andrieu, C., & Vihola, M. (2014). Markovian stochastic approximation with expanding projections. Bernoulli, 20, 545-585.
[4] Awadelkarim, E., Jasra, A., & Ruzayqat, H. (2024). Unbiased parameters for diffusions. SIAM J. Control Optim., 62, 2664-2694.
[5] Cappé, O., Rydén, T., & Moulines, É. (2005). Inference in Hidden Markov Models. Springer: New York.
[6] Dellaportas, P., Titsias, M., Petrova, K., & Plataniotis, A. (2023). Scalable inference for a full multivariate stochastic volatility model. J. Econ., 232, 501-520.
[7] Goda, T., & Kitade, W. (2023). Constructing unbiased gradient estimators with finite variance for conditional stochastic optimization. Math. Comp. Sim., 204, 743-763.
[8] Goda, T., Hironaka, T., Kitade, W., & Foster, A. (2022). Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs. SIAM J. Sci. Comput., 44, A286-A311.
[9] Jasra, A., Law, K. J. H., & Yu, F. (2022). Unbiased filtering of diffusions. Adv. Appl. Probab., 54, 661-687.
[10] Heng, J., Houssineau, J., & Jasra, A. (2024). On unbiased score estimation for partially observed diffusions. J. Mach. Learn. Res., 25, 1-66.
[11] Heng, J., Jasra, A., Law, K. J. H., & Tarakanov, A. (2023). On unbiased estimation of discretized models. SIAM/ASA JUQ, 11, 616-645.
[12] Hu, Y., Chen, X., & He, N. (2020). Sample complexity of sample average approximation for conditional stochastic optimization. SIAM J. Optim., 30, 2103-2133.
[13] Hu, Y., Chen, X., & He, N. (2020). Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning. NeurIPS.
[14] Lindsten, F., Jordan, M., & Schön, T. (2014). Particle Gibbs with ancestor sampling. J. Mach. Learn. Res., 15, 2145-2184.
[15] Lindsten, F., Douc, R., & Moulines, E. (2015). Uniform ergodicity of the particle Gibbs sampler. Scand. J. Stat., 42, 775-797.
[16] Singh, R., Sahani, M., & Gretton, A. (2019). Kernel instrumental variable regression. NeurIPS.
[17] Tsagaris, T., Jasra, A., & Adams, N. (2012). Robust and adaptive algorithms for online portfolio selection. Quant. Finan., 12, 1651-1662.