Unbiased Gradients for a Class of Conditional Stochastic Optimization Problems

Ajay Jasra; Miguel Alvarez

arXiv: 2605.18786 · v1 · pith:KEUF73BE · submitted 2026-05-07 · 🧮 math.OC · cs.NA · math.NA · stat.ME

Pith reviewed 2026-05-20 23:37 UTC · model grok-4.3

classification: 🧮 math.OC · cs.NA · math.NA · stat.ME

keywords: conditional stochastic optimization · unbiased gradients · Markovian stochastic approximation · stochastic volatility models · portfolio selection · parameter estimation

The pith

A method combining Markovian stochastic approximation with unbiased estimators finds optimizers for conditional stochastic problems when joint distributions cannot be sampled directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses conditional stochastic optimization (CSO) problems of the form F(ξ) := E[f(Z, E[g(Z,X,ξ)|Z])], where the joint law of X and Z cannot be sampled exactly. It proposes combining Markovian stochastic approximation with unbiased approximation methods so that F(ξ) can be optimized without introducing bias that cannot be corrected. The method is illustrated on parameter estimation with model averaging and on portfolio selection in high-dimensional full factor multivariate stochastic volatility models. A sympathetic reader cares because the approach targets a class of optimization problems that arise in statistical modeling but are constrained by what can be sampled.
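A small worked example (a hypothetical toy, not taken from the paper) shows how replacing the inner conditional expectation by a finite-sample average biases both F and its optimizer. Take X | Z ~ N(Z, s²), g(z, x, ξ) = ξx − 1 and f(z, y) = y², and let X̄_N be the mean of N conditionally i.i.d. draws of X given Z:

```latex
\begin{aligned}
F(\xi) &= \mathbb{E}\big[(\xi Z - 1)^2\big] = \xi^2\,\mathbb{E}[Z^2] - 2\xi\,\mathbb{E}[Z] + 1,
\qquad \xi^\star = \frac{\mathbb{E}[Z]}{\mathbb{E}[Z^2]},\\
\mathbb{E}\big[(\xi \bar{X}_N - 1)^2 \,\big|\, Z\big] &= (\xi Z - 1)^2 + \frac{\xi^2 s^2}{N},\\
F_N(\xi) &:= \mathbb{E}\big[(\xi \bar{X}_N - 1)^2\big] = F(\xi) + \frac{\xi^2 s^2}{N},
\qquad \xi^\star_N = \frac{\mathbb{E}[Z]}{\mathbb{E}[Z^2] + s^2/N}.
\end{aligned}
```

The plug-in optimizer ξ*_N differs from ξ* by an amount that shrinks only as N → ∞, at a cost of N inner draws per outer sample; unbiased constructions such as those cited in the paper (e.g., Goda & Kitade, 2023) aim to remove this dependence at finite expected cost.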

Core claim

By combining Markovian stochastic approximation, which handles the sampling of the random variables, with unbiased gradient estimation, the optimizer of F(ξ) can be found even when the joint distribution of X and Z cannot be sampled directly, as illustrated by the parameter-estimation and portfolio-selection examples.

What carries the argument

The integration of Markovian stochastic approximation with unbiased approximation methods to produce gradients for the conditional stochastic objective F(ξ).
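The summary does not spell out which unbiased approximation method is used. One standard construction in this literature is the single-term randomized multilevel Monte Carlo (MLMC) estimator with antithetic differences, in the spirit of the unbiased MLMC gradient estimators of Goda et al. (2022) and Goda & Kitade (2023); whether the paper uses exactly this variant is an assumption here. A minimal sketch for a fixed outer draw Z = z, with the hypothetical toy choices f(y) = y² and X | Z = z ~ N(z, 1):

```python
import math
import random


def single_term_rmlmc(f, sample_inner, rng, r=2.0 ** -1.5):
    """Unbiased single-term randomized MLMC estimate of f(E[Y]).

    sample_inner(rng) returns one draw of Y (here: X given Z = z).
    The level L has P(L = l) = (1 - r) * r**l for l = 0, 1, 2, ...
    Level 0 uses f(Y_1); level l >= 1 uses the antithetic difference
    f(mean of 2**l draws) - [f(mean of 1st half) + f(mean of 2nd half)] / 2.
    Dividing by P(L = l) turns the telescoping sum into an unbiased estimator,
    provided the differences decay fast enough (e.g. smooth f, finite moments).
    """
    # Inverse-CDF sampling of the geometric level; 1 - u lies in (0, 1].
    u = rng.random()
    level = int(math.floor(math.log(1.0 - u) / math.log(r)))
    prob = (1.0 - r) * r ** level
    if level == 0:
        return f(sample_inner(rng)) / prob
    n = 2 ** level
    ys = [sample_inner(rng) for _ in range(n)]
    half = n // 2
    mean_all = sum(ys) / n
    mean_first = sum(ys[:half]) / half
    mean_second = sum(ys[half:]) / half
    delta = f(mean_all) - 0.5 * (f(mean_first) + f(mean_second))
    return delta / prob


if __name__ == "__main__":
    rng = random.Random(0)
    z = 1.0  # a fixed outer draw (hypothetical)
    draws = [single_term_rmlmc(lambda y: y * y, lambda g: g.gauss(z, 1.0), rng)
             for _ in range(40000)]
    # Should be close to f(E[X | Z = z]) = z**2 = 1, whereas plugging in a single
    # draw would target E[X**2 | Z = z] = z**2 + 1 = 2.
    print(sum(draws) / len(draws))
```

The geometric rate r = 2^(-1.5) is a common choice because it keeps both the expected cost and the variance finite when the antithetic differences decay like 2^(-l) in mean square; other rates are possible.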

Load-bearing premise

Suitable conditional or marginal samplers exist, and the Markovian approximation converges to the correct stationary distribution in such a way that any remaining bias can be corrected by the unbiased estimator.
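The premise concerns samplers that can only be run as Markov chains. A minimal sketch, under assumptions of our own (a standard normal target known only up to a constant, random-walk Metropolis with unit step size), shows the two facts the premise relies on: averages along the chain approach the stationary value, but states near the start are biased by the initialization.

```python
import math
import random


def rwm_chain(log_target, x0, n_steps, step, rng):
    """Random-walk Metropolis chain; returns every visited state.

    log_target only has to be known up to an additive constant, which is the
    situation where exact i.i.d. sampling is typically unavailable.
    """
    x, log_px = x0, log_target(x0)
    path = []
    for _ in range(n_steps):
        proposal = x + step * rng.gauss(0.0, 1.0)
        log_pprop = log_target(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)); 1 - u lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_pprop - log_px:
            x, log_px = proposal, log_pprop
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(2)
    # Hypothetical target: standard normal, written without its normalizing constant.
    path = rwm_chain(lambda x: -0.5 * x * x, x0=10.0, n_steps=20000, step=1.0, rng=rng)
    early = sum(path[:5]) / 5                     # still dominated by the start at 10
    late = sum(path[2000:]) / len(path[2000:])    # close to the stationary mean 0
    print(early, late)
```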

What would settle it

Apply the method to a simulated instance of F(ξ) with a known closed-form optimizer and check whether the iterates converge to that known value at the expected rate.
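A minimal sketch of that check on a hypothetical toy instance (not from the paper): Z ~ N(m, 1), X | Z ~ N(Z, s²), g(z, x, ξ) = ξx − 1 and f(z, y) = y², so F(ξ) = E[(ξZ − 1)²] has the closed-form optimizer ξ* = m/(1 + m²). The gradient ∇F(ξ) = E[2 E[g | Z] E[∂ξ g | Z]] is a nonlinear function of conditional expectations, so plugging a single inner draw into both factors targets m/(1 + m² + s²) instead. Exact sampling of (Z, X) stands in for the Markov kernels the paper actually needs, and the step-size schedule and the randomized MLMC variant are our own choices.

```python
import math
import random


def h(y, d):
    # For f(z, y) = y**2 the gradient integrand is 2 * E[g | Z] * E[dg/dxi | Z].
    return 2.0 * y * d


def rmlmc_grad(xi, z, s, rng, r=2.0 ** -1.5):
    """Single-term randomized MLMC estimate of h(E[g | Z=z], E[dg/dxi | Z=z])."""
    # Level l is drawn with P(L = l) = (1 - r) * r**l by inverting the geometric CDF.
    level = int(math.floor(math.log(1.0 - rng.random()) / math.log(r)))
    prob = (1.0 - r) * r ** level
    xs = [rng.gauss(z, s) for _ in range(2 ** level)]

    def h_at_mean(block):
        # g = xi * x - 1 and dg/dxi = x are linear in x, so only the block mean is needed.
        mean_x = sum(block) / len(block)
        return h(xi * mean_x - 1.0, mean_x)

    if level == 0:
        return h_at_mean(xs) / prob
    half = len(xs) // 2
    # Antithetic difference: fine-level mean versus the average over the two half-means.
    delta = h_at_mean(xs) - 0.5 * (h_at_mean(xs[:half]) + h_at_mean(xs[half:]))
    return delta / prob


def naive_grad(xi, z, s, rng):
    """Plug one inner draw into both conditional expectations (biased)."""
    x = rng.gauss(z, s)
    return h(xi * x - 1.0, x)


def run_sa(grad, n_iter=40000, m=1.0, s=1.0, seed=0):
    """Robbins-Monro iteration xi <- xi - gamma_n * (gradient estimate)."""
    rng = random.Random(seed)
    xi = 0.0
    for n in range(n_iter):
        z = rng.gauss(m, 1.0)      # exact outer draw: stands in for a Markov kernel (assumption)
        gamma = 0.25 / (n + 10)    # decreasing step size (hypothetical tuning)
        xi -= gamma * grad(xi, z, s, rng)
    return xi


if __name__ == "__main__":
    print("closed form:", 1.0 / (1.0 + 1.0 ** 2))       # m / (1 + m**2) with m = 1
    print("unbiased   :", run_sa(rmlmc_grad, seed=1))
    print("naive      :", run_sa(naive_grad, seed=1))   # drifts toward m / (1 + m**2 + s**2)
```

With m = s = 1 the closed-form optimizer is 0.5; the unbiased run should settle near it, while the single-draw run drifts toward 1/3, a gap that more outer iterations do not remove.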

Figures

Figures reproduced from arXiv: 2605.18786 by Ajay Jasra, Miguel Alvarez.

**Figure 2.** Estimation of the relative MSE, bias squared and variance of the estimated score function in …

**Figure 3.** Estimation of the relative MSE, bias squared and variance of the estimated score function in …

**Figure 4.** Estimated relative MSE of the component as a function of the MSA iteration.

**Figure 5.** Approximation of the objective function F(ξ) as a function of the number of MSA iterations.

For the second part, the algorithm is implemented in an online setting over an evaluation horizon of 252 trading days, spanning the period from 2023-12-20 to 2024-12-13. Each dataset consists of approximately 730 daily observations. At the initial rebalancing time, model parameters are estimated using a tra…

**Figure 6.** Approximation of the objective function F(ξ) as a function of the number of MSA iterations.

| Market | Strategy | FinalW | % gain | % loss | MDD | % WT | Ann.R | Ann.V | Sharpe |
|---|---|---|---|---|---|---|---|---|---|
| STOXX Europe 600 | Unbiased MSV-MSA | 1.33 | 0.58 | -0.53 | 6.04 | 58.40 | 33.80 | 11.48 | 2.60 |
| STOXX Europe 600 | Uniform | 1.17 | 0.49 | -0.50 | 6.43 | 56.80 | 16.94 | 9.86 | 1.64 |
| Tadawul | Unbiased MSV-MSA | 1.26 | 0.60 | -0.65 | 11.17 | 59.60 | 25.81 | 13.40 | 1.78 |
| Tadawul | Uniform | 1.15 | 0.60 | -0.63 | 13.21 | 56.00 | 15.48 | 1… | … |

Original abstract

In this paper we consider the conditional stochastic optimization (CSO) problem. This consists of optimizing a function which can be written as the expectation of a function which is itself a function of a conditional expectation, i.e., of the type $F(\xi) := \mathbb{E}\left[f\left(Z,\mathbb{E}[g(Z,X,\xi)|Z]\right)\right]$, where precise definitions are given in the main text. We address a particular class of CSO problems where the joint law of the random variables $X,Z$ cannot be exactly sampled; this case has been addressed in Goda & Kitade (2023). We introduce a method that combines Markovian stochastic approximation with unbiased approximation methods, which allows one to find the optimizer of $F(\xi)$ in the context of interest. We illustrate our methodology on two examples: parameter estimation with model averaging, and portfolio selection for high-dimensional full factor multivariate stochastic volatility models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper combines Markovian stochastic approximation with unbiased estimators for conditional stochastic optimization when the joint law cannot be sampled; the part that needs checking is bias control as the target distribution drifts with the ξ updates.

The one or two things to know about this paper are that it proposes using Markovian stochastic approximation together with unbiased approximation techniques to optimize a class of conditional stochastic optimization problems, specifically when the joint distribution of the random variables X and Z cannot be sampled directly. This is an incremental step from prior work on unbiased estimators for such nested expectations.

The paper does a decent job of laying out the problem setup for F(ξ), with the conditional expectation nested inside the outer one. It then describes a method that combines a Markov chain, which generates approximate samples from the required conditional or marginal distributions, with the unbiased gradient construction. The two numerical examples, one for parameter estimation under model averaging and another for portfolio optimization in high-dimensional stochastic volatility models, provide some evidence that the approach can be implemented and applied to realistic problems in statistics and finance. Those illustrations are a positive point because they connect the abstract method to concrete use cases.

Where it is weaker is the justification for unbiasedness and convergence when the parameter is updated iteratively. The Markov chain is meant to sample from a distribution that depends on the current value of ξ. As optimization proceeds, ξ moves, so the target distribution changes at each iteration. While the unbiased estimator might correct some approximation errors, the description does not make clear whether it handles the bias from the chain not being at equilibrium for the moving target. If the chain does not mix fast enough relative to the step size of the outer stochastic approximation, a persistent error could remain. The paper would benefit from a more detailed analysis, or bounds, that address this dynamic setting rather than assuming stationarity at each step.
This work is mainly for specialists in stochastic optimization and Monte Carlo methods who encounter nested expectations in their applications. Someone looking for new tools to handle unsamplable joints in conditional problems might find the combination interesting and the examples helpful for understanding potential use. I would recommend putting it through peer review. The core idea addresses a real practical issue, and with solid supporting analysis it could be a useful addition to the literature.

Referee Report

1 major / 2 minor

Summary. The paper addresses a class of conditional stochastic optimization (CSO) problems of the form F(ξ) := E[f(Z, E[g(Z,X,ξ)|Z])], where the joint law of (X,Z) cannot be sampled exactly. It proposes a method that combines Markovian stochastic approximation (to sample from the relevant conditional or marginal distributions) with unbiased gradient estimators, and illustrates the approach on two examples: parameter estimation with model averaging and portfolio selection for high-dimensional full-factor multivariate stochastic volatility models.

Significance. If the central construction yields unbiased gradients for the composite estimator despite the drifting target distribution induced by simultaneous ξ updates, the result would extend the scope of unbiased stochastic optimization methods to a practically relevant subclass of CSO problems previously treated only under stronger sampling assumptions (e.g., Goda & Kitade 2023). The two concrete applications demonstrate potential utility in statistics and quantitative finance.

major comments (1)

[Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”
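A toy illustration of the time-scale point (our own construction, not the paper's algorithm): take a hypothetical AR(1) kernel Z' = ρZ + (1 − ρ)ξ + √(1 − ρ²)ε with ε ~ N(0, 1), whose invariant law N(ξ, 1) moves with ξ. If ξ changes by Δ_n after each kernel step, the mean lag e_n = E[Z_n] − ξ_n satisfies e_{n+1} = ρe_n − Δ_n, so for a constant Δ it settles at −Δ/(1 − ρ): larger when the chain mixes slowly (ρ near 1), and vanishing only when the ξ increments shrink.

```python
def mean_lag(rho, increments, m0=0.0, xi0=0.0):
    """Exact mean lag E[Z_n] - xi_n for a hypothetical AR(1) kernel with invariant law N(xi, 1).

    Kernel: Z' = rho * Z + (1 - rho) * xi + sqrt(1 - rho**2) * eps, eps ~ N(0, 1).
    Taking expectations gives m' = rho * m + (1 - rho) * xi, so no simulation is needed.
    """
    m, xi = m0, xi0
    lags = []
    for delta in increments:
        m = rho * m + (1.0 - rho) * xi   # one kernel step targeting the current xi
        xi += delta                      # outer update moves the target
        lags.append(m - xi)
    return lags


if __name__ == "__main__":
    drift = [0.01] * 5000
    print(mean_lag(0.90, drift)[-1])   # approximately -0.01 / (1 - 0.90) = -0.1
    print(mean_lag(0.99, drift)[-1])   # approximately -0.01 / (1 - 0.99) = -1.0
    shrinking = [1.0 / (n + 1) for n in range(10000)]
    print(mean_lag(0.90, shrinking)[-1])  # small: the lag decays with the increments
```

Only the mean is tracked here; in the actual method the relevant quantity would be the bias of the gradient estimator, which depends on more than the first moment of the chain.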

minor comments (2)

[Introduction / Problem statement] The notation for the conditional expectation inside f is introduced in the abstract but would benefit from an explicit display equation with all random variables and conditioning clearly labeled in the main text.
[Numerical experiments] The two numerical examples would be strengthened by reporting both the achieved objective value and a diagnostic for the empirical bias of the gradient estimator (e.g., Monte-Carlo estimate of E[estimator] versus a high-accuracy reference).
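A minimal sketch of the suggested diagnostic, under assumptions of our own: estimate E[estimator] by averaging many independent replications, compare it with a closed-form reference, and report a standard error so that bias can be distinguished from noise. The toy gradient (Z ~ N(1, 1), X | Z ~ N(Z, 1), g = ξX − 1, f(y) = y², with true gradient 2(ξE[Z²] − E[Z])) is hypothetical; for this bilinear case two conditionally independent inner draws already give an unbiased estimator, whereas a general f needs constructions such as randomized MLMC.

```python
import math
import random
import statistics


def bias_diagnostic(draw_estimate, reference, n_rep, rng):
    """Return (estimated bias, standard error) of an estimator against a reference value."""
    values = [draw_estimate(rng) for _ in range(n_rep)]
    bias = statistics.fmean(values) - reference
    std_err = statistics.stdev(values) / math.sqrt(n_rep)
    return bias, std_err


def toy_gradient_draw(xi, independent_inner):
    """Hypothetical toy: Z ~ N(1, 1), X | Z ~ N(Z, 1), g = xi * X - 1, f(y) = y**2."""
    def draw(rng):
        z = rng.gauss(1.0, 1.0)
        x1 = rng.gauss(z, 1.0)
        # Reusing x1 biases the product of the two conditional means; a second
        # conditionally independent draw removes the bias in this bilinear case.
        x2 = rng.gauss(z, 1.0) if independent_inner else x1
        return 2.0 * (xi * x1 - 1.0) * x2
    return draw


if __name__ == "__main__":
    xi = 0.5
    reference = 2.0 * (2.0 * xi - 1.0)   # 2 * (xi * E[Z^2] - E[Z]) with E[Z^2] = 2, E[Z] = 1
    rng = random.Random(3)
    for label, indep in [("single draw", False), ("two draws", True)]:
        bias, se = bias_diagnostic(toy_gradient_draw(xi, indep), reference, 40000, rng)
        print(f"{label}: bias = {bias:+.3f} (s.e. {se:.3f})")
```

At ξ = 0.5 the single-draw estimator should show a bias near +1 (it replaces E[X | Z]² by E[X² | Z]), while the two-draw estimator's bias should be within a few standard errors of zero.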

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. The major comment concerns the lack of analysis for transient bias in the inner Markov chain due to drifting targets as ξ is updated. We address this point directly below.

Referee: [Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”

Authors: We thank the referee for highlighting this important subtlety. The unbiasedness of the gradient estimator is formally established under the assumption that the inner Markov chain has reached stationarity for the current fixed ξ. As the outer stochastic approximation updates ξ at each iteration, the target conditional distribution changes, so standard geometric ergodicity results for time-homogeneous chains do not immediately guarantee that the chain is close to stationarity at the new target. The manuscript does not supply explicit bounds on the resulting transient (lag) bias or a quantitative analysis of how the debiasing/coupling step interacts with the relative time scales of the inner and outer processes. We agree that this constitutes a gap in the current theoretical development. In the revised version we will add a dedicated paragraph in the convergence analysis section that introduces a mixing-time assumption on the inner chain relative to the outer step-size schedule and shows that, under this condition, the residual bias vanishes asymptotically, thereby supporting convergence to the optimizer of F(ξ).

Revision: yes
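For reference, the classical Robbins-Monro step-size conditions, under which stochastic approximation schemes of this kind are usually analysed, are shown below; a mixing-time assumption of the kind the rebuttal promises would be stated relative to such a schedule. This is background only, not the revised manuscript's assumption, and the notation for the gradient estimate is ours.

```latex
\xi_{n+1} = \xi_n - \gamma_{n+1}\,\widehat{\nabla F}(\xi_n),
\qquad \sum_{n\ge 1}\gamma_n = \infty,
\qquad \sum_{n\ge 1}\gamma_n^2 < \infty,
\qquad \text{e.g. } \gamma_n = \frac{c}{n+n_0},\; c>0,\; n_0\ge 0.
```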

Circularity Check

0 steps flagged

No circularity: method combines external unbiased estimators with standard Markovian SA without self-referential reduction

The paper defines the CSO objective F(ξ) explicitly as an outer expectation of a function of a conditional inner expectation, then proposes a composite estimator that applies Markovian stochastic approximation to sample from the conditional law and layers an unbiased gradient estimator on top. This construction cites Goda & Kitade (2023) for the base CSO setting and relies on standard ergodicity results for the inner chain; neither the optimizer nor the gradient estimator is obtained by fitting parameters to the target result itself or by renaming a known quantity. The derivation therefore remains self-contained against external benchmarks and does not reduce any claimed prediction to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on unstated convergence assumptions for the Markov chain and unbiasedness of the gradient estimator after approximation.

pith-pipeline@v0.9.0 · 5697 in / 1113 out tokens · 24115 ms · 2026-05-20T23:37:58.715943+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references

[1] Andrieu, C., Jasra, A., Doucet, A., & Del Moral, P. (2011). On non-linear Markov chain Monte Carlo. Bernoulli, 17, 987-1014.

[2] Andrieu, C., Moulines, E., & Priouret, P. (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44, 283-312.

[3] Andrieu, C., & Vihola, M. (2014). Markovian stochastic approximation with expanding projections. Bernoulli, 20, 545-585.

[4] Awadelkarim, E., Jasra, A., & Ruzayqat, H. (2024). Unbiased parameters for diffusions. SIAM J. Control Optim., 62, 2664-2694.

[5] Cappé, O., Ryden, T., & Moulines, É. (2005). Inference in Hidden Markov Models. Springer: New York.

[6] Dellaportas, P., Titsias, M., Petrova, K., & Plataniotis, A. (2023). Scalable inference for a full multivariate stochastic volatility model. J. Econ., 232, 501-520.

[7] Goda, T., & Kitade, W. (2023). Constructing unbiased gradient estimators with finite variance for conditional stochastic optimization. Math. Comp. Sim., 204, 743-763.

[8] Goda, T., Hironaka, T., Kitade, W., & Foster, A. (2022). Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs. SIAM J. Sci. Comput., 44, A286-A311.

[9] Jasra, A., Law, K. J. H., & Yu, F. (2022). Unbiased filtering of diffusions. Adv. Appl. Probab., 54, 661-687.

[10] Heng, J., Houssineau, J., & Jasra, A. (2024). On unbiased score estimation for partially observed diffusions. J. Mach. Learn. Res., 25, 1-66.

[11] Heng, J., Jasra, A., Law, K. J. H., & Tarakanov, A. (2023). On unbiased estimation of discretized models. SIAM/ASA JUQ, 11, 616-645.

[12] Hu, Y., Chen, X., & He, N. (2020). Sample complexity of sample average approximation for conditional stochastic optimization. SIAM J. Optim., 30, 2103-2133.

[13] Hu, Y., Chen, X., & He, N. (2020). Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning. NeurIPS.

[14] Lindsten, F., Jordan, M., & Schön, T. (2014). Particle Gibbs with ancestor sampling. J. Mach. Learn. Res., 15, 2145-2184.

[15] Lindsten, F., Douc, R., & Moulines, E. (2015). Uniform ergodicity of the particle Gibbs sampler. Scand. J. Stat., 42, 775-797.

[16] Singh, R., Sahani, M., & Gretton, A. (2019). Kernel instrumental variable regression. NeurIPS.

[17] Tsagaris, T., Jasra, A., & Adams, N. (2012). Robust and adaptive algorithms for online portfolio selection. Quant. Finan., 12, 1651-1662.
