pith.

arXiv: 2605.18786 · v1 · pith:KEUF73BE · submitted 2026-05-07 · 🧮 math.OC · cs.NA · math.NA · stat.ME

Unbiased Gradients for a Class of Conditional Stochastic Optimization Problems

Pith reviewed 2026-05-20 23:37 UTC · model grok-4.3

classification 🧮 math.OC · cs.NA · math.NA · stat.ME
keywords conditional stochastic optimization · unbiased gradients · Markovian stochastic approximation · stochastic volatility models · portfolio selection · parameter estimation

The pith

A method combining Markovian stochastic approximation with unbiased estimators finds the optimizers of conditional stochastic optimization problems when the joint distribution of the random variables cannot be sampled exactly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses conditional stochastic optimization problems of the form F(ξ) := E[f(Z, E[g(Z,X,ξ)|Z])], where the joint law of X and Z cannot be sampled exactly. It proposes combining Markovian stochastic approximation with unbiased approximation methods to optimize F(ξ) without introducing uncorrectable bias. The approach is illustrated on parameter estimation with model averaging and on portfolio selection in high-dimensional full-factor multivariate stochastic volatility models. A sympathetic reader cares because the approach targets a class of optimization problems that arise in statistical modeling but are hard to handle when the required distributions cannot be sampled exactly.
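
Why the nesting matters can be seen in a few lines. Because f is applied to a conditional expectation, plugging a finite inner average into a nonlinear f gives a biased estimate of F(ξ), and the bias shrinks only as the inner sample size grows. The sketch below is an assumption-laden illustration, not the paper's method: it uses a hypothetical Gaussian toy model with exact conditional sampling, and `nested_plugin_estimate` is an illustrative name.

```python
import numpy as np


def nested_plugin_estimate(xi, m_inner, n_outer, rng):
    """Plug-in nested Monte Carlo estimate of F(xi) on a hypothetical toy model.

    Toy model (an assumption, not from the paper):
        Z ~ N(0, 1),  X | Z ~ N(Z, 1),  g(Z, X, xi) = X - xi,  f(z, u) = u**2,
    so E[g | Z] = Z - xi and F(xi) = E[(Z - xi)**2] = 1 + xi**2.
    """
    z = rng.standard_normal(n_outer)
    # m_inner exact conditional draws of X for every outer draw of Z
    x = z[:, None] + rng.standard_normal((n_outer, m_inner))
    inner_mean = (x - xi).mean(axis=1)       # finite-sample estimate of E[g | Z]
    return float(np.mean(inner_mean ** 2))   # nonlinear f applied to that estimate


rng = np.random.default_rng(0)
xi = 0.5
exact = 1.0 + xi ** 2
for m in (1, 4, 16):
    est = nested_plugin_estimate(xi, m, 200_000, rng)
    # for this toy model, E[estimate] - F(xi) = 1/m exactly
    print(f"m={m:2d}  estimated bias={est - exact:.3f}  theoretical bias={1 / m:.3f}")
```

Increasing m reduces the bias but does not remove it at any finite cost, which is the gap that unbiased constructions are meant to close.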

Core claim

By combining Markovian stochastic approximation, which handles the sampling of the random variables, with unbiased gradient estimation, one can find the optimizer of F(ξ) even when the joint distribution of X and Z cannot be sampled directly, as illustrated by examples from parameter estimation and portfolio selection.

What carries the argument

The integration of Markovian stochastic approximation with unbiased approximation methods to produce gradients for the conditional stochastic objective F(ξ).
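
The "unbiased approximation" ingredient can be illustrated with a single-term randomized multilevel Monte Carlo (MLMC) estimator with antithetic differences, the kind of construction behind the unbiased MLMC works in the reference list (e.g. Goda, Hironaka, Kitade & Foster, 2022). This is a representative technique, not necessarily the estimator used in the paper. The sketch assumes exact conditional sampling for a fixed Z; the geometric level distribution with r = 2^(-1.5) is an assumption chosen so that, for smooth f, both the variance and the expected cost stay finite. `single_term_estimate` and `sample_g` are hypothetical names.

```python
import random


def single_term_estimate(f, sample_g, r, rng):
    """One draw of a single-term randomized MLMC estimator of f(E[g | Z]).

    sample_g(k, rng) must return k conditionally independent draws of
    g(Z, X, xi) for one fixed Z and xi (exact conditional sampling is assumed).
    The level N is geometric: P(N = l) = (1 - r) * r**l for l = 0, 1, 2, ...
    """
    level = 0
    while rng.random() < r:
        level += 1
    p_level = (1.0 - r) * r ** level
    draws = sample_g(2 ** level, rng)
    if level == 0:
        delta = f(draws[0])
    else:
        half = 2 ** (level - 1)
        fine = f(sum(draws) / len(draws))
        coarse_a = f(sum(draws[:half]) / half)
        coarse_b = f(sum(draws[half:]) / half)
        # antithetic difference: fine level minus the average of the two coarse halves
        delta = fine - 0.5 * (coarse_a + coarse_b)
    # weighting by 1/p_level makes the expectation telescope to f(E[g | Z])
    return delta / p_level


# Hypothetical toy: X | Z = z ~ N(z, 1) and g = X - xi, so E[g | Z = z] = z - xi.
z, xi = 1.0, 0.5


def sample_g(k, rng):
    return [rng.gauss(z, 1.0) - xi for _ in range(k)]


rng = random.Random(1)
n = 100_000
total = sum(single_term_estimate(lambda u: u * u, sample_g, 2 ** -1.5, rng) for _ in range(n))
print(total / n)  # should be close to f(z - xi) = 0.25, up to Monte Carlo error
```

Each draw costs a random number of inner samples (about 2.2 on average with this r), yet its expectation is exactly f(E[g | Z]); plain nested Monte Carlo with a fixed inner size cannot achieve this.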

Load-bearing premise

Suitable conditional or marginal samplers exist and the Markovian approximation converges to the correct stationary distribution in a manner where any bias can be corrected by the unbiased estimator.
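
The Markovian part of this premise can be sketched on a hypothetical one-dimensional toy problem that is not from the paper. At every iteration the noise comes from a single step of a ξ-dependent Markov kernel (here an AR(1) chain with stationary law N(ξ/2 + 1, 1)) rather than from an exact draw from that stationary law, and the iteration seeks the root of h(ξ) = E_{π_ξ}[Y] - ξ = 1 - ξ/2, i.e. ξ* = 2. The kernel, the step-size schedule and the name `markovian_sa` are all assumptions for illustration.

```python
import math
import random


def markovian_sa(n_iter, rho=0.5, xi0=0.0, y0=0.0, seed=0):
    """Robbins-Monro iteration driven by a xi-dependent Markov chain (toy example).

    Hypothetical kernel P_xi: an AR(1) step
        Y' = mu(xi) + rho * (Y - mu(xi)) + sqrt(1 - rho**2) * eps,  eps ~ N(0, 1),
    whose stationary law is N(mu(xi), 1) with mu(xi) = xi / 2 + 1.
    The iteration seeks the root of h(xi) = E_{pi_xi}[Y] - xi = 1 - xi / 2, i.e. xi* = 2.
    """
    rng = random.Random(seed)
    xi, y = xi0, y0
    for n in range(n_iter):
        mu = xi / 2.0 + 1.0
        # a single kernel step for the *current* xi; the chain is never run
        # to stationarity, so the noise is Markovian and its target drifts
        y = mu + rho * (y - mu) + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        gamma = 2.0 / (n + 2)        # sum of gamma = inf, sum of gamma**2 < inf
        xi += gamma * (y - xi)       # stochastic approximation update
    return xi


print(markovian_sa(200_000))  # should be close to xi* = 2
```

In this fast-mixing toy the iterates still converge although the chain's target moves at every step; that is an illustration of the premise, not a proof that it holds for the samplers used in the paper.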

What would settle it

Apply the method to a simulated instance of F(ξ) with a known closed-form optimizer and check whether the iterates converge to that known value at the expected rate.
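
A minimal version of this check can be run on a hypothetical toy CSO problem with a closed-form optimizer (not one of the paper's examples, and with exact conditional sampling of X | Z rather than a Markov chain): Z ~ N(0, 1), X | Z ~ N(2Z, 1), g(Z, X, ξ) = ξX - Z and f(z, u) = u², so F(ξ) = (2ξ - 1)² with optimizer ξ* = 0.5. Since ∇F(ξ) = 2E[E[g|Z] E[X|Z]] is a product of two conditional expectations, reusing one conditional draw for both factors is biased, while two conditionally independent draws give an unbiased gradient for this quadratic f. `run_sa` is an illustrative name.

```python
import random


def run_sa(n_iter, unbiased, seed=0, xi0=0.0):
    """SGD on the toy objective F(xi) = E[(E[xi * X - Z | Z])**2] = (2 * xi - 1)**2.

    True gradient: grad F(xi) = 2 E[E[g | Z] * E[X | Z]] = 8 * xi - 4, root xi* = 0.5.
    With unbiased=False the same draw X is used for both conditional factors,
    whose expected value is 10 * xi - 4 instead (root 0.4).
    """
    rng = random.Random(seed)
    xi = xi0
    for n in range(n_iter):
        z = rng.gauss(0.0, 1.0)
        x1 = 2.0 * z + rng.gauss(0.0, 1.0)                    # X | Z ~ N(2Z, 1)
        # the unbiased version uses a second, conditionally independent draw
        x2 = 2.0 * z + rng.gauss(0.0, 1.0) if unbiased else x1
        grad = 2.0 * (xi * x1 - z) * x2                       # estimate of grad F(xi)
        xi -= grad / (8.0 * (n + 1))                          # step size 1 / (8 (n + 1))
    return xi


print(run_sa(20_000, unbiased=True))    # near the known optimizer 0.5
print(run_sa(20_000, unbiased=False))   # near 0.4: a biased gradient gives the wrong limit
```

Repeating the unbiased run for several values of n_iter and plotting the error against n_iter on a log-log scale would address the rate part of the check; with this step-size schedule the error should decay roughly like n_iter^(-1/2).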

Figures

Figures reproduced from arXiv: 2605.18786 by Ajay Jasra, Miguel Alvarez.

Figure 1: Estimation of the relative MSE, bias squared and variance of the estimated score function in …

Figure 2: Estimation of the relative MSE, bias squared and variance of the estimated score function in …

Figure 3: Estimation of the relative MSE, bias squared and variance of the estimated score function in …

Figure 4: Estimated relative MSE of the component in terms of MSA iteration.

Figure 5: Approximation of the objective function F(ξ) in terms of the number of iterations of the MSA method. For the second part, the algorithm is implemented in an online setting over an evaluation horizon of 252 trading days, spanning the period from 2023-12-20 to 2024-12-13. Each dataset consists of approximately 730 daily observations. At the initial rebalancing time, model parameters are estimated using a tra…

Figure 6: Approximation of the objective function F(ξ) in terms of the number of iterations of the MSA method.
Market Strategy FinalW % gain % loss MDD % WT Ann.R Ann.V Sharpe STOXX Europe 600 Unbiased MSV-MSA 1.33 0.58 -0.53 6.04 58.40 33.80 11.48 2.60 Uniform 1.17 0.49 -0.50 6.43 56.80 16.94 9.86 1.64 Tadawul Unbiased MSV-MSA 1.26 0.60 -0.65 11.17 59.60 25.81 13.40 1.78 Uniform 1.15 0.60 -0.63 13.21 56.00 15.48 1…
Original abstract

In this paper we consider the conditional stochastic optimization (CSO) problem. This consists of optimizing a function which can be written as the expectation of a function which is itself a function of a conditional expectation, i.e. of the type $F(\xi) := \mathbb{E}\left[f\left(Z,\mathbb{E}[g(Z,X,\xi)|Z]\right)\right]$, where precise definitions are given in the main text. We address a particular class of CSO problems where the joint law of the random variables $X,Z$ cannot be exactly sampled; this case has been addressed in Goda & Kitade (2023). We introduce a method that combines Markovian stochastic approximation with unbiased approximation methods which allows one to find the optimizer of $F(\xi)$ in the context of interest. We illustrate our methodology on two examples associated to parameter estimation with model averaging and portfolio selection associated to high-dimensional full factor multivariate stochastic volatility models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper addresses a class of conditional stochastic optimization (CSO) problems of the form F(ξ) := E[f(Z, E[g(Z,X,ξ)|Z])], where the joint law of (X,Z) cannot be sampled exactly. It proposes a method that combines Markovian stochastic approximation (to sample from the relevant conditional or marginal distributions) with unbiased gradient estimators, and illustrates the approach on two examples: parameter estimation with model averaging and portfolio selection for high-dimensional full-factor multivariate stochastic volatility models.

Significance. If the central construction yields unbiased gradients for the composite estimator despite the drifting target distribution induced by simultaneous ξ updates, the result would extend the scope of unbiased stochastic optimization methods to a practically relevant subclass of CSO problems previously treated only under stronger sampling assumptions (e.g., Goda & Kitade 2023). The two concrete applications demonstrate potential utility in statistics and quantitative finance.

major comments (1)
  1. [Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”
minor comments (2)
  1. [Introduction / Problem statement] The notation for the conditional expectation inside f is introduced in the abstract but would benefit from an explicit display equation with all random variables and conditioning clearly labeled in the main text.
  2. [Numerical experiments] The two numerical examples would be strengthened by reporting both the achieved objective value and a diagnostic for the empirical bias of the gradient estimator (e.g., a Monte Carlo estimate of E[estimator] compared with a high-accuracy reference).
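
The bias diagnostic suggested in this comment can be sketched as a small helper. The name `bias_diagnostic` and the rule of thumb of about three standard errors are assumptions for illustration; in practice the replicate values would come from repeated independent runs of whichever gradient estimator is being checked, and the reference from a much more expensive high-accuracy computation.

```python
import math
import random
import statistics


def bias_diagnostic(samples, reference):
    """Compare the Monte Carlo mean of an estimator with a high-accuracy reference.

    Requires at least two samples that are not all identical.
    Returns (estimated bias, standard error, z-score). A |z| well above ~3 suggests
    bias at this sample size; a small |z| only means no bias was detected.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)
    bias = mean - reference
    return bias, se, bias / se


# Usage with synthetic replicates: one unbiased, one shifted by 0.1.
rng = random.Random(0)
unbiased = [rng.gauss(1.0, 2.0) for _ in range(30_000)]
shifted = [rng.gauss(1.1, 2.0) for _ in range(30_000)]
print(bias_diagnostic(unbiased, 1.0))   # z-score typically within a few units of 0
print(bias_diagnostic(shifted, 1.0))    # z-score far above 3
```

Reporting the standard error alongside the bias matters because a bias estimate that is small in absolute terms can still be statistically clear, and vice versa.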

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. The major comment concerns the lack of analysis for transient bias in the inner Markov chain due to drifting targets as ξ is updated. We address this point directly below.

Point-by-point responses
  1. Referee: [Algorithm and convergence analysis sections] The unbiasedness claim for the composite gradient estimator rests on the inner Markov chain being at stationarity with respect to the current ξ at each outer iteration. Because ξ is updated on every step, the target conditional law drifts; standard geometric ergodicity guarantees apply only for fixed ξ. No analysis is supplied (in the algorithm section or the convergence section) that bounds the residual transient bias or shows that the subsequent debiasing/coupling step removes lag bias whose magnitude depends on the relative time scales of the inner chain and the outer SA update. This is load-bearing for the claim that the method “allows one to find the optimizer.”

    Authors: We thank the referee for highlighting this important subtlety. The unbiasedness of the gradient estimator is formally established under the assumption that the inner Markov chain has reached stationarity for the current fixed ξ. As the outer stochastic approximation updates ξ at each iteration, the target conditional distribution changes, so standard geometric ergodicity results for time-homogeneous chains do not immediately guarantee that the chain is close to stationarity at the new target. The manuscript does not supply explicit bounds on the resulting transient (lag) bias or a quantitative analysis of how the debiasing/coupling step interacts with the relative time scales of the inner and outer processes. We agree that this constitutes a gap in the current theoretical development. In the revised version we will add a dedicated paragraph in the convergence analysis section that introduces a mixing-time assumption on the inner chain relative to the outer step-size schedule and shows that, under this condition, the residual bias vanishes asymptotically, thereby supporting convergence to the optimizer of F(ξ). revision: yes

Circularity Check

0 steps flagged

No circularity: method combines external unbiased estimators with standard Markovian SA without self-referential reduction

Full rationale

The paper defines the CSO objective F(ξ) explicitly as an outer expectation of a function of a conditional inner expectation, then proposes a composite estimator that applies Markovian stochastic approximation to sample from the conditional law and layers an unbiased gradient estimator on top. This construction cites Goda & Kitade (2023) for the base CSO setting and relies on standard ergodicity results for the inner chain; neither the optimizer nor the gradient estimator is obtained by fitting parameters to the target result itself or by renaming a known quantity. The derivation therefore remains self-contained against external benchmarks and does not reduce any claimed prediction to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on unstated convergence assumptions for the Markov chain and unbiasedness of the gradient estimator after approximation.

pith-pipeline@v0.9.0 · 5697 in / 1113 out tokens · 24115 ms · 2026-05-20T23:37:58.715943+00:00


Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. Andrieu, C., Jasra, A., Doucet, A., & Del Moral, P. (2011). On non-linear Markov chain Monte Carlo. Bernoulli, 17, 987-1014.
  2. Andrieu, C., Moulines, E., & Priouret, P. (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44, 283-312.
  3. Andrieu, C., & Vihola, M. (2014). Markovian stochastic approximation with expanding projections. Bernoulli, 20, 545-585.
  4. Awadelkarim, E., Jasra, A., & Ruzayqat, H. (2024). Unbiased parameters for diffusions. SIAM J. Control Optim., 62, 2664-2694.
  5. Cappé, O., Rydén, T., & Moulines, É. (2005). Inference in Hidden Markov Models. Springer: New York.
  6. Dellaportas, P., Titsias, M., Petrova, K., & Plataniotis, A. (2023). Scalable inference for a full multivariate stochastic volatility model. J. Econ., 232, 501-520.
  7. Goda, T., & Kitade, W. (2023). Constructing unbiased gradient estimators with finite variance for conditional stochastic optimization. Math. Comp. Sim., 204, 743-763.
  8. Goda, T., Hironaka, T., Kitade, W., & Foster, A. (2022). Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs. SIAM J. Sci. Comput., 44, A286-A311.
  9. Jasra, A., Law, K. J. H., & Yu, F. (2022). Unbiased filtering of diffusions. Adv. Appl. Probab., 54, 661-687.
  10. Heng, J., Houssineau, J., & Jasra, A. (2024). On unbiased score estimation for partially observed diffusions. J. Mach. Learn. Res., 25, 1-66.
  11. Heng, J., Jasra, A., Law, K. J. H., & Tarakanov, A. (2023). On unbiased estimation of discretized models. SIAM/ASA JUQ, 11, 616-645.
  12. Hu, Y., Chen, X., & He, N. (2020). Sample complexity of sample average approximation for conditional stochastic optimization. SIAM J. Optim., 30, 2103-2133.
  13. Hu, Y., Chen, X., & He, N. (2020). Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning. NeurIPS.
  14. Lindsten, F., Jordan, M., & Schön, T. (2014). Particle Gibbs with ancestor sampling. J. Mach. Learn. Res., 15, 2145-2184.
  15. Lindsten, F., Douc, R., & Moulines, E. (2015). Uniform ergodicity of the particle Gibbs sampler. Scand. J. Stat., 42, 775-797.
  16. Singh, R., Sahani, M., & Gretton, A. (2019). Kernel instrumental variable regression. NeurIPS.
  17. Tsagaris, T., Jasra, A., & Adams, N. (2012). Robust and adaptive algorithms for online portfolio selection. Quant. Finan., 12, 1651-1662.