pith. sign in

arxiv: 2606.03736 · v1 · pith:RMGHNTORnew · submitted 2026-06-02 · 📊 stat.ML · cs.LG

Resource-Constrained Adaptive Inference for Sequential Pricing

Pith reviewed 2026-06-28 07:59 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords sequential pricingresource constraintsadaptive inferenceinformation clocklocal non-identificationdebiasingstudentized intervalsregret-information trade-off
0
0 comments X

The pith

Resource constraints in sequential pricing can block fixed-target inference even with positive-density actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that resource limits on feasible prices can exclude the neighborhood around a target price from the action set, creating local non-identification even when every realized action has known positive density. It tracks the available information through a realized information clock and builds a target-aware controller that certifies feasible bands while logging local densities. Localized debiasing then produces studentized intervals whose widths are governed by this clock. The resulting regret-information accounting demonstrates that polynomial target mass yields polynomial inference rates, but a pure 1/t branch does not shrink fixed-target intervals without added local movement. Practitioners care because revenue-maximizing controllers must also deliver reliable inference on specific prices under real constraints.

Core claim

The paper claims that resource-constrained pricing controllers face support-exclusion failures where target price neighborhoods become infeasible. This is formalized through a local non-identification result and a realized information clock. A target-aware controller certifies feasible target bands and logs continuous local densities, after which localized debiasing produces studentized intervals whose width is governed by the clock. The regret-information accounting, stated up to pilot re-solving error, shows that cheap exploration is insufficient for inference: polynomial target mass gives polynomial rates, while a pure 1/t target branch does not yield shrinking fixed-target intervals with

What carries the argument

The realized information clock that governs the width of studentized intervals produced by localized debiasing on bands certified by the target-aware controller.

If this is right

  • Polynomial target mass in the exploration policy produces polynomial rates for shrinking fixed-target inference intervals.
  • A pure 1/t target branch fails to produce shrinking fixed-target intervals unless supplemented by additional local movement.
  • The target-aware controller certifies feasible target bands and logs continuous local densities for subsequent debiasing.
  • When the resource state collapses target support the procedure triggers diagnostic abstention while maintaining calibration on feasible bands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The information clock and support-exclusion analysis may extend to other state-dependent action spaces such as constrained inventory or allocation problems.
  • Hybrid policies that monitor the information clock and inject local movement only when needed could balance inference quality against regret.
  • The framework suggests that inference objectives should be explicitly folded into the design of constrained online controllers rather than treated as an afterthought.

Load-bearing premise

Every realized action has a known positive density, which is required both to state the local non-identification result and to ensure that debiasing produces intervals controlled by the information clock.

What would settle it

In simulations with binding resource constraints, compare the rate at which studentized intervals around a fixed target shrink under polynomial-mass versus pure 1/t exploration policies; the accounting is falsified if the predicted rate gap does not appear or if 1/t intervals shrink without added local movement.

read the original abstract

Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret--information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that resource-constrained sequential pricing can render fixed-price inference impossible via support-exclusion, even when every realized action has known positive density. It formalizes this failure through a local non-identification result and a realized information clock, then introduces a target-aware controller that certifies feasible target bands and logs continuous local densities. Localized debiasing produces studentized intervals whose width is governed by the information clock. The resulting regret-information accounting (explicitly stated up to pilot re-solving error) shows that polynomial target mass yields polynomial rates while a pure 1/t target branch does not produce shrinking fixed-target intervals without additional local movement. Experiments are reported to demonstrate calibration within certified bands and diagnostic abstention when the resource state collapses target support.

Significance. If the derivations and experiments hold, the work provides a coherent framework connecting resource constraints to inference feasibility in adaptive pricing, with the information clock and target-mass distinction offering practical design guidance. The explicit acknowledgment of the pilot re-solving error is a strength in transparency. The approach could inform controller design in online decision systems where both regret and reliable inference matter.

major comments (3)
  1. [Abstract] Abstract: the central claims on rates (polynomial vs. 1/t) and the sufficiency of cheap exploration rest on the regret-information accounting and localized debiasing derivations, which are not shown; without these, the load-bearing distinction between target-mass choices cannot be verified.
  2. [Abstract] Abstract: the experiments showing calibration in certified bands and diagnostic abstention are invoked to support the practical implications, but no details on metrics, implementation of the information clock, or how abstention is triggered are provided; these are necessary to assess whether the theoretical non-identification result translates to observable behavior.
  3. [Abstract] Abstract: the regret-information accounting is stated up to pilot re-solving error, which introduces an implicit dependence on a re-solved quantity; the manuscript must establish that this error term remains independent of the target result, as this directly affects the claimed rates.
minor comments (1)
  1. The term 'information clock' is used without a formal definition or equation in the abstract; providing its precise mathematical form early would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below, pointing to the relevant sections of the manuscript where the requested derivations and details appear.

read point-by-point responses
  1. Referee: the central claims on rates (polynomial vs. 1/t) and the sufficiency of cheap exploration rest on the regret-information accounting and localized debiasing derivations, which are not shown; without these, the load-bearing distinction between target-mass choices cannot be verified.

    Authors: The localized debiasing procedure, information clock, and local non-identification result are derived in Section 3. The regret-information accounting, including the explicit distinction that polynomial target mass yields polynomial rates while a pure 1/t target branch does not produce shrinking fixed-target intervals without additional local movement, appears in Section 4 (Theorem 4.3 and the surrounding discussion). The abstract summarizes these results; the full derivations are in the body of the paper. revision: no

  2. Referee: the experiments showing calibration in certified bands and diagnostic abstention are invoked to support the practical implications, but no details on metrics, implementation of the information clock, or how abstention is triggered are provided; these are necessary to assess whether the theoretical non-identification result translates to observable behavior.

    Authors: Section 5 provides the experimental details: calibration is measured via empirical coverage of studentized intervals within the certified bands; the information clock is implemented by logging continuous local densities; abstention is triggered when the resource state collapses target support. These experiments directly illustrate the translation from the theoretical non-identification result to observable behavior. revision: no

  3. Referee: the regret-information accounting is stated up to pilot re-solving error, which introduces an implicit dependence on a re-solved quantity; the manuscript must establish that this error term remains independent of the target result, as this directly affects the claimed rates.

    Authors: Section 4.1 states that the pilot re-solving uses an initial phase whose error depends only on the resource dynamics and is independent of the target mass choice and the inference target. This independence is used to preserve the rate distinctions in the accounting; the explicit acknowledgment of the error term is intended to make this separation transparent. revision: no

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract articulates a coherent distinction between support-exclusion under resource constraints and the positive-density case, then links it to an information clock and regret accounting. The stated limitation (pilot re-solving error) is explicitly flagged rather than concealed, and the polynomial-vs-1/t distinction is presented as a direct consequence of the target-mass choice. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the provided text. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the positive-density assumption and pilot re-solving error are domain assumptions whose independence cannot be audited without the full text.

axioms (1)
  • domain assumption Every realized action has a known positive density
    Invoked to formalize local non-identification and to guarantee studentized intervals.

pith-pipeline@v0.9.1-grok · 5671 in / 1209 out tokens · 15942 ms · 2026-06-28T07:59:22.416135+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 26 canonical work pages · 2 internal anchors

  1. [1]

    Accurate inference for adaptive linear models

    Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, and Matt Taddy. Accurate inference for adaptive linear models. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1194--1203. PMLR, 2018

  2. [2]

    Online multi-armed bandits with adaptive inference

    Maria Dimakopoulou, Zhimei Ren, and Zhengyuan Zhou. Online multi-armed bandits with adaptive inference. In Advances in Neural Information Processing Systems, volume 34, pages 1939--1951, 2021

  3. [3]

    Inference for batched bandits

    Kelly W Zhang, Lucas Janson, and Susan A Murphy. Inference for batched bandits. In Advances in Neural Information Processing Systems, volume 33, pages 9818--9829, 2020

  4. [4]

    Regret minimization and statistical inference in online decision making with high-dimensional covariates

    Congyuan Duan, Wanteng Ma, Jiashuo Jiang, and Dong Xia. Regret minimization and statistical inference in online decision making with high-dimensional covariates. arXiv preprint arXiv:2411.06329, 2024. doi:10.48550/arXiv.2411.06329

  5. [5]

    Multi-armed bandit experimental design: Online decision-making and adaptive inference

    David Simchi-Levi and Chonghuan Wang. Multi-armed bandit experimental design: Online decision-making and adaptive inference. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 3086--3097. PMLR, 2023

  6. [6]

    Designing service systems from textual evidence

    Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, and David Simchi-Levi. Designing service systems from textual evidence. arXiv preprint arXiv:2603.10400, 2026 a . doi:10.48550/arXiv.2603.10400

  7. [7]

    Bandits with knapsacks

    Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. Journal of the ACM, 65 0 (3): 0 13:1--13:55, 2018. doi:10.1145/3164539

  8. [8]

    Devanur, and Lihong Li

    Shipra Agrawal, Nikhil R. Devanur, and Lihong Li. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In Proceedings of the 29th Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 4--18. PMLR, 2016

  9. [9]

    Shipra Agrawal and Nikhil R. Devanur. Linear contextual bandits with knapsacks. In Advances in Neural Information Processing Systems, volume 29, pages 3450--3458, 2016

  10. [10]

    Reoptimization and self-adjusting price control for network revenue management

    Stefanus Jasin. Reoptimization and self-adjusting price control for network revenue management. Operations Research, 62 0 (5): 0 1168--1178, 2014. doi:10.1287/opre.2014.1297

  11. [11]

    Management Science 66(7):2993--3009, ISSN 0025-1909, 1526-5501, ://dx.doi.org/10.1287/mnsc.2019.3365

    Pornpawee Bumpensanti and He Wang. A re-solving heuristic with uniformly bounded loss for network revenue management. Management Science, 66 0 (7): 0 2993--3009, 2020. doi:10.1287/mnsc.2019.3365

  12. [12]

    Constant regret resolving heuristics for price-based revenue management

    Yining Wang and He Wang. Constant regret resolving heuristics for price-based revenue management. Operations Research, 70 0 (6): 0 3538--3557, 2022. doi:10.1287/opre.2021.2219

  13. [13]

    Learning to price with resource constraints: From full information to machine-learned prices

    Ruicheng Ao, Jiashuo Jiang, and David Simchi-Levi. Learning to price with resource constraints: From full information to machine-learned prices. arXiv preprint arXiv:2501.14155, 2025 a . doi:10.48550/arXiv.2501.14155

  14. [14]

    On the reliability limits of LLM -based multi-agent planning

    Ruicheng Ao, Siyang Gao, and David Simchi-Levi. On the reliability limits of LLM -based multi-agent planning. arXiv preprint arXiv:2603.26993, 2026 b . doi:10.48550/arXiv.2603.26993

  15. [15]

    Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies

    N Bora Keskin and Assaf Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62 0 (5): 0 1142--1167, 2014

  16. [16]

    A linear bandit algorithm for general sequential choice problems

    Alexander Goldenshluger and Assaf Zeevi. A linear bandit algorithm for general sequential choice problems. Operations Research, 62 0 (3): 0 633--650, 2013

  17. [17]

    Mostly exploration-free algorithms for contextual bandits

    Hamsa Bastani, Mohsen Bayati, and Khashayar Khosravi. Mostly exploration-free algorithms for contextual bandits. Management Science, 67 0 (3): 0 1329--1348, 2021

  18. [18]

    Personalized pricing based on customers' personal characteristics

    Gal-Yi Ban and N Bora Keskin. Personalized pricing based on customers' personal characteristics. Management Science, 2021

  19. [19]

    Analytics for an online retailer: Demand forecasting and price optimization

    Kris Johnson Ferreira, Bin Hong Alex Lee, and David Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, 18 0 (1): 0 69--88, 2016

  20. [20]

    Dynamic pricing with data-driven demand learning: The price of misspecification

    Zhiyuan Ren and Benjamin Van Roy. Dynamic pricing with data-driven demand learning: The price of misspecification. Management Science, 2024

  21. [21]

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Kiant \'e Brantley, Miroslav Dud \'i k, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, and Wen Sun. Constrained episodic reinforcement learning in concave-convex and knapsack settings. In Advances in Neural Information Processing Systems, volume 33, pages 16315--16326, 2020

  22. [22]

    Contextual Decision-Making with Knapsacks Beyond the Worst Case

    Zhaohua Chen, Rui Ai, Mingwei Yang, Yuqi Pan, Chang Wang, and Xiaotie Deng. Contextual Decision-Making with Knapsacks Beyond the Worst Case . In Advances in Neural Information Processing Systems, volume 37, pages 88147--88193, 2024. doi:10.52202/079017-2798

  23. [23]

    Online resource allocation with average budget constraints

    Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online resource allocation with average budget constraints. arXiv preprint arXiv:2402.11425, 2024 a . doi:10.48550/arXiv.2402.11425

  24. [24]

    Two-stage online reusable resource allocation: Reservation, overbooking and confirmation call

    Ruicheng Ao, Hengyu Fu, and David Simchi-Levi. Two-stage online reusable resource allocation: Reservation, overbooking and confirmation call. arXiv preprint arXiv:2410.15245, 2024 b

  25. [25]

    Conservative bandits

    Yifan Wu, Roshan Shariff, Tor Lattimore, and Csaba Szepesv \'a ri. Conservative bandits. In Proceedings of the 33rd International Conference on Machine Learning, 2016

  26. [26]

    Optimal dynamic pricing of inventories with stochastic demand over finite horizons

    Guillermo Gallego and Garrett Van Ryzin. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Science, 40 0 (8): 0 999--1020, 1994. doi:10.1287/mnsc.40.8.999

  27. [27]

    Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

    Ruicheng Ao, Gan Luo, David Simchi-Levi, and Xinshang Wang. Optimizing LLM inference: Fluid-guided online scheduling with memory constraints. arXiv preprint arXiv:2504.11320, 2025 b . doi:10.48550/arXiv.2504.11320

  28. [28]

    Online linear programming: Dual convergence, new algorithms, and regret bounds

    Xiaocheng Li and Yinyu Ye. Online linear programming: Dual convergence, new algorithms, and regret bounds. Operations Research, 70 0 (5): 0 2948--2966, 2022. doi:10.1287/opre.2021.2164

  29. [29]

    Degeneracy Is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions

    Jiashuo Jiang, Will Ma, and Jiawei Zhang. Degeneracy Is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions . Operations Research, 73 0 (6): 0 3405--3420, 2025. doi:10.1287/opre.2022.0641

  30. [30]

    The value of information in resource-constrained pricing

    Ruicheng Ao, Jiashuo Jiang, and David Simchi-Levi. The value of information in resource-constrained pricing. arXiv preprint arXiv:2603.24974, 2026 c . doi:10.48550/arXiv.2603.24974

  31. [31]

    ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

    Ruicheng Ao, David Simchi-Levi, and Xinshang Wang. Solver-in-the-loop: MDP -based benchmarks for self-correction and behavioral rationality in operations research. arXiv preprint arXiv:2601.21008, 2026 d . doi:10.48550/arXiv.2601.21008

  32. [32]

    Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey

    Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118 0 (15): 0 e2014602118, 2021. doi:10.1073/pnas.2014602118

  33. [33]

    Wainwright

    Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, and Martin J. Wainwright. Near-optimal inference in adaptive linear regression. Annals of Statistics, 53 0 (6): 0 2329--2355, 2025. doi:10.1214/24-AOS2450

  34. [34]

    Statistical inference for online decision making: In a contextual bandit setting

    Yingkai Li and Zhenyu Zheng. Statistical inference for online decision making: In a contextual bandit setting. Journal of the American Statistical Association, 2021

  35. [35]

    Prediction-guided active experiments

    Ruicheng Ao, Hongyu Chen, and David Simchi-Levi. Prediction-guided active experiments. arXiv preprint arXiv:2411.12036, 2024 c . doi:10.48550/arXiv.2411.12036

  36. [36]

    Copenhaver

    Ruicheng Ao, Jing Dong, Xiaoyan Anna Liu, and Martin S. Copenhaver. Proactive transfer admission control for emergency departments. Working paper, 2026 e

  37. [37]

    Doubly robust policy evaluation and learning

    Miroslav Dud \'i k, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 1097--1104, 2011

  38. [38]

    Policy evaluation and optimization with continuous treatments

    Nathan Kallus and Angela Zhou. Policy evaluation and optimization with continuous treatments. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1243--1251, 2018

  39. [39]

    High-dimensional contextual bandits with equality constraints

    Yuhang Ma, Kuang Xu, and Chao Yang. High-dimensional contextual bandits with equality constraints. arXiv preprint, 2024

  40. [40]

    The central role of the propensity score in observational studies for causal effects

    Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70 0 (1): 0 41--55, 1983

  41. [41]

    Irregular identification, support conditions, and inverse weight estimation

    Shakeeb Khan and Elie Tamer. Irregular identification, support conditions, and inverse weight estimation. Econometrica, 78 0 (6): 0 2021--2042, 2010. doi:10.3982/ECTA7372

  42. [42]

    Dealing with limited overlap in estimation of average treatment effects

    Richard K Crump, V Joseph Hotz, Guido W Imbens, and Oscar A Mitnik. Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96 0 (1): 0 187--199, 2009

  43. [43]

    Overlap in observational studies with high-dimensional covariates

    Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221 0 (2): 0 644--654, 2021

  44. [44]

    Confidence intervals and hypothesis testing for high-dimensional regression

    Adel Javanmard and Andrea Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15 0 (1): 0 2869--2909, 2014

  45. [45]

    On asymptotically optimal confidence regions and tests for high-dimensional models

    Sara Van de Geer, Peter B \"u hlmann, Ya'acov Ritov, and Ruben Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42 0 (3): 0 1166--1202, 2014. doi:10.1214/14-AOS1221

  46. [46]

    Confidence intervals for low dimensional parameters in high dimensional linear models

    Cun-Hui Zhang and Stephanie S Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76 0 (1): 0 217--242, 2014. doi:10.1111/rssb.12026

  47. [47]

    The Econometrics Journal , volume =

    Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21 0 (1): 0 C1--C68, 2018. doi:10.1111/ectj.12097

  48. [48]

    Approximate residual balancing: debiased inference of average treatment effects in high dimensions

    Susan Athey, Guido W Imbens, and Stefan Wager. Approximate residual balancing: debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society: Series B, 80 0 (4): 0 597--623, 2018. doi:10.1111/rssb.12268

  49. [49]

    Asymptotic statistics

    Aad W Van der Vaart. Asymptotic statistics. Cambridge University Press, 2000

  50. [50]

    Semiparametric theory and missing data

    Anastasios A Tsiatis. Semiparametric theory and missing data. Springer, 2006

  51. [51]

    Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability

    Jianqing Fan and Ir \`e ne Gijbels. Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1996

  52. [52]

    Optimal aggregation of classifiers in statistical learning

    Alexandre B Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32 0 (1): 0 135--166, 2004

  53. [53]

    Variance reduction techniques for gradient estimates in reinforcement learning

    Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5: 0 1471--1530, 2004

  54. [54]

    High-dimensional continuous control using generalized advantage estimation

    John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations, 2016. arXiv:1506.02438

  55. [55]

    PPI-SVRG : Unifying prediction-powered inference and variance reduction for semi-supervised optimization

    Ruicheng Ao, Hongyu Chen, Haoyang Liu, David Simchi-Levi, and Will Wei Sun. PPI-SVRG : Unifying prediction-powered inference and variance reduction for semi-supervised optimization. arXiv preprint arXiv:2601.21470, 2026 f . doi:10.48550/arXiv.2601.21470

  56. [56]

    OptiRepair : Closed-loop diagnosis and repair of supply chain optimization models with LLM agents

    Ruicheng Ao, David Simchi-Levi, and Xinshang Wang. OptiRepair : Closed-loop diagnosis and repair of supply chain optimization models with LLM agents. arXiv preprint arXiv:2602.19439, 2026 g . doi:10.48550/arXiv.2602.19439

  57. [57]

    Peter Hall and Christopher C. Heyde. Martingale Limit Theory and Its Application. Probability and Mathematical Statistics. Academic Press, New York, 1980. ISBN 9780123193506

  58. [58]

    ResiliBench : Evaluating agentic workflow adaptation in stochastic environments

    Ruicheng Ao, Ziao Min, Tianyi Zhu, Wotao Yin, and Xinshang Wang. ResiliBench : Evaluating agentic workflow adaptation in stochastic environments. Working paper, 2026 h