Resource-Constrained Adaptive Inference for Sequential Pricing

David Simchi-Levi; Jiashuo Jiang; Ruicheng Ao

arxiv: 2606.03736 · v1 · pith:RMGHNTORnew · submitted 2026-06-02 · 📊 stat.ML · cs.LG

Pith reviewed 2026-06-28 07:59 UTC · model grok-4.3

keywords sequential pricing · resource constraints · adaptive inference · information clock · local non-identification · debiasing · studentized intervals · regret-information trade-off

The pith

Resource constraints in sequential pricing can block fixed-target inference even with positive-density actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that resource limits on feasible prices can exclude the neighborhood around a target price from the action set, creating local non-identification even when every realized action has known positive density. It tracks the available information through a realized information clock and builds a target-aware controller that certifies feasible bands while logging local densities. Localized debiasing then produces studentized intervals whose widths are governed by this clock. The resulting regret-information accounting demonstrates that polynomial target mass yields polynomial inference rates, but a pure 1/t branch does not shrink fixed-target intervals without added local movement. Practitioners care because revenue-maximizing controllers must also deliver reliable inference on specific prices under real constraints.

Core claim

The paper claims that resource-constrained pricing controllers face support-exclusion failures where target price neighborhoods become infeasible. This is formalized through a local non-identification result and a realized information clock. A target-aware controller certifies feasible target bands and logs continuous local densities, after which localized debiasing produces studentized intervals whose width is governed by the clock. The regret-information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure 1/t target branch does not yield shrinking fixed-target intervals without additional local movement.

What carries the argument

The realized information clock that governs the width of studentized intervals produced by localized debiasing on bands certified by the target-aware controller.
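
As a rough illustration of how logged densities can be turned into a studentized interval around a target price, the sketch below uses a generic kernel-localized inverse-density-weighted estimator. It is a stand-in technique, not the paper's estimator or its information clock: the name `localized_interval`, the box kernel, the bandwidth, the synthetic linear demand, and the normal critical value are all assumptions made for illustration, and the plain sample variance ignores the adaptive dependence that the paper's analysis is meant to handle.

```python
import math
import random
from statistics import NormalDist

def localized_interval(prices, densities, outcomes, target, bandwidth, level=0.95):
    """Kernel-localized inverse-density-weighted estimate of the mean outcome at `target`.

    Returns (estimate, half_width), or None when no logged price falls in the band.
    Hypothetical helper for illustration only.
    """
    n = len(prices)
    scores = []
    in_band = 0
    for p, dens, y in zip(prices, densities, outcomes):
        # Box kernel: weight 1/(2h) inside [target - h, target + h], 0 outside.
        inside = abs(p - target) <= bandwidth
        k = 0.5 / bandwidth if inside else 0.0
        if inside:
            in_band += 1
        # Dividing by the known logging density removes the sampling bias.
        scores.append(k * y / dens)
    if in_band == 0 or n < 2:
        return None  # abstain: the band carries no information
    est = sum(scores) / n
    var = sum((s - est) ** 2 for s in scores) / (n - 1)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est, z * math.sqrt(var / n)

# Synthetic check (assumed model): prices uniform on [1, 3] with density 0.5,
# demand 10 - 2p plus unit Gaussian noise, so the true mean at p = 2 is 6.
random.seed(0)
prices = [random.uniform(1.0, 3.0) for _ in range(20000)]
densities = [0.5] * len(prices)
outcomes = [10.0 - 2.0 * p + random.gauss(0.0, 1.0) for p in prices]
print(localized_interval(prices, densities, outcomes, target=2.0, bandwidth=0.2))
```

In this toy setup the interval width is driven by how many logged prices land inside the band, which is the intuition behind letting a realized information measure, rather than the raw sample size, set the width.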

If this is right

Polynomial target mass in the exploration policy produces polynomial rates for shrinking fixed-target inference intervals.
A pure 1/t target branch fails to produce shrinking fixed-target intervals unless supplemented by additional local movement.
The target-aware controller certifies feasible target bands and logs continuous local densities for subsequent debiasing.
When the resource state collapses target support, the procedure triggers diagnostic abstention while maintaining calibration on feasible bands.
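
The certification and abstention behavior listed above can be sketched with a toy feasibility rule. Everything here is hypothetical: the `ResourceState` fields, the linear demand `a - b*p`, the per-period budget rule, and the function names are assumptions chosen to show how a low resource state can push the feasible price set away from a target band. The paper's actual certification rule is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ResourceState:
    inventory: float    # units remaining (assumed state variable)
    periods_left: int   # selling periods remaining (assumed state variable)

def feasible_prices(state: ResourceState, a: float = 10.0, b: float = 2.0,
                    p_max: float = 5.0) -> Optional[Tuple[float, float]]:
    """Prices whose expected demand a - b*p fits the per-period inventory budget."""
    if state.periods_left <= 0:
        return None
    budget = state.inventory / state.periods_left
    # Expected demand a - b*p <= budget  is equivalent to  p >= (a - budget) / b.
    p_min = max(0.0, (a - budget) / b)
    return (p_min, p_max) if p_min <= p_max else None

def certify_band(state: ResourceState, target: float, half_width: float,
                 **demand_kwargs) -> Optional[Tuple[float, float]]:
    """Return the band (lo, hi) if it lies inside the feasible set, else None (abstain)."""
    feasible = feasible_prices(state, **demand_kwargs)
    if feasible is None:
        return None
    lo, hi = target - half_width, target + half_width
    return (lo, hi) if feasible[0] <= lo and hi <= feasible[1] else None

# Ample inventory: the band around the target price 3.0 is certified.
print(certify_band(ResourceState(inventory=600.0, periods_left=100), 3.0, 0.25))
# Scarce inventory: only high prices are feasible, so the band is excluded.
print(certify_band(ResourceState(inventory=100.0, periods_left=100), 3.0, 0.25))
```

Returning `None` instead of a narrowed band keeps the abstention explicit, so downstream inference code cannot silently report an interval for a price the controller could not have played.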

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The information clock and support-exclusion analysis may extend to other state-dependent action spaces such as constrained inventory or allocation problems.
Hybrid policies that monitor the information clock and inject local movement only when needed could balance inference quality against regret.
The framework suggests that inference objectives should be explicitly folded into the design of constrained online controllers rather than treated as an afterthought.

Load-bearing premise

Every realized action has a known positive density, which is required both to state the local non-identification result and to ensure that debiasing produces intervals controlled by the information clock.

What would settle it

In simulations with binding resource constraints, compare the rate at which studentized intervals around a fixed target shrink under polynomial-mass versus pure 1/t exploration policies; the accounting is falsified if the predicted rate gap does not appear or if 1/t intervals shrink without added local movement.
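
Before running a full simulation, the two exploration schedules can be compared on cumulative target mass alone. The sketch below is a toy proxy, not the paper's regret-information accounting: it assumes a target branch played with probability t^(-alpha) or 1/t at round t, and it multiplies the cumulative mass by a hypothetical shrinking bandwidth `T**(-beta)` to mimic the amount of data landing near the target. The function names and the exponents `alpha = 0.5` and `beta = 0.2` are illustrative assumptions.

```python
def cumulative_mass(T, schedule):
    """Expected number of target-branch rounds up to time T."""
    return sum(schedule(t) for t in range(1, T + 1))

def poly(alpha):
    """Polynomial target mass t^(-alpha), with 0 < alpha < 1."""
    return lambda t: t ** (-alpha)

def harmonic(t):
    """Pure 1/t target branch."""
    return 1.0 / t

def local_count(T, schedule, beta=0.2):
    """Toy proxy: cumulative target mass times an assumed bandwidth T^(-beta)."""
    return cumulative_mass(T, schedule) * T ** (-beta)

for T in (10**2, 10**3, 10**4, 10**5):
    print(T, round(local_count(T, poly(0.5)), 2), round(local_count(T, harmonic), 2))
```

Under these assumptions the polynomial schedule's proxy grows with T while the 1/t schedule's proxy declines, because logarithmic mass cannot keep pace with any polynomially shrinking bandwidth. A proper test of the paper's claim would replace this proxy with the realized information clock and the actual controller.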

read the original abstract

Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret-information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper formalizes support-exclusion in constrained sequential pricing and ties it to an information clock that governs inference rates under different target-mass schedules.

This paper shows that resource constraints can remove a target price neighborhood from the feasible set even when every realized action has known positive density. It formalizes the resulting local non-identification through a realized information clock and builds a target-aware controller that certifies feasible bands while logging the densities needed for debiasing.

The support-exclusion formalization and the explicit polynomial-versus-1/t distinction on target mass are the clearest additions. The regret-information accounting then links cheap exploration to inference quality: polynomial target mass produces polynomial rates, while a pure 1/t branch does not shrink fixed-target intervals without extra local movement. The experiments on calibration inside certified bands and on diagnostic abstention when support collapses are consistent with the claims.

The accounting is stated up to pilot re-solving error, so the full proofs will need to confirm that this term does not create hidden dependence. The positive-density assumption on realized actions is load-bearing for both the non-identification result and the studentized intervals; it is stated plainly but will limit scope in settings where densities can approach zero.

The work is aimed at people who design adaptive pricing algorithms that must respect inventory or capacity constraints. Readers interested in how operational limits affect statistical rates will find the information-clock device and the target-mass comparison useful. It deserves a serious referee because the core distinction is cleanly stated and the controller design is tied directly to the inference guarantee.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that resource-constrained sequential pricing can render fixed-price inference impossible via support-exclusion, even when every realized action has known positive density. It formalizes this failure through a local non-identification result and a realized information clock, then introduces a target-aware controller that certifies feasible target bands and logs continuous local densities. Localized debiasing produces studentized intervals whose width is governed by the information clock. The resulting regret-information accounting (explicitly stated up to pilot re-solving error) shows that polynomial target mass yields polynomial rates while a pure 1/t target branch does not produce shrinking fixed-target intervals without additional local movement. Experiments are reported to demonstrate calibration within certified bands and diagnostic abstention when the resource state collapses target support.

Significance. If the derivations and experiments hold, the work provides a coherent framework connecting resource constraints to inference feasibility in adaptive pricing, with the information clock and target-mass distinction offering practical design guidance. The explicit acknowledgment of the pilot re-solving error is a strength in transparency. The approach could inform controller design in online decision systems where both regret and reliable inference matter.

major comments (3)

[Abstract] The central claims on rates (polynomial vs. 1/t) and the sufficiency of cheap exploration rest on the regret-information accounting and localized debiasing derivations, which are not shown; without these, the load-bearing distinction between target-mass choices cannot be verified.
[Abstract] The experiments showing calibration in certified bands and diagnostic abstention are invoked to support the practical implications, but no details on metrics, implementation of the information clock, or how abstention is triggered are provided; these are necessary to assess whether the theoretical non-identification result translates to observable behavior.
[Abstract] The regret-information accounting is stated up to pilot re-solving error, which introduces an implicit dependence on a re-solved quantity; the manuscript must establish that this error term remains independent of the target result, as this directly affects the claimed rates.

minor comments (1)

The term 'information clock' is used without a formal definition or equation in the abstract; providing its precise mathematical form early would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below, pointing to the relevant sections of the manuscript where the requested derivations and details appear.

Referee: the central claims on rates (polynomial vs. 1/t) and the sufficiency of cheap exploration rest on the regret-information accounting and localized debiasing derivations, which are not shown; without these, the load-bearing distinction between target-mass choices cannot be verified.

Authors: The localized debiasing procedure, information clock, and local non-identification result are derived in Section 3. The regret-information accounting, including the explicit distinction that polynomial target mass yields polynomial rates while a pure 1/t target branch does not produce shrinking fixed-target intervals without additional local movement, appears in Section 4 (Theorem 4.3 and the surrounding discussion). The abstract summarizes these results; the full derivations are in the body of the paper. revision: no

Referee: the experiments showing calibration in certified bands and diagnostic abstention are invoked to support the practical implications, but no details on metrics, implementation of the information clock, or how abstention is triggered are provided; these are necessary to assess whether the theoretical non-identification result translates to observable behavior.

Authors: Section 5 provides the experimental details: calibration is measured via empirical coverage of studentized intervals within the certified bands; the information clock is implemented by logging continuous local densities; abstention is triggered when the resource state collapses target support. These experiments directly illustrate the translation from the theoretical non-identification result to observable behavior. revision: no

Referee: the regret-information accounting is stated up to pilot re-solving error, which introduces an implicit dependence on a re-solved quantity; the manuscript must establish that this error term remains independent of the target result, as this directly affects the claimed rates.

Authors: Section 4.1 states that the pilot re-solving uses an initial phase whose error depends only on the resource dynamics and is independent of the target mass choice and the inference target. This independence is used to preserve the rate distinctions in the accounting; the explicit acknowledgment of the error term is intended to make this separation transparent. revision: no
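
The calibration metric named in the responses, empirical coverage within certified bands alongside an abstention count, can be computed as in the sketch below. The function name `coverage_report`, the use of `None` to mark an abstained replication, and the example numbers are assumptions for illustration; the paper's Section 5 protocol is not reproduced here.

```python
def coverage_report(intervals, truth):
    """Summarize calibration over replications.

    intervals: list of (lo, hi) tuples, or None where the procedure abstained.
    truth: the true target value that reported intervals should contain.
    """
    reported = [iv for iv in intervals if iv is not None]
    abstained = len(intervals) - len(reported)
    covered = sum(1 for lo, hi in reported if lo <= truth <= hi)
    return {
        # Coverage is computed only over replications that reported an interval.
        "coverage": covered / len(reported) if reported else float("nan"),
        "abstention_rate": abstained / len(intervals) if intervals else float("nan"),
        "n_reported": len(reported),
    }

# Four hypothetical replications, one of which abstained.
print(coverage_report([(0.0, 2.0), (1.5, 3.0), None, (2.1, 4.0)], truth=2.0))
```

Reporting the abstention rate next to coverage matters because a procedure could otherwise look well calibrated simply by abstaining on every difficult replication.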

Circularity Check

0 steps flagged

No significant circularity identified

The abstract articulates a coherent distinction between support-exclusion under resource constraints and the positive-density case, then links it to an information clock and regret accounting. The stated limitation (pilot re-solving error) is explicitly flagged rather than concealed, and the polynomial-vs-1/t distinction is presented as a direct consequence of the target-mass choice. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the provided text. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the positive-density assumption and pilot re-solving error are domain assumptions whose independence cannot be audited without the full text.

axioms (1)

domain assumption: Every realized action has a known positive density. Invoked to formalize local non-identification and to guarantee studentized intervals.



work page doi:10.1287/mnsc.2019.3365 2020

[12] [12]

Constant regret resolving heuristics for price-based revenue management

Yining Wang and He Wang. Constant regret resolving heuristics for price-based revenue management. Operations Research, 70 0 (6): 0 3538--3557, 2022. doi:10.1287/opre.2021.2219

work page doi:10.1287/opre.2021.2219 2022

[13] [13]

Learning to price with resource constraints: From full information to machine-learned prices

Ruicheng Ao, Jiashuo Jiang, and David Simchi-Levi. Learning to price with resource constraints: From full information to machine-learned prices. arXiv preprint arXiv:2501.14155, 2025 a . doi:10.48550/arXiv.2501.14155

work page doi:10.48550/arxiv.2501.14155 2025

[14] [14]

On the reliability limits of LLM -based multi-agent planning

Ruicheng Ao, Siyang Gao, and David Simchi-Levi. On the reliability limits of LLM -based multi-agent planning. arXiv preprint arXiv:2603.26993, 2026 b . doi:10.48550/arXiv.2603.26993

work page doi:10.48550/arxiv.2603.26993 2026

[15] [15]

Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies

N Bora Keskin and Assaf Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62 0 (5): 0 1142--1167, 2014

2014

[16] [16]

A linear bandit algorithm for general sequential choice problems

Alexander Goldenshluger and Assaf Zeevi. A linear bandit algorithm for general sequential choice problems. Operations Research, 62 0 (3): 0 633--650, 2013

2013

[17] [17]

Mostly exploration-free algorithms for contextual bandits

Hamsa Bastani, Mohsen Bayati, and Khashayar Khosravi. Mostly exploration-free algorithms for contextual bandits. Management Science, 67 0 (3): 0 1329--1348, 2021

2021

[18] [18]

Personalized pricing based on customers' personal characteristics

Gal-Yi Ban and N Bora Keskin. Personalized pricing based on customers' personal characteristics. Management Science, 2021

2021

[19] [19]

Analytics for an online retailer: Demand forecasting and price optimization

Kris Johnson Ferreira, Bin Hong Alex Lee, and David Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, 18 0 (1): 0 69--88, 2016

2016

[20] [20]

Dynamic pricing with data-driven demand learning: The price of misspecification

Zhiyuan Ren and Benjamin Van Roy. Dynamic pricing with data-driven demand learning: The price of misspecification. Management Science, 2024

2024

[21] [21]

Constrained episodic reinforcement learning in concave-convex and knapsack settings

Kiant \'e Brantley, Miroslav Dud \'i k, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, and Wen Sun. Constrained episodic reinforcement learning in concave-convex and knapsack settings. In Advances in Neural Information Processing Systems, volume 33, pages 16315--16326, 2020

2020

[22] [22]

Contextual Decision-Making with Knapsacks Beyond the Worst Case

Zhaohua Chen, Rui Ai, Mingwei Yang, Yuqi Pan, Chang Wang, and Xiaotie Deng. Contextual Decision-Making with Knapsacks Beyond the Worst Case . In Advances in Neural Information Processing Systems, volume 37, pages 88147--88193, 2024. doi:10.52202/079017-2798

work page doi:10.52202/079017-2798 2024

[23] [23]

Online resource allocation with average budget constraints

Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online resource allocation with average budget constraints. arXiv preprint arXiv:2402.11425, 2024 a . doi:10.48550/arXiv.2402.11425

work page doi:10.48550/arxiv.2402.11425 2024

[24] [24]

Two-stage online reusable resource allocation: Reservation, overbooking and confirmation call

Ruicheng Ao, Hengyu Fu, and David Simchi-Levi. Two-stage online reusable resource allocation: Reservation, overbooking and confirmation call. arXiv preprint arXiv:2410.15245, 2024 b

arXiv 2024

[25] [25]

Conservative bandits

Yifan Wu, Roshan Shariff, Tor Lattimore, and Csaba Szepesv \'a ri. Conservative bandits. In Proceedings of the 33rd International Conference on Machine Learning, 2016

2016

[26] [26]

Optimal dynamic pricing of inventories with stochastic demand over finite horizons

Guillermo Gallego and Garrett Van Ryzin. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Science, 40 0 (8): 0 999--1020, 1994. doi:10.1287/mnsc.40.8.999

work page doi:10.1287/mnsc.40.8.999 1994

[27] [27]

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

Ruicheng Ao, Gan Luo, David Simchi-Levi, and Xinshang Wang. Optimizing LLM inference: Fluid-guided online scheduling with memory constraints. arXiv preprint arXiv:2504.11320, 2025 b . doi:10.48550/arXiv.2504.11320

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2504.11320 2025

[28] [28]

Online linear programming: Dual convergence, new algorithms, and regret bounds

Xiaocheng Li and Yinyu Ye. Online linear programming: Dual convergence, new algorithms, and regret bounds. Operations Research, 70 0 (5): 0 2948--2966, 2022. doi:10.1287/opre.2021.2164

work page doi:10.1287/opre.2021.2164 2022

[29] [29]

Degeneracy Is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions

Jiashuo Jiang, Will Ma, and Jiawei Zhang. Degeneracy Is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions . Operations Research, 73 0 (6): 0 3405--3420, 2025. doi:10.1287/opre.2022.0641

work page doi:10.1287/opre.2022.0641 2025

[30] [30]

The value of information in resource-constrained pricing

Ruicheng Ao, Jiashuo Jiang, and David Simchi-Levi. The value of information in resource-constrained pricing. arXiv preprint arXiv:2603.24974, 2026 c . doi:10.48550/arXiv.2603.24974

work page doi:10.48550/arxiv.2603.24974 2026

[31] [31]

ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

Ruicheng Ao, David Simchi-Levi, and Xinshang Wang. Solver-in-the-loop: MDP -based benchmarks for self-correction and behavioral rationality in operations research. arXiv preprint arXiv:2601.21008, 2026 d . doi:10.48550/arXiv.2601.21008

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.21008 2026

[32] [32]

Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey

Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118 0 (15): 0 e2014602118, 2021. doi:10.1073/pnas.2014602118

work page doi:10.1073/pnas.2014602118 2021

[33] [33]

Wainwright

Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, and Martin J. Wainwright. Near-optimal inference in adaptive linear regression. Annals of Statistics, 53 0 (6): 0 2329--2355, 2025. doi:10.1214/24-AOS2450

work page doi:10.1214/24-aos2450 2025

[34] [34]

Statistical inference for online decision making: In a contextual bandit setting

Yingkai Li and Zhenyu Zheng. Statistical inference for online decision making: In a contextual bandit setting. Journal of the American Statistical Association, 2021

2021

[35] [35]

Prediction-guided active experiments

Ruicheng Ao, Hongyu Chen, and David Simchi-Levi. Prediction-guided active experiments. arXiv preprint arXiv:2411.12036, 2024 c . doi:10.48550/arXiv.2411.12036

work page doi:10.48550/arxiv.2411.12036 2024

[36] [36]

Copenhaver

Ruicheng Ao, Jing Dong, Xiaoyan Anna Liu, and Martin S. Copenhaver. Proactive transfer admission control for emergency departments. Working paper, 2026 e

2026

[37] [37]

Doubly robust policy evaluation and learning

Miroslav Dud \'i k, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 1097--1104, 2011

2011

[38] [38]

Policy evaluation and optimization with continuous treatments

Nathan Kallus and Angela Zhou. Policy evaluation and optimization with continuous treatments. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1243--1251, 2018

2018

[39] [39]

High-dimensional contextual bandits with equality constraints

Yuhang Ma, Kuang Xu, and Chao Yang. High-dimensional contextual bandits with equality constraints. arXiv preprint, 2024

2024

[40] [40]

The central role of the propensity score in observational studies for causal effects

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70 0 (1): 0 41--55, 1983

1983

[41] [41]

Irregular identification, support conditions, and inverse weight estimation

Shakeeb Khan and Elie Tamer. Irregular identification, support conditions, and inverse weight estimation. Econometrica, 78 0 (6): 0 2021--2042, 2010. doi:10.3982/ECTA7372

work page doi:10.3982/ecta7372 2021

[42] [42]

Dealing with limited overlap in estimation of average treatment effects

Richard K Crump, V Joseph Hotz, Guido W Imbens, and Oscar A Mitnik. Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96 0 (1): 0 187--199, 2009

2009

[43] [43]

Overlap in observational studies with high-dimensional covariates

Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221 0 (2): 0 644--654, 2021

2021

[44] [44]

Confidence intervals and hypothesis testing for high-dimensional regression

Adel Javanmard and Andrea Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15 0 (1): 0 2869--2909, 2014

2014

[45] [45]

On asymptotically optimal confidence regions and tests for high-dimensional models

Sara Van de Geer, Peter B \"u hlmann, Ya'acov Ritov, and Ruben Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42 0 (3): 0 1166--1202, 2014. doi:10.1214/14-AOS1221

work page doi:10.1214/14-aos1221 2014

[46] [46]

Confidence intervals for low dimensional parameters in high dimensional linear models

Cun-Hui Zhang and Stephanie S Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76 0 (1): 0 217--242, 2014. doi:10.1111/rssb.12026

work page doi:10.1111/rssb.12026 2014

[47] [47]

The Econometrics Journal , volume =

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21 0 (1): 0 C1--C68, 2018. doi:10.1111/ectj.12097

work page doi:10.1111/ectj.12097 2018

[48] [48]

Approximate residual balancing: debiased inference of average treatment effects in high dimensions

Susan Athey, Guido W Imbens, and Stefan Wager. Approximate residual balancing: debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society: Series B, 80 0 (4): 0 597--623, 2018. doi:10.1111/rssb.12268

work page doi:10.1111/rssb.12268 2018

[49] [49]

Asymptotic statistics

Aad W Van der Vaart. Asymptotic statistics. Cambridge University Press, 2000

2000

[50] [50]

Semiparametric theory and missing data

Anastasios A Tsiatis. Semiparametric theory and missing data. Springer, 2006

2006

[51] [51]

Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability

Jianqing Fan and Ir \`e ne Gijbels. Local Polynomial Modelling and Its Applications, volume 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1996

1996

[52] [52]

Optimal aggregation of classifiers in statistical learning

Alexandre B Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32 0 (1): 0 135--166, 2004

2004

[53] [53]

Variance reduction techniques for gradient estimates in reinforcement learning

Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5: 0 1471--1530, 2004

2004

[54] [54]

High-dimensional continuous control using generalized advantage estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations, 2016. arXiv:1506.02438

Pith/arXiv arXiv 2016

[55] [55]

PPI-SVRG : Unifying prediction-powered inference and variance reduction for semi-supervised optimization

Ruicheng Ao, Hongyu Chen, Haoyang Liu, David Simchi-Levi, and Will Wei Sun. PPI-SVRG : Unifying prediction-powered inference and variance reduction for semi-supervised optimization. arXiv preprint arXiv:2601.21470, 2026 f . doi:10.48550/arXiv.2601.21470

work page doi:10.48550/arxiv.2601.21470 2026

[56] [56]

OptiRepair : Closed-loop diagnosis and repair of supply chain optimization models with LLM agents

Ruicheng Ao, David Simchi-Levi, and Xinshang Wang. OptiRepair : Closed-loop diagnosis and repair of supply chain optimization models with LLM agents. arXiv preprint arXiv:2602.19439, 2026 g . doi:10.48550/arXiv.2602.19439

work page doi:10.48550/arxiv.2602.19439 2026

[57] [57]

Peter Hall and Christopher C. Heyde. Martingale Limit Theory and Its Application. Probability and Mathematical Statistics. Academic Press, New York, 1980. ISBN 9780123193506

1980

[58] [58]

ResiliBench : Evaluating agentic workflow adaptation in stochastic environments

Ruicheng Ao, Ziao Min, Tianyi Zhu, Wotao Yin, and Xinshang Wang. ResiliBench : Evaluating agentic workflow adaptation in stochastic environments. Working paper, 2026 h

2026