arXiv: 2604.04383 · v1 · submitted 2026-04-06 · cs.AI · cs.MA · math.OC

Optimizing Service Operations via LLM-Powered Multi-Agent Simulation

Yanyuan Wang, Xiaowei Zhang

Pith reviewed 2026-05-10 19:40 UTC · model grok-4.3

classification cs.AI · cs.MA · math.OC

keywords LLM multi-agent simulation · service operations optimization · stochastic optimization · zeroth-order gradients · on-trajectory learning · decision-dependent uncertainty · controlled Markov chain · supply chain and contest design

The pith

An on-trajectory algorithm optimizes service designs by estimating gradients during one LLM multi-agent simulation run.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an LLM-powered multi-agent simulation framework that treats service design as stochastic optimization where choices embedded in prompts shape the distribution of agent outcomes. It models the resulting uncertainty as a controlled Markov chain by parsing numerical information from LLM-generated text. The central contribution is an on-trajectory learning algorithm that constructs zeroth-order gradient estimates and updates the design parameters simultaneously within a single simulation run, aided by variance reduction. This setup targets steady-state performance improvements in service systems. Applications to supply chain sustainability and contest design demonstrate gains over black-box optimization and other LLM-based approaches.

Core claim

We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance in an LLM-powered multi-agent simulation of service operations posed as stochastic optimization with decision-dependent uncertainty.

What carries the argument

On-trajectory learning algorithm that builds zeroth-order gradient estimates while updating parameters inside a controlled Markov chain representation of LLM agent interactions.
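The mechanism can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic simultaneous-perturbation (SPSA-style) update run on one uninterrupted trajectory of a toy controlled Markov chain. The AR(1) dynamics, the quadratic cost, the window length, and every function name are assumptions chosen for illustration; in the paper's setting, the call to `simulate_step` would be a round of LLM agent interaction followed by numerical extraction.

```python
import random

def simulate_step(x, theta, rng):
    """One transition of a toy controlled Markov chain (hypothetical stand-in
    for one round of LLM agent interaction): an AR(1) state whose drift
    depends on the design parameter theta."""
    return 0.5 * x + theta + rng.gauss(0.0, 0.1)

def cost(x):
    """Per-step cost. The steady-state mean of x is 2 * theta, so the
    long-run average cost is minimized near theta = 1."""
    return (x - 2.0) ** 2

def on_trajectory_spsa(theta0=0.0, iters=300, window=20, c=0.2, step=0.05, seed=0):
    """Estimate gradients and update theta on a single run: the state x is
    never reset between perturbations or between updates."""
    rng = random.Random(seed)
    theta, x = theta0, 0.0
    for _ in range(iters):
        delta = rng.choice((-1.0, 1.0))  # random perturbation direction
        window_cost = []
        for sign in (1.0, -1.0):
            total = 0.0
            for _ in range(window):
                # Keep simulating the same chain under the perturbed design.
                x = simulate_step(x, theta + sign * c * delta, rng)
                total += cost(x)
            window_cost.append(total / window)
        # Two-point zeroth-order gradient estimate from the two windows.
        grad = (window_cost[0] - window_cost[1]) / (2.0 * c) * delta
        theta -= step * grad  # update the design without restarting
    return theta

if __name__ == "__main__":
    print(on_trajectory_spsa())
```

Because the chain is never restarted, each window begins from the state left by the previous one, so the window averages carry a short transient. In this toy chain the transient decays geometrically; whether the same holds for LLM agents is exactly what the controlled-Markov-chain assumption has to deliver.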

If this is right

The approach outperforms black-box optimization, LLMs used as numerical solvers, and LLMs used as role-playing designers in a sustainable supply chain case.
It functions as a cost-effective evaluator of known designs when compared against real behavioral data in contest design.
It identifies strong designs that traditional methods miss in the contest design case study.
Variance reduction techniques are incorporated to stabilize the gradient estimates obtained during the run.
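This page does not say which variance reduction techniques the paper uses. As a hedged illustration of why such techniques matter for zeroth-order estimates, the sketch below uses baseline subtraction as a stand-in: a one-point estimator is computed with and without subtracting a running mean of past observations. The objective, noise level, and all names are hypothetical.

```python
import random
import statistics

def noisy_objective(theta, rng):
    """Hypothetical noisy performance measure with a large constant offset."""
    return (theta - 1.0) ** 2 + 5.0 + rng.gauss(0.0, 0.1)

def one_point_estimates(theta, n, c=0.1, use_baseline=False, seed=0):
    """Return n one-point zeroth-order gradient estimates at a fixed theta."""
    rng = random.Random(seed)
    baseline, estimates = 0.0, []
    for k in range(1, n + 1):
        delta = rng.choice((-1.0, 1.0))
        value = noisy_objective(theta + c * delta, rng)
        # The baseline uses past observations only, so it is independent of
        # the current delta and does not change the estimator's mean.
        centered = value - baseline if use_baseline else value
        estimates.append(centered * delta / c)
        baseline += (value - baseline) / k  # running mean of observations
    return estimates

if __name__ == "__main__":
    plain = one_point_estimates(2.0, 2000)
    reduced = one_point_estimates(2.0, 2000, use_baseline=True)
    print(statistics.variance(plain), statistics.variance(reduced))
```

Both estimators target the same derivative (here 2 at theta = 2), but the plain version multiplies the full objective value by the perturbation, so the constant offset inflates its variance by a factor on the order of 1/c².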

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Service designers could test far more policy variations at low cost before committing to real-world pilots.
The method reframes prompt engineering as an optimizable system design task rather than manual iteration.
Similar single-run gradient learning could transfer to simulation-based policy tuning in domains like healthcare scheduling or education interventions.

Load-bearing premise

LLM-generated text can be parsed reliably for numerical outcomes and the simulated agent behaviors approximate how real people respond to design choices.
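To make the parsing half of this premise concrete, here is a minimal sketch of a defensive extractor. It assumes a hypothetical convention in which the prompt asks each agent to end its reply with a line such as `ORDER_QUANTITY: 42.5`; the tag, the bounds, and the function name are illustrative and do not come from the paper.

```python
import re
from typing import Optional

# Hypothetical reply convention (an assumption, not the paper's format):
# exactly one line of the form "ORDER_QUANTITY: <number>".
_PATTERN = re.compile(r"^ORDER_QUANTITY:[ \t]*(-?\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)

def extract_quantity(reply: str, low: float = 0.0, high: float = 1000.0) -> Optional[float]:
    """Return the tagged number, or None when the reply is unusable."""
    matches = _PATTERN.findall(reply)
    if len(matches) != 1:
        return None  # missing or ambiguous answer
    value = float(matches[0])
    if not (low <= value <= high):
        return None  # outside the plausible range for this quantity
    return value

if __name__ == "__main__":
    print(extract_quantity("Demand looks strong this week.\nORDER_QUANTITY: 42.5"))
```

Returning `None` rather than guessing keeps a malformed reply from silently entering the state of the chain; what the simulation does on `None` (retry, skip, or reuse the previous value) is a separate design choice that would itself affect the estimates.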

What would settle it

Applying the designs found by the single-run optimization to a real service system and observing no improvement in measured steady-state performance relative to benchmarks or current practice.

Original abstract

Service system performance depends on how participants respond to design choices, but modeling these responses is hard due to the complexity of human behavior. We introduce an LLM-powered multi-agent simulation (LLM-MAS) framework for optimizing service operations. We pose the problem as stochastic optimization with decision-dependent uncertainty: design choices are embedded in prompts and shape the distribution of outcomes from interacting LLM-powered agents. By embedding key numerical information in prompts and extracting it from LLM-generated text, we model this uncertainty as a controlled Markov chain. We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance. We also incorporate variance reduction techniques. In a sustainable supply chain application, our method outperforms benchmarks, including black-box optimization and using LLMs as numerical solvers or as role-playing system designers. A case study on optimal contest design with real behavioral data shows that LLM-MAS serves both as a cost-effective evaluator of known designs and as an exploratory tool that can uncover strong designs overlooked by traditional approaches.
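One generic way to write down the problem the abstract describes is sketched below. The symbols are assumed notation for illustration, not taken from the paper: θ is the design parameter embedded in prompts, X_t is the numerical state extracted from agent text at step t, P_θ is the transition kernel induced by the LLM agents, π_θ is its stationary distribution, and c is a per-step cost.

```latex
\min_{\theta \in \Theta} \; J(\theta)
  \;=\; \mathbb{E}_{X \sim \pi_\theta}\bigl[ c(X, \theta) \bigr]
  \;=\; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} c(X_t, \theta),
\qquad X_{t+1} \sim P_\theta(\,\cdot \mid X_t).
```

The uncertainty is decision-dependent because both P_θ and π_θ change with θ. The second equality is the usual ergodic identity and holds only under stability conditions on the chain, which this sketch simply assumes.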

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper frames LLM multi-agent sims as a controlled Markov chain and optimizes via single-trajectory zeroth-order updates, which is a novel combination, but the whole thing rests on unproven parsing of numbers from LLM text.

The paper introduces an LLM-powered multi-agent simulation to handle decision-dependent uncertainty in service operations. Design choices go into prompts, numerical outcomes get extracted to form a controlled Markov chain, and an on-trajectory algorithm builds zeroth-order gradient estimates while updating parameters in one run, plus some variance reduction tricks. The supply-chain example and contest-design case with real data are presented as outperforming black-box methods and direct LLM use as solvers or designers.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an LLM-powered multi-agent simulation (LLM-MAS) framework to address the challenge of modeling complex human responses in service systems for optimization purposes. Design choices are incorporated into prompts that influence the behavior of interacting LLM agents, framing the problem as stochastic optimization with decision-dependent uncertainty. The core technical contribution is an on-trajectory learning algorithm that, within a single simulation trajectory, simultaneously generates zeroth-order gradient estimates and performs parameter updates to optimize steady-state performance, augmented by variance reduction techniques. Empirical validation includes a sustainable supply chain application demonstrating outperformance over black-box optimization and LLM-based alternatives, as well as a contest design case study that positions LLM-MAS as both an evaluator and an exploratory tool compared to traditional methods using real behavioral data.

Significance. If the results hold, particularly the reliability of extracting numerical outcomes from LLM text and the validity of the gradient estimates, the work has substantial significance for the field of service operations management and AI-assisted simulation. It provides a novel approach to handling decision-dependent uncertainty without requiring extensive real-world experimentation or simplified behavioral models. The on-trajectory nature allows efficient optimization in a single run, which is a strength. The case studies suggest practical utility in supply chain sustainability and contest design, potentially reducing costs and enabling discovery of overlooked designs. However, the significance is tempered by the need to address the robustness of LLM parsing and approximation to human behavior.

major comments (3)

[§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.
[§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.
[§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.
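The concern about extraction error can be made concrete with the standard two-point simultaneous-perturbation estimator. The notation is assumed for illustration: the hatted F is the parsed performance measurement, c_k the perturbation size, Δ_k a random ±1 vector with elementwise inverse Δ_k^{-1}, and ε_k^± the additive extraction errors in the two measurements.

```latex
\hat g_k
  \;=\; \frac{\hat F(\theta_k + c_k \Delta_k) - \hat F(\theta_k - c_k \Delta_k)}{2 c_k}\, \Delta_k^{-1},
\qquad
\hat F = F + \varepsilon
\;\Longrightarrow\;
\hat g_k
  \;=\; \underbrace{\frac{F(\theta_k + c_k \Delta_k) - F(\theta_k - c_k \Delta_k)}{2 c_k}\, \Delta_k^{-1}}_{\text{error-free estimate}}
  \;+\; \frac{\varepsilon_k^{+} - \varepsilon_k^{-}}{2 c_k}\, \Delta_k^{-1}.
```

The error term is scaled by 1/(2c_k), so small perturbations amplify parsing mistakes. If the two errors are independent with zero mean and variance σ², they add σ²/(2c_k²) of variance per coordinate; if the error depends systematically on θ, the difference no longer cancels and the estimate is biased.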

minor comments (2)

[§2] The notation used for the decision-dependent uncertainty could be more explicitly defined with an equation in the problem formulation section.
[§6] Figure captions in the case study section should include more details on the experimental setup to improve clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate. We have aimed to strengthen the presentation of the on-trajectory algorithm, empirical results, and modeling assumptions without overstating theoretical guarantees.

Referee: [§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.

Authors: We appreciate the referee's concern about potential circularity. The on-trajectory algorithm generates zeroth-order gradient estimates via randomized perturbations applied to the current parameter within the ongoing Markov chain trajectory, following the structure of simultaneous perturbation stochastic approximation; the update uses the estimate but the perturbation distribution is independent of prior updates. Variance-reduction techniques and extraction rules were developed on separate preliminary simulation runs not included in the reported optimization trajectories. In the revision we will add an explicit statement of this data separation, a short proof sketch of unbiasedness under the controlled Markov chain model, and pseudocode clarifying the timing of estimation versus update. revision: partial
Referee: [§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.

Authors: We agree that the current results section is insufficiently detailed. The revised manuscript will include tables with mean performance values and standard errors computed over 10 independent runs, ablation experiments that isolate the variance-reduction and parsing modules, and a new subsection on hallucination mitigation (including prompt constraints and post-processing rules). These additions will allow direct quantitative comparison with black-box optimization and the LLM-based baselines. revision: yes
Referee: [§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.

Authors: The referee correctly identifies that reliable numerical extraction is a foundational assumption. While the two case studies provide empirical evidence of consistent extraction, we do not possess general bounds on LLM parsing error because the underlying models are proprietary. In the revision we will expand Section 4 with a discussion of how extraction errors propagate in the on-trajectory setting and add a limitations paragraph acknowledging the absence of consistency proofs. We will also report extraction accuracy statistics from the experiments. revision: partial
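The reporting promised in the responses above (mean performance with standard errors over independent runs) amounts to a short calculation. The sketch below is a generic illustration; the function name is hypothetical and the numbers in the usage example are placeholders, not results from the paper.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Sample mean and standard error of the mean over independent runs."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, standard_error

if __name__ == "__main__":
    # Placeholder values standing in for final objective values of 10 runs.
    placeholder_runs = [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5, 3.0]
    print(mean_and_standard_error(placeholder_runs))
```

The standard error is meaningful only if the runs are independent, for example with separate random seeds and no shared tuning between them.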

standing simulated objections not resolved

General theoretical bounds on LLM text extraction error and formal proof of estimator consistency under extraction noise

Circularity Check

0 steps flagged

No significant circularity in the claimed derivation chain.

The paper introduces an LLM-MAS framework and an on-trajectory learning algorithm for simultaneous zeroth-order gradient estimation and parameter updates within a single simulation trajectory. This is presented as a standard stochastic optimization technique applied to a controlled Markov chain model of LLM agent interactions, with variance reduction incorporated. No equations or steps are shown to reduce by construction to fitted inputs, self-definitions, or self-citation chains; the algorithm's validity rests on external assumptions about LLM parsing reliability and behavioral approximation rather than internal equivalence. The applications (supply chain and contest design) serve as empirical validation outside the derivation itself. The central claim remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that LLMs can serve as proxies for human decision-making when design parameters are injected into prompts and numerical results are extracted from generated text; no independent validation of this proxy quality is described in the abstract.

axioms (1)

domain assumption: LLM outputs can be treated as samples from a decision-dependent distribution that is stable enough to form a controlled Markov chain
Invoked when the authors state they model uncertainty as a controlled Markov chain by embedding and extracting numerical information.

invented entities (1)

LLM-MAS framework · no independent evidence
purpose: To turn prompt-driven LLM interactions into an optimizable stochastic process for service operations
New named framework introduced to combine multi-agent LLM simulation with stochastic optimization.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · unclear

Paper passage: We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We pose the problem as stochastic optimization with decision-dependent uncertainty... modeled as a controlled Markov chain.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
