pith. machine review for the scientific record.

arxiv: 2604.04383 · v1 · submitted 2026-04-06 · 💻 cs.AI · cs.MA · math.OC

Optimizing Service Operations via LLM-Powered Multi-Agent Simulation

Pith reviewed 2026-05-10 19:40 UTC · model grok-4.3

classification 💻 cs.AI · cs.MA · math.OC
keywords LLM multi-agent simulation · service operations optimization · stochastic optimization · zeroth-order gradients · on-trajectory learning · decision-dependent uncertainty · controlled Markov chain · supply chain and contest design

The pith

An on-trajectory algorithm optimizes service designs by estimating gradients during one LLM multi-agent simulation run.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an LLM-powered multi-agent simulation framework that treats service design as stochastic optimization where choices embedded in prompts shape the distribution of agent outcomes. It models the resulting uncertainty as a controlled Markov chain by parsing numerical information from LLM-generated text. The central contribution is an on-trajectory learning algorithm that constructs zeroth-order gradient estimates and updates the design parameters simultaneously within a single simulation run, aided by variance reduction. This setup targets steady-state performance improvements in service systems. Applications to supply chain sustainability and contest design demonstrate gains over black-box optimization and other LLM-based approaches.

Core claim

We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance in an LLM-powered multi-agent simulation of service operations posed as stochastic optimization with decision-dependent uncertainty.

What carries the argument

On-trajectory learning algorithm that builds zeroth-order gradient estimates while updating parameters inside a controlled Markov chain representation of LLM agent interactions.
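
To make the mechanism concrete, here is a minimal Python sketch of single-run, zeroth-order optimization on a toy controlled Markov chain. Everything in it is an assumption made for illustration: the linear-Gaussian chain in `step`, the quadratic `cost`, the two-point windowed estimator, and all tuning constants are hypothetical stand-ins for the LLM agents and for the paper's actual estimator, which this review does not spell out. The only point it tries to show is that the chain state is carried across perturbation windows rather than reset, so estimation and updating share one trajectory.

```python
import random

def step(x, theta, rng):
    # Toy controlled Markov chain (assumption): theta shifts the transition,
    # so the stationary mean of x is 2 * theta.
    return 0.5 * x + theta + rng.gauss(0.0, 0.1)

def cost(x):
    # Hypothetical per-step cost; steady-state cost is minimized at theta = 2.
    return (x - 4.0) ** 2

def window_cost(x, theta, n, rng):
    # Advance the same trajectory n steps under theta and average the cost.
    total = 0.0
    for _ in range(n):
        x = step(x, theta, rng)
        total += cost(x)
    return total / n, x

def on_trajectory_optimize(theta=0.0, iters=500, window=20,
                           delta=0.2, lr=0.05, seed=0):
    rng = random.Random(seed)
    x = 0.0  # the chain state is never reset between windows
    for _ in range(iters):
        c_plus, x = window_cost(x, theta + delta, window, rng)
        c_minus, x = window_cost(x, theta - delta, window, rng)
        grad = (c_plus - c_minus) / (2.0 * delta)  # two-point zeroth-order estimate
        theta -= lr * grad                         # update on the same run
    return theta

if __name__ == "__main__":
    print(on_trajectory_optimize())
```

On this toy problem the iterate should settle near theta = 2. Switching the design at every window leaves a short transient in the window averages, which is one reason a real implementation would need care with window length and step sizes.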

If this is right

  • The approach outperforms black-box optimization, LLMs used as numerical solvers, and LLMs used as role-playing designers in a sustainable supply chain case.
  • It functions as a cost-effective evaluator of known designs when compared against real behavioral data in contest design.
  • It identifies strong designs that traditional methods miss in the contest design case study.
  • Variance reduction techniques are incorporated to stabilize the gradient estimates obtained during the run.
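
The source does not say which variance reduction techniques the paper uses. As an illustration of why variance reduction matters for finite-difference gradient estimates, the sketch below compares a two-point estimate built from independent noise with one built from common random numbers (CRN). CRN is a standard simulation technique swapped in here as an assumption, not a claim about the paper's method; `noisy_cost` and all constants are hypothetical.

```python
import random
import statistics

def noisy_cost(theta, noise):
    # Hypothetical simulation output: the noise enters inside the square,
    # so it does not cancel trivially between the two evaluations.
    return (theta - 2.0 + noise) ** 2

def two_point_estimates(theta, delta, n, common, seed=0):
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        e_plus = rng.gauss(0.0, 1.0)
        # CRN reuses the same draw for both evaluations; otherwise draw again.
        e_minus = e_plus if common else rng.gauss(0.0, 1.0)
        diff = noisy_cost(theta + delta, e_plus) - noisy_cost(theta - delta, e_minus)
        estimates.append(diff / (2.0 * delta))
    return estimates

if __name__ == "__main__":
    for common in (False, True):
        est = two_point_estimates(theta=0.0, delta=0.1, n=2000, common=common)
        print(common, statistics.mean(est), statistics.variance(est))
```

Both variants target the same gradient, 2 * (theta - 2), but the CRN variant should show a much smaller sample variance. With LLM agents, reusing randomness would require control over sampling seeds, which may not be available for hosted models.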

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Service designers could test far more policy variations at low cost before committing to real-world pilots.
  • The method reframes prompt engineering as an optimizable system design task rather than manual iteration.
  • Similar single-run gradient learning could transfer to simulation-based policy tuning in domains like healthcare scheduling or education interventions.

Load-bearing premise

LLM-generated text can be parsed reliably for numerical outcomes and the simulated agent behaviors approximate how real people respond to design choices.
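
A minimal sketch of what "parsing numerical outcomes" might involve, assuming agents are prompted to answer with a labeled field such as `EFFORT: 7.5`. The label format, the function name `extract_value`, and the range check are hypothetical choices, not the paper's extraction rules. The sketch returns `None` instead of guessing whenever the field is missing, repeated, or out of range, so that parsing failures can be counted rather than silently absorbed into the gradient estimate.

```python
import re
from typing import Optional

def extract_value(text: str, label: str, lo: float, hi: float) -> Optional[float]:
    # Match "<label>: <number>" or "<label> = <number>", case-insensitively.
    pattern = re.compile(
        rf"{re.escape(label)}\s*[:=]\s*([-+]?\d+(?:\.\d+)?)",
        re.IGNORECASE,
    )
    matches = pattern.findall(text)
    if len(matches) != 1:
        return None  # missing or ambiguous field
    value = float(matches[0])
    # Reject values outside the range the prompt allowed.
    return value if lo <= value <= hi else None

if __name__ == "__main__":
    print(extract_value("I will work hard. EFFORT: 7.5 units.", "effort", 0.0, 10.0))
    print(extract_value("I would rather not say.", "effort", 0.0, 10.0))
```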

What would settle it

Applying the designs found by the single-run optimization to a real service system and observing no improvement in measured steady-state performance relative to benchmarks or current practice.

Original abstract

Service system performance depends on how participants respond to design choices, but modeling these responses is hard due to the complexity of human behavior. We introduce an LLM-powered multi-agent simulation (LLM-MAS) framework for optimizing service operations. We pose the problem as stochastic optimization with decision-dependent uncertainty: design choices are embedded in prompts and shape the distribution of outcomes from interacting LLM-powered agents. By embedding key numerical information in prompts and extracting it from LLM-generated text, we model this uncertainty as a controlled Markov chain. We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance. We also incorporate variance reduction techniques. In a sustainable supply chain application, our method outperforms benchmarks, including blackbox optimization and using LLMs as numerical solvers or as role-playing system designers. A case study on optimal contest design with real behavioral data shows that LLM-MAS is both a cost-effective evaluator of known designs and an exploratory tool that can uncover strong designs overlooked by traditional approaches.
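
One way to write the problem the abstract describes, in notation that is assumed here rather than taken from the paper: let \theta denote the design parameters embedded in prompts, P_\theta the transition kernel of the numerical state extracted from agent outputs, \pi_\theta its stationary distribution (assumed to exist and be unique), and c a per-period cost.

```latex
\min_{\theta \in \Theta} \; J(\theta)
  = \mathbb{E}_{X \sim \pi_\theta}\bigl[ c(X, \theta) \bigr],
\qquad
X_{t+1} \sim P_\theta(\,\cdot \mid X_t),
\qquad
\pi_\theta P_\theta = \pi_\theta .
```

The uncertainty is decision-dependent because \theta enters through \pi_\theta and not only through c. Maximizing a steady-state reward is the same problem with the sign of c flipped.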

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an LLM-powered multi-agent simulation (LLM-MAS) framework to address the challenge of modeling complex human responses in service systems for optimization purposes. Design choices are incorporated into prompts that influence the behavior of interacting LLM agents, framing the problem as stochastic optimization with decision-dependent uncertainty. The core technical contribution is an on-trajectory learning algorithm that, within a single simulation trajectory, simultaneously generates zeroth-order gradient estimates and performs parameter updates to optimize steady-state performance, augmented by variance reduction techniques. Empirical validation includes a sustainable supply chain application demonstrating outperformance over black-box optimization and LLM-based alternatives, as well as a contest design case study that positions LLM-MAS as both an evaluator and an exploratory tool compared to traditional methods using real behavioral data.

Significance. If the results hold, particularly the reliability of extracting numerical outcomes from LLM text and the validity of the gradient estimates, the work has substantial significance for the field of service operations management and AI-assisted simulation. It provides a novel approach to handling decision-dependent uncertainty without requiring extensive real-world experimentation or simplified behavioral models. The on-trajectory nature allows efficient optimization in a single run, which is a strength. The case studies suggest practical utility in supply chain sustainability and contest design, potentially reducing costs and enabling discovery of overlooked designs. However, the significance is tempered by the need to address the robustness of LLM parsing and approximation to human behavior.

major comments (3)
  1. [§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.
  2. [§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.
  3. [§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.
minor comments (2)
  1. [§2] The notation used for the decision-dependent uncertainty could be more explicitly defined with an equation in the problem formulation section.
  2. [§6] Figure captions in the case study section should include more details on the experimental setup to improve clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate. We have aimed to strengthen the presentation of the on-trajectory algorithm, empirical results, and modeling assumptions without overstating theoretical guarantees.

point-by-point responses
  1. Referee: [§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.

    Authors: We appreciate the referee's concern about potential circularity. The on-trajectory algorithm generates zeroth-order gradient estimates via randomized perturbations applied to the current parameter within the ongoing Markov chain trajectory, following the structure of simultaneous perturbation stochastic approximation; the update uses the estimate but the perturbation distribution is independent of prior updates. Variance-reduction techniques and extraction rules were developed on separate preliminary simulation runs not included in the reported optimization trajectories. In the revision we will add an explicit statement of this data separation, a short proof sketch of unbiasedness under the controlled Markov chain model, and pseudocode clarifying the timing of estimation versus update. revision: partial

  2. Referee: [§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.

    Authors: We agree that the current results section is insufficiently detailed. The revised manuscript will include tables with mean performance values and standard errors computed over 10 independent runs, ablation experiments that isolate the variance-reduction and parsing modules, and a new subsection on hallucination mitigation (including prompt constraints and post-processing rules). These additions will allow direct quantitative comparison with black-box optimization and the LLM-based baselines. revision: yes

  3. Referee: [§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.

    Authors: The referee correctly identifies that reliable numerical extraction is a foundational assumption. While the two case studies provide empirical evidence of consistent extraction, we do not possess general bounds on LLM parsing error because the underlying models are proprietary. In the revision we will expand Section 4 with a discussion of how extraction errors propagate in the on-trajectory setting and add a limitations paragraph acknowledging the absence of consistency proofs. We will also report extraction accuracy statistics from the experiments. revision: partial
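
The first response above appeals to simultaneous perturbation stochastic approximation (SPSA). For reference, here is a minimal sketch of the standard SPSA gradient estimate with Rademacher perturbations, applied to a plain deterministic function. The test function `quadratic`, the helper `average_spsa`, and the constants are hypothetical; the sketch does not model the Markov chain or the timing of the paper's on-trajectory updates.

```python
import random

def spsa_gradient(f, theta, c, rng):
    # Perturb all coordinates at once with independent +/-1 (Rademacher) signs.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    # Two function evaluations give an estimate for every coordinate.
    return [diff / (2.0 * c * d) for d in delta]

def quadratic(theta):
    # Hypothetical objective with gradient [2*(t0 - 3), 2*(t1 + 1)].
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2

def average_spsa(f, theta, c=0.1, draws=10000, seed=0):
    # Average many single-draw estimates to show they center on the gradient.
    rng = random.Random(seed)
    sums = [0.0] * len(theta)
    for _ in range(draws):
        g = spsa_gradient(f, theta, c, rng)
        sums = [s + gi for s, gi in zip(sums, g)]
    return [s / draws for s in sums]

if __name__ == "__main__":
    # The true gradient of quadratic at [1.0, 2.0] is [-4.0, 6.0].
    print(average_spsa(quadratic, [1.0, 2.0]))
```

A single draw is noisy: each coordinate's estimate is contaminated by the slopes of the other coordinates, and only the average over perturbations recovers the gradient. That is why step-size schedules and variance reduction matter when each evaluation is an expensive simulation.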

standing simulated objections not resolved
  • General theoretical bounds on LLM text extraction error and formal proof of estimator consistency under extraction noise

Circularity Check

0 steps flagged

No significant circularity in the claimed derivation chain.

full rationale

The paper introduces an LLM-MAS framework and an on-trajectory learning algorithm for simultaneous zeroth-order gradient estimation and parameter updates within a single simulation trajectory. This is presented as a standard stochastic optimization technique applied to a controlled Markov chain model of LLM agent interactions, with variance reduction incorporated. No equations or steps are shown to reduce by construction to fitted inputs, self-definitions, or self-citation chains; the algorithm's validity rests on external assumptions about LLM parsing reliability and behavioral approximation rather than internal equivalence. The applications (supply chain and contest design) serve as empirical validation outside the derivation itself. The central claim remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the assumption that LLMs can serve as proxies for human decision-making when design parameters are injected into prompts and numerical results are extracted from generated text; no independent validation of this proxy quality is described in the abstract.

axioms (1)
  • domain assumption: LLM outputs can be treated as samples from a decision-dependent distribution that is stable enough to form a controlled Markov chain
    Invoked when the authors state they model uncertainty as a controlled Markov chain by embedding and extracting numerical information.
invented entities (1)
  • LLM-MAS framework (no independent evidence)
    purpose: To turn prompt-driven LLM interactions into an optimizable stochastic process for service operations
    New named framework introduced to combine multi-agent LLM simulation with stochastic optimization.

pith-pipeline@v0.9.0 · 5483 in / 1400 out tokens · 51421 ms · 2026-05-10T19:40:49.204583+00:00 · methodology


