Optimizing Service Operations via LLM-Powered Multi-Agent Simulation
Pith reviewed 2026-05-10 19:40 UTC · model grok-4.3
The pith
An on-trajectory algorithm optimizes service designs by estimating gradients during one LLM multi-agent simulation run.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance in an LLM-powered multi-agent simulation of service operations posed as stochastic optimization with decision-dependent uncertainty.
What carries the argument
On-trajectory learning algorithm that builds zeroth-order gradient estimates while updating parameters inside a controlled Markov chain representation of LLM agent interactions.
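The single-run idea can be sketched on a toy problem. The following is a minimal illustration, not the paper's algorithm: a scalar autoregressive chain stands in for the LLM agent interactions, and the names `step`, `cost`, `run_batch`, and `optimize_on_trajectory`, along with every constant, are hypothetical. It uses a two-point, SPSA-style estimate and never resets the state, so estimation and updating share one trajectory.

```python
import random

def step(x, theta, rng, a=0.5, noise=0.1):
    """One transition of a toy controlled Markov chain (assumed stand-in for an agent round)."""
    return a * x + theta + rng.gauss(0.0, noise)

def cost(x, target=2.0):
    """Per-step cost; in this toy chain the steady-state mean of x is theta / (1 - a)."""
    return (x - target) ** 2

def run_batch(x, theta, steps, rng):
    """Advance the chain `steps` transitions at fixed theta; return (new state, average cost)."""
    total = 0.0
    for _ in range(steps):
        x = step(x, theta, rng)
        total += cost(x)
    return x, total / steps

def optimize_on_trajectory(theta0=0.0, iters=400, batch=20, c=0.1, lr=0.05, seed=0):
    """Two-point zeroth-order updates along one continuing trajectory (state is never reset)."""
    rng = random.Random(seed)
    theta, x = theta0, 0.0
    for _ in range(iters):
        delta = rng.choice((-1.0, 1.0))  # random perturbation direction
        x, j_plus = run_batch(x, theta + c * delta, batch, rng)
        x, j_minus = run_batch(x, theta - c * delta, batch, rng)
        grad = (j_plus - j_minus) / (2.0 * c) * delta  # finite-difference gradient estimate
        theta -= lr * grad
    return theta

theta_hat = optimize_on_trajectory()
print(round(theta_hat, 3))
```

For this toy chain the steady-state cost is minimized at theta = 1, so the printed value should settle near that point. Because each batch starts from the state left by the previous one, the estimates carry a small transient bias, which is one reason batch length matters in a single-run scheme.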
If this is right
- The approach outperforms black-box optimization, LLMs used as numerical solvers, and LLMs used as role-playing designers in a sustainable supply chain case.
- It functions as a cost-effective evaluator of known designs when compared against real behavioral data in contest design.
- It identifies strong designs that traditional methods miss in the contest design case study.
- Variance reduction techniques are incorporated to stabilize the gradient estimates obtained during the run.
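The summary does not say which variance reduction techniques the paper uses. As one common example, assumed here purely for illustration, subtracting a running baseline from a one-point zeroth-order estimate can shrink its variance by orders of magnitude. The function `noisy_cost` and all constants below are hypothetical.

```python
import random
import statistics

def noisy_cost(theta, rng):
    """Hypothetical noisy performance readout; the +5 offset inflates one-point variance."""
    return (theta - 1.0) ** 2 + 5.0 + rng.gauss(0.0, 0.1)

def one_point_estimates(theta, c, n, use_baseline, seed=0):
    """Return n one-point gradient estimates, optionally centered by a running baseline."""
    rng = random.Random(seed)
    baseline = 0.0
    estimates = []
    for k in range(1, n + 1):
        delta = rng.choice((-1.0, 1.0))
        y = noisy_cost(theta + c * delta, rng)
        b = baseline if use_baseline else 0.0
        estimates.append((y - b) / c * delta)
        # Running mean of readouts; it only uses past data when applied,
        # so it is independent of the current perturbation.
        baseline += (y - baseline) / k
    return estimates

plain = one_point_estimates(theta=0.0, c=0.1, n=2000, use_baseline=False)
centered = one_point_estimates(theta=0.0, c=0.1, n=2000, use_baseline=True)
print(statistics.mean(plain), statistics.variance(plain))
print(statistics.mean(centered), statistics.variance(centered))
```

Both estimators target the same gradient (here -2 at theta = 0), but the centered version removes the large constant offset that otherwise dominates the spread. The very first centered estimate is still uncentered because no readout has been seen yet.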
Where Pith is reading between the lines
- Service designers could test far more policy variations at low cost before committing to real-world pilots.
- The method reframes prompt engineering as an optimizable system design task rather than manual iteration.
- Similar single-run gradient learning could transfer to simulation-based policy tuning in domains like healthcare scheduling or education interventions.
Load-bearing premise
LLM-generated text can be parsed reliably for numerical outcomes and the simulated agent behaviors approximate how real people respond to design choices.
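What "parsed reliably" might involve can be sketched as follows. This is an assumption-laden illustration, not the paper's extraction rule: the `ANSWER:` tag convention, the function `extract_value`, and the sample replies are all hypothetical. The idea is to accept a number only when it appears in an agreed format and falls inside a plausible range, and to track how often extraction fails.

```python
import re
from typing import Optional

# Assumed convention: agents are prompted to end their reply with "ANSWER: <number>".
NUMBER = re.compile(r"ANSWER:\s*(-?\d+(?:\.\d+)?)")

def extract_value(text: str, low: float, high: float) -> Optional[float]:
    """Return the tagged number if present and within [low, high], else None."""
    match = NUMBER.search(text)
    if match is None:
        return None  # no tagged number found
    value = float(match.group(1))
    if not (low <= value <= high):
        return None  # out-of-range values are treated as extraction failures
    return value

# Made-up replies for illustration only.
replies = [
    "I would order more this week. ANSWER: 42",
    "Hard to say, maybe around forty units.",
    "ANSWER: 9999",
]
parsed = [extract_value(r, 0, 100) for r in replies]
success_rate = sum(v is not None for v in parsed) / len(parsed)
print(parsed, success_rate)
```

Reporting the success rate alongside results is one way to make the premise checkable; how failed extractions are handled (retry, skip, or impute) is a separate design choice that this sketch leaves open.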
What would settle it
Applying the designs found by the single-run optimization to a real service system and observing no improvement in measured steady-state performance relative to benchmarks or current practice.
Original abstract
Service system performance depends on how participants respond to design choices, but modeling these responses is hard due to the complexity of human behavior. We introduce an LLM-powered multi-agent simulation (LLM-MAS) framework for optimizing service operations. We pose the problem as stochastic optimization with decision-dependent uncertainty: design choices are embedded in prompts and shape the distribution of outcomes from interacting LLM-powered agents. By embedding key numerical information in prompts and extracting it from LLM-generated text, we model this uncertainty as a controlled Markov chain. We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance. We also incorporate variance reduction techniques. In a sustainable supply chain application, our method outperforms benchmarks, including black-box optimization and using LLMs as numerical solvers or as role-playing system designers. A case study on optimal contest design with real behavioral data shows that LLM-MAS serves both as a cost-effective evaluator of known designs and as an exploratory tool that can uncover strong designs overlooked by traditional approaches.
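One way to write down the problem the abstract describes is given below. The notation is assumed for illustration and may differ from the paper's: \theta is the design parameter, X_t the numerical state extracted from agent text, P_\theta the transition kernel induced by the prompts, \pi_\theta its stationary distribution, and c a per-step cost (a reward would flip the sign). The second line is a generic two-point zeroth-order update with a random direction \Delta_k whose entries are independent \pm 1, in the style of simultaneous perturbation methods.

```latex
% Notation assumed for illustration; the paper's own symbols may differ.
\min_{\theta \in \Theta} \; J(\theta) = \mathbb{E}_{X \sim \pi_\theta}\bigl[ c(X, \theta) \bigr],
\qquad X_{t+1} \sim P_\theta(\,\cdot \mid X_t)

% Two-point zeroth-order estimate and parameter update, step sizes c_k, \eta_k > 0
\hat{g}_k = \frac{\hat{J}(\theta_k + c_k \Delta_k) - \hat{J}(\theta_k - c_k \Delta_k)}{2 c_k} \, \Delta_k,
\qquad \theta_{k+1} = \theta_k - \eta_k \, \hat{g}_k
```

The uncertainty is decision-dependent because \theta enters through \pi_\theta, not only through c. In an on-trajectory scheme, \hat{J} would be an average over a short stretch of the same running chain.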
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an LLM-powered multi-agent simulation (LLM-MAS) framework to address the challenge of modeling complex human responses in service systems for optimization purposes. Design choices are incorporated into prompts that influence the behavior of interacting LLM agents, framing the problem as stochastic optimization with decision-dependent uncertainty. The core technical contribution is an on-trajectory learning algorithm that, within a single simulation trajectory, simultaneously generates zeroth-order gradient estimates and performs parameter updates to optimize steady-state performance, augmented by variance reduction techniques. Empirical validation includes a sustainable supply chain application demonstrating outperformance over black-box optimization and LLM-based alternatives, as well as a contest design case study that positions LLM-MAS as both an evaluator and an exploratory tool compared to traditional methods using real behavioral data.
Significance. If the results hold, particularly the reliability of extracting numerical outcomes from LLM text and the validity of the gradient estimates, the work has substantial significance for the field of service operations management and AI-assisted simulation. It provides a novel approach to handling decision-dependent uncertainty without requiring extensive real-world experimentation or simplified behavioral models. The on-trajectory nature allows efficient optimization in a single run, which is a strength. The case studies suggest practical utility in supply chain sustainability and contest design, potentially reducing costs and enabling discovery of overlooked designs. However, the significance is tempered by the need to address the robustness of LLM parsing and approximation to human behavior.
major comments (3)
- [§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.
- [§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.
- [§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.
minor comments (2)
- [§2] The notation used for the decision-dependent uncertainty could be more explicitly defined with an equation in the problem formulation section.
- [§6] Figure captions in the case study section should include more details on the experimental setup to improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate. We have aimed to strengthen the presentation of the on-trajectory algorithm, empirical results, and modeling assumptions without overstating theoretical guarantees.
Point-by-point responses
- Referee: [§3] §3 (On-trajectory learning algorithm): The construction of zeroth-order gradient estimates from the same trajectory used for updates introduces potential circularity, especially since variance-reduction techniques and prompt-extraction rules may have been tuned on these runs. The manuscript does not explicitly demonstrate separation between data used for fitting and evaluation, which is critical for validating the unbiasedness of the estimates.
  Authors: We appreciate the referee's concern about potential circularity. The on-trajectory algorithm generates zeroth-order gradient estimates via randomized perturbations applied to the current parameter within the ongoing Markov chain trajectory, following the structure of simultaneous perturbation stochastic approximation; the update uses the estimate but the perturbation distribution is independent of prior updates. Variance-reduction techniques and extraction rules were developed on separate preliminary simulation runs not included in the reported optimization trajectories. In the revision we will add an explicit statement of this data separation, a short proof sketch of unbiasedness under the controlled Markov chain model, and pseudocode clarifying the timing of estimation versus update.
  revision: partial
- Referee: [§5] §5 (Sustainable supply chain application): The abstract claims outperformance but the provided details lack quantitative results, error bars, ablation studies on the variance reduction or parsing components, and discussion of LLM hallucination effects. This makes it difficult to assess the robustness of the claimed superiority over benchmarks.
  Authors: We agree that the current results section is insufficiently detailed. The revised manuscript will include tables with mean performance values and standard errors computed over 10 independent runs, ablation experiments that isolate the variance-reduction and parsing modules, and a new subsection on hallucination mitigation (including prompt constraints and post-processing rules). These additions will allow direct quantitative comparison with black-box optimization and the LLM-based baselines.
  revision: yes
- Referee: [§4] §4 (Modeling as controlled Markov chain): The assumption that LLM-generated text can be reliably parsed for numerical outcomes is load-bearing for the gradient estimation. No general bounds on extraction error or proof of consistency for the resulting estimators are provided, and errors would compound in the on-trajectory setting rather than averaging out.
  Authors: The referee correctly identifies that reliable numerical extraction is a foundational assumption. While the two case studies provide empirical evidence of consistent extraction, we do not possess general bounds on LLM parsing error because the underlying models are proprietary. In the revision we will expand Section 4 with a discussion of how extraction errors propagate in the on-trajectory setting and add a limitations paragraph acknowledging the absence of consistency proofs. We will also report extraction accuracy statistics from the experiments.
  revision: partial
- General theoretical bounds on LLM text extraction error and formal proof of estimator consistency under extraction noise
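The rebuttal's promise of mean performance values with standard errors over 10 independent runs comes down to a short calculation. The sketch below shows it under the assumption that each run yields one scalar score; the function name `mean_and_standard_error` is hypothetical and the example scores are placeholders, not results from the paper.

```python
import math
import statistics

def mean_and_standard_error(run_scores):
    """Return (mean, standard error of the mean) across independent runs."""
    n = len(run_scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(run_scores)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    se = statistics.stdev(run_scores) / math.sqrt(n)
    return mean, se

# Placeholder scores for illustration only, NOT results from the paper.
example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, se = mean_and_standard_error(example_scores)
print(f"{mean:.2f} +/- {se:.2f}")
```

The standard error is only meaningful if the runs are independent, for example with separate random seeds and fresh agent sessions, which is the condition the authors state.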
Circularity Check
No significant circularity in the claimed derivation chain.
full rationale
The paper introduces an LLM-MAS framework and an on-trajectory learning algorithm for simultaneous zeroth-order gradient estimation and parameter updates within a single simulation trajectory. This is presented as a standard stochastic optimization technique applied to a controlled Markov chain model of LLM agent interactions, with variance reduction incorporated. No equations or steps are shown to reduce by construction to fitted inputs, self-definitions, or self-citation chains; the algorithm's validity rests on external assumptions about LLM parsing reliability and behavioral approximation rather than internal equivalence. The applications (supply chain and contest design) serve as empirical validation outside the derivation itself. The central claim remains independent of its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM outputs can be treated as samples from a decision-dependent distribution that is stable enough to form a controlled Markov chain
invented entities (1)
- LLM-MAS framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We develop an on-trajectory learning algorithm that, on a single simulation run, simultaneously constructs zeroth-order gradient estimates and updates design parameters to optimize steady-state performance."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We pose the problem as stochastic optimization with decision-dependent uncertainty... modeled as a controlled Markov chain."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.