Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation
Pith reviewed 2026-05-10 06:03 UTC · model grok-4.3
The pith
Heterogeneous LLM agents in supply chain simulations exhibit myopic, self-interested behaviors that worsen inefficiencies, but information sharing mitigates these effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Results indicate that agents exhibit myopic and self-interested behaviors that exacerbate systemic inefficiencies. However, we demonstrate that information sharing effectively mitigates these adverse effects.
Load-bearing premise
That behaviors observed in LLM-based agents with varying reasoning sophistication accurately proxy and generalize human cognitive biases and decision-making in real multi-stage supply chains.
Original abstract
Modeling coordination among generative agents in complex multi-round decision-making presents a core challenge for AI and operations management. Although behavioral experiments have revealed cognitive biases behind supply chain inefficiencies, traditional methods face scalability and control limitations. We introduce a scalable experimental paradigm using Large Language Models (LLMs) to simulate multi-stage supply chain dynamics. Grounded in a Hierarchical Reasoning Framework, this study specifically analyzes the impact of cognitive heterogeneity on agent interactions. Unlike prior homogeneous settings, we employ DeepSeek and GPT agents to systematically vary reasoning sophistication across supply chain tiers. Through rigorously replicated and statistically validated simulations, we investigate how this cognitive diversity influences collective outcomes. Results indicate that agents exhibit myopic and self-interested behaviors that exacerbate systemic inefficiencies. However, we demonstrate that information sharing effectively mitigates these adverse effects. Our findings extend traditional behavioral methods and offer new insights into the dynamics of AI-enabled organizations. This work underscores both the potential and limitations of LLM-based agents as proxies for human decision-making in complex operational environments.
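To make the setup concrete, the sketch below is a minimal beer-game-style loop with four tiers, a fixed lead time, heterogeneous ordering policies standing in for DeepSeek and GPT decision-makers, and an information-sharing toggle. The paper's actual prompts, model calls, cost structure, and metrics are not reproduced here; `llm_order_decision` is a hypothetical placeholder for a model query, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of a multi-tier, multi-round supply chain loop with
# heterogeneous ordering policies and an information-sharing toggle.
# `llm_order_decision` is a hypothetical stand-in for a DeepSeek/GPT query;
# initial stocks, lead time, and the demand step are illustrative assumptions.
import random
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    model: str                      # assumed label, e.g. "deepseek" or "gpt"
    inventory: int = 12
    backlog: int = 0
    pipeline: list = field(default_factory=lambda: [4, 4])  # in-transit units

def llm_order_decision(tier, incoming_demand, shared_demand):
    """Placeholder for an LLM call: anchor on demand, adjust toward a target."""
    anchor = shared_demand if shared_demand is not None else incoming_demand
    position = tier.inventory - tier.backlog + sum(tier.pipeline)
    gap = max(0, 2 * anchor - position)
    # Crude proxy for "reasoning sophistication": noisier orders for one family.
    noise = random.randint(0, 2) if tier.model == "gpt" else random.randint(0, 4)
    return gap + noise

def simulate(rounds=20, share_demand=False, seed=0):
    random.seed(seed)
    tiers = [Tier("retailer", "gpt"), Tier("wholesaler", "deepseek"),
             Tier("distributor", "gpt"), Tier("factory", "deepseek")]
    orders = [4, 4, 4, 4]            # last order placed by each tier
    history = []
    for t in range(rounds):
        customer_demand = 4 if t < 5 else 8         # classic demand step
        incoming = [customer_demand] + orders[:-1]  # demand seen by each tier
        round_orders = []
        for i, tier in enumerate(tiers):
            tier.inventory += tier.pipeline.pop(0)  # receive in-transit shipment
            tier.pipeline.append(0)
            demand = incoming[i]
            shipped = min(tier.inventory, demand + tier.backlog)
            tier.inventory -= shipped
            tier.backlog += demand - shipped
            if i > 0:                               # ship downstream, lead time 2
                tiers[i - 1].pipeline[-1] += shipped
            order = llm_order_decision(
                tier, demand, customer_demand if share_demand else None)
            round_orders.append(order)
        tiers[-1].pipeline[-1] += round_orders[-1]  # factory produces its order
        orders = round_orders
        history.append(round_orders)
    return history
```

Comparing runs with `share_demand=True` against `share_demand=False`, for instance by how much upstream orders swing after the demand step, is one way to probe the information-sharing effect the abstract reports.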
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a scalable LLM-based simulation paradigm for multi-stage supply chains, using a Hierarchical Reasoning Framework with DeepSeek and GPT agents to instantiate cognitive heterogeneity across tiers. It reports that heterogeneous agents exhibit myopic and self-interested behaviors exacerbating systemic inefficiencies in multi-round decisions, but that information sharing mitigates these effects. The work claims rigorous replication, statistical validation, and positions the approach as extending traditional behavioral experiments while highlighting limitations of LLMs as human proxies.
Significance. If the LLM agents faithfully reproduce human-like cognitive biases, the framework offers a scalable, controllable method to study supply-chain coordination beyond the limits of human-subject experiments such as the beer game. It provides concrete evidence on the role of reasoning heterogeneity and the value of information sharing, with direct relevance to designing AI-augmented operational systems. The explicit use of multiple model families and emphasis on replication are strengths that could support reproducible follow-on work.
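The provided text does not specify how "systemic inefficiencies" are measured; a common proxy in the beer-game literature is the bullwhip ratio, the variance of an upstream tier's orders relative to the variance of end-customer demand. The sketch below is one such diagnostic under that assumption, with made-up numbers purely for illustration.

```python
# Hedged sketch of one way "systemic inefficiency" could be quantified:
# the bullwhip ratio (order-variance amplification relative to customer demand).
# The paper's actual metrics and statistical tests are not specified here.
from statistics import pvariance

def bullwhip_ratio(customer_demand, tier_orders):
    """Variance of a tier's orders divided by variance of customer demand."""
    return pvariance(tier_orders) / max(pvariance(customer_demand), 1e-9)

# Toy usage with hypothetical series (a real run would use simulation output):
demand = [4, 4, 4, 4, 8, 8, 8, 8, 8, 8]
factory_orders = [4, 4, 6, 3, 12, 16, 2, 10, 14, 5]   # amplified, made-up
print(f"bullwhip ratio: {bullwhip_ratio(demand, factory_orders):.2f}")
```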
Major comments (1)
- The central claim that information sharing mitigates inefficiencies arising from cognitive heterogeneity rests on the assumption that behaviors observed in DeepSeek and GPT agents accurately proxy human decision-making biases. No calibration against human data, no comparison to established beer-game results, and no ablation of prompting or model artifacts are reported, so the attribution of outcomes to intended heterogeneity rather than LLM-specific factors cannot be assessed.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The feedback highlights an important aspect of our work's positioning, and we address it directly below while outlining planned revisions.
Point-by-point responses
Referee: The central claim that information sharing mitigates inefficiencies arising from cognitive heterogeneity rests on the assumption that behaviors observed in DeepSeek and GPT agents accurately proxy human decision-making biases. No calibration against human data, no comparison to established beer-game results, and no ablation of prompting or model artifacts are reported, so the attribution of outcomes to intended heterogeneity rather than LLM-specific factors cannot be assessed.
Authors: We agree that the manuscript does not include direct calibration to human-subject beer-game data or quantitative comparisons to established results from the literature, nor does it report systematic ablations isolating prompting or model-specific artifacts. Our primary aim was to develop and validate a scalable LLM-based simulation framework for investigating cognitive heterogeneity, rather than to establish LLMs as precise human proxies. The observed myopic and self-interested behaviors are presented as emerging from the instantiated heterogeneity across model families, with information sharing shown to reduce inefficiencies within these simulations; we note qualitative alignment with known supply-chain coordination issues but did not perform formal benchmarking. In revision, we will expand the limitations and discussion sections to explicitly state the absence of human calibration, add a dedicated subsection on potential model artifacts along with any feasible additional checks, and reframe the central claims to emphasize dynamics within LLM agent systems while noting implications for human-like bias studies. This preserves the contribution as a complementary methodological tool.
Revision: partial
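For illustration only, one feasible additional check of the kind the rebuttal promises is a swap test: re-run the simulations with the model-to-tier assignment reversed and ask whether the inefficiency metric shifts by more than chance, for example via a simple permutation test over replicated runs. The function and numbers below are hypothetical assumptions, not the authors' procedure.

```python
# Sketch of a model-artifact check: a two-sided permutation test on the mean
# inefficiency metric under the original vs. swapped model-to-tier assignment.
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided p-value for a difference in means between conditions a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / n_perm

# Hypothetical per-replication costs under original vs. swapped assignment:
original = [132.0, 140.5, 128.3, 150.1, 137.7]
swapped  = [131.2, 142.0, 127.5, 149.0, 139.1]
print(f"p = {permutation_test(original, swapped):.3f}  "
      "(a high p suggests outcomes are not tied to which model sits at which tier)")
```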
Circularity Check
No significant circularity detected
Full rationale
The paper reports outcomes from LLM-based simulations of multi-stage supply chains, using DeepSeek and GPT agents to instantiate cognitive heterogeneity and observing myopic behaviors mitigated by information sharing. These are generated results from the experimental runs rather than any derivation, fitted parameter, or self-referential definition that reduces to the inputs by construction. No equations, uniqueness theorems, or ansatzes are presented in the provided text that would trigger self-definitional, fitted-prediction, or self-citation load-bearing patterns. The study is self-contained as a simulation paradigm, and its reported results follow from the experimental setup and statistical validation rather than being built into them by construction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLM agents built on different models can reliably simulate distinct levels of human reasoning sophistication and the associated behavioral biases in supply chain decisions.