If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

Adrian de Wynter

arxiv: 2605.31514 · v3 · pith:5ABTJAMEnew · submitted 2026-05-29 · 💻 cs.CL · cs.AI · cs.CY

Pith reviewed 2026-06-28 22:27 UTC · model grok-4.3

keywords: anthropomorphism · large language models · Age of Empires II · neural networks · Turing completeness · substrate dependence · null hypothesis · agentic workflows

The pith

Anthropomorphic attributes like morality or language understanding ascribed to LLMs can also be ascribed to a neural network trained on Age of Empires II, making them non-unique to language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that if LLMs are granted human-like attributes such as understanding or morality based on their behavior, then the same attributes can be granted to a simple neural network built and trained on the videogame Age of Empires II. This follows because any sufficiently powerful system in a different substrate could produce behavior that invites the same interpretation. The author concludes that without explicit measurement criteria for these attributes, their assignment depends on the chosen representation and can shift with the substrate. Assuming that the attributes exist, or do not exist, in a substrate-independent way produces either circular or uninformative results. The paper therefore recommends adopting a null assumption of non-uniqueness when designing experiments on these properties.

Core claim

The purported anthropomorphic attributes of LLMs are empirically non-unique: although some properties such as responses to prompts could remain invariant, others such as the interpretation of their perceived behaviour might change with the substrate. Any entity in a sufficiently-powerful substrate, such as a neural network trained on Age of Empires II or even non-computational systems like LEGO, could present such attributes. Hence any empirically-grounded discussion requires explicit measurement criteria; otherwise the interpretation is left to the representation. Assuming the attributes exist or not in a system independent of the substrate leads to circular or uninformative conclusions reg

What carries the argument

A simple neural network built and trained on the videogame Age of Empires II, used to show that the same anthropomorphic attributes can be attributed to non-LLM systems depending on substrate and interpretation.

If this is right

Responses to prompts may stay the same across substrates, but the interpretation of perceived behavior changes.
Empirically grounded claims about these attributes require explicit measurement criteria rather than open interpretation.
Assuming the attributes exist or do not exist in a generalized, substrate-independent manner yields circular or uninformative conclusions.
Adopting a null assumption of non-uniqueness provides a workable starting point for setting up experiments on the topic.
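
One way to read the first two points is that a criterion defined only on responses cannot tell substrates apart. The following is a minimal sketch under that reading. Every name in it (`lookup_system`, `arithmetic_system`, `criterion`, the toy prompts) is invented for illustration and is not taken from the paper: two systems realise the same prompt-to-response map in different ways, and an explicit, response-only criterion scores them identically.

```python
from typing import Callable, Dict, List

Prompt = str
Response = str
System = Callable[[Prompt], Response]

# Hypothetical toy prompts and their expected responses (assumption, not from the paper).
EXPECTED: Dict[Prompt, Response] = {"2+2": "4", "3+5": "8", "7+1": "8"}
PROMPTS: List[Prompt] = list(EXPECTED)


def lookup_system(prompt: Prompt) -> Response:
    """'Substrate' A: answers by reading a fixed table."""
    table: Dict[Prompt, Response] = {"2+2": "4", "3+5": "8", "7+1": "8"}
    return table[prompt]


def arithmetic_system(prompt: Prompt) -> Response:
    """'Substrate' B: answers by parsing the prompt and adding the operands."""
    left, right = prompt.split("+")
    return str(int(left) + int(right))


def criterion(system: System) -> float:
    """Explicit criterion: fraction of prompts answered with the expected response."""
    hits = sum(system(p) == EXPECTED[p] for p in PROMPTS)
    return hits / len(PROMPTS)


if __name__ == "__main__":
    for name, system in [("lookup", lookup_system), ("arithmetic", arithmetic_system)]:
        print(name, criterion(system))
```

Under a null assumption of non-uniqueness, equal scores are the expected outcome. Any claim that one system has an attribute and the other lacks it would need a criterion that goes beyond the responses and states what else is being measured.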

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same non-uniqueness argument could be tested by training networks on other complete game environments to see whether attribute ascription shifts.
Discussions of AI consciousness or agency would need to specify measurement criteria before substrate comparisons become meaningful.
The Turing-completeness result for Age of Empires II supplies a concrete example that any sufficiently expressive system can serve as the substrate for such tests.

Load-bearing premise

A neural network trained on Age of Empires II will exhibit the same anthropomorphic attributes as LLMs when its outputs are interpreted in the same way.

What would settle it

A demonstration that the trained Age of Empires II neural network produces no outputs or behaviors that can be interpreted as possessing morality, understanding, or other listed anthropomorphic attributes under the same criteria applied to LLMs.

Figures

Figures reproduced from arXiv: 2605.31514 by Adrian de Wynter.

**Figure 1.** ... is better geared towards automation, and, although complex, builds a circuit less vulnerable to asynchronicity

**Figure 2.** Bipolar 1-bit implementation of a perceptron in (left) AoE II, and (right) as a gate diagram, ...

**Figure 3.** Ansatz-based training algorithm for our 1-bit perceptron, as a circuit (top) and as an AoE ...

**Figure 4.** Breakdown of our dataset by composition (left) and year-wise trends (right). Overall it can ...

Original abstract

Much research has been carried out on large language models (LLMs) and LLM-powered agentic workflows. However, many works within the field state emergence of, ascribe to, or assume, generalised anthropomorphic attributes to them (e.g., morality or understanding of natural language). Our goal is not to argue in favour or against the existence of these attributes, but to point out that these conclusions could be incorrect. For this we build and train a simple neural network on the videogame Age of Empires II, and note that any entity in a sufficiently-powerful substrate, such as LEGO or the Greater Boston Area, could also present such attributes. Hence, the purported anthropomorphic attributes of LLMs are empirically non-unique: although some properties (e.g., responses to prompts) could remain invariant, others, such as the interpretation of their perceived behaviour, might change with the substrate. Thus, any empirically-grounded discussion on these attributes requires explicit measurement criteria; otherwise the interpretation is left to the representation. We then show that assuming that these attributes exist or not in a system, independent of the substrate and in a generalised way, leads to either circular or uninformative conclusions. This is regardless of the experimenter's viewpoint on the subject, or whether the outcome shows existence or non-existence. Finally we propose a 'null' assumption, where one assumes LLM non-uniqueness instead of assuming anthropomorphic attributes to set up an experiment, along with examples of it. We also discuss potential objections to our work, briefly survey the field, and prove that Age of Empires II is functionally- and Turing-complete.
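
The functional-completeness claim can be made concrete with a toy example. The sketch below is not the paper's AoE II construction: it is a plain-Python bipolar perceptron whose ±1 weights and bias are chosen by hand (an assumption; nothing is trained here) so that it computes NAND, from which XOR is then composed. The names `sign`, `perceptron`, `nand`, and `xor` are hypothetical.

```python
from itertools import product
from typing import Sequence


def sign(x: int) -> int:
    """Bipolar threshold: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1


def perceptron(inputs: Sequence[int], weights: Sequence[int], bias: int) -> int:
    """Single unit with bipolar (+1/-1) inputs, weights, bias, and output."""
    return sign(sum(w * x for w, x in zip(weights, inputs)) + bias)


def nand(a: int, b: int) -> int:
    # Hand-picked weights (-1, -1) and bias +1: the pre-activation is
    # negative only when both inputs are +1 (true), so the output is NAND.
    return perceptron((a, b), (-1, -1), 1)


def xor(a: int, b: int) -> int:
    # Standard four-NAND construction of XOR, using only the unit above.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


if __name__ == "__main__":
    # Print the bipolar truth tables (+1 = true, -1 = false).
    for a, b in product((-1, 1), repeat=2):
        print(a, b, "NAND:", nand(a, b), "XOR:", xor(a, b))
```

Because NAND alone suffices to build any Boolean function, a substrate that can host this single unit and wire copies of it together is functionally complete. Turing completeness additionally requires unbounded memory and control, which this sketch does not model.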

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's move is to train a net on Age of Empires II as a counterexample showing LLM anthropomorphism isn't substrate-unique, but it never measures whether that net actually shows the claimed attributes.

The punchline is that this paper tries to undercut claims of human-like attributes in LLMs by pointing to a simple neural net trained on Age of Empires II. If that net (or any system in a powerful enough substrate) could exhibit the same behaviors, then the attributes aren't unique to LLMs. The author also proves the game is functionally and Turing complete, and suggests starting experiments from a null assumption of non-uniqueness instead of assuming the attributes exist or don't.

What is new is the specific AoE II example plus the null-assumption framing. The paper does a clear job explaining why explicit measurement criteria are needed; without them, interpretations of behavior can shift with the substrate chosen.

The soft spot is exactly what the stress-test note flags. The abstract describes building and training the network and gives the completeness proof, but it supplies no results, metrics, or examples showing the net displays morality, natural language understanding, or the other attributes listed. The non-uniqueness claim therefore rests on logical possibility rather than observed presence of the attributes. That weakens the empirical part of the argument. The circularity discussion is reasonable but the null assumption itself could still need tighter justification to avoid the problem it aims to solve.

This is for people working on LLM evaluation and anthropomorphism debates. A reader who wants to see claims about emergence tightened up will find a useful angle here, even if the concrete example stays thin.

I would send it to peer review. The point about measurement criteria is worth referee time, and the author engages the literature directly.

Referee Report

2 major / 1 minor

Summary. The paper claims that anthropomorphic attributes (e.g., morality, natural language understanding) ascribed to LLMs are empirically non-unique because a simple neural network trained on Age of Empires II could exhibit them; the game is shown to be functionally and Turing-complete, and any sufficiently powerful substrate could present such attributes. It argues that assuming existence or non-existence of these attributes leads to circular or uninformative conclusions independent of viewpoint, proposes a 'null' assumption of non-uniqueness for experiments, calls for explicit measurement criteria, surveys the field, and discusses objections.

Significance. If the central argument holds, the work would push the field toward substrate-independent measurement criteria when discussing LLM attributes, reducing reliance on interpretive assumptions. The Turing-completeness proof for AoE II and the concrete proposal of a null assumption are constructive elements that could support improved experimental designs.

major comments (2)

[Abstract and neural network training section] Abstract and the section describing the neural network training: the claim that the AoE II neural network 'could also present such attributes' is load-bearing for the empirical non-uniqueness conclusion, yet the manuscript supplies only training details and the Turing-completeness result with no evaluation, metrics, behavioral examples, or comparison showing that the trained network exhibits morality, natural language understanding, or the other listed attributes.
[Section on the 'null' assumption] Section proposing the 'null' assumption: the argument that assuming non-uniqueness avoids the circularity identified in other assumptions is not accompanied by a concrete demonstration or worked example showing how the null setup produces non-circular, informative conclusions where the alternatives do not.

minor comments (1)

Notation for 'anthropomorphic attributes' is used without an explicit enumerated list or table early in the manuscript, making it harder to track which properties are claimed to be non-unique.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below, indicating revisions where appropriate.

Referee: [Abstract and neural network training section] Abstract and the section describing the neural network training: the claim that the AoE II neural network 'could also present such attributes' is load-bearing for the empirical non-uniqueness conclusion, yet the manuscript supplies only training details and the Turing-completeness result with no evaluation, metrics, behavioral examples, or comparison showing that the trained network exhibits morality, natural language understanding, or the other listed attributes.

Authors: The argument for empirical non-uniqueness relies on the Turing-completeness of Age of Empires II, which establishes that the substrate supports arbitrary computation and thus could in principle exhibit any behavior interpretable as an anthropomorphic attribute. The neural network training illustrates applicability of standard ML methods to this substrate but does not claim or demonstrate that this specific network exhibits morality, NLU, or similar attributes. The 'could' phrasing in the abstract reflects substrate capacity rather than observed behavior in the trained model. We will revise the abstract and training section to make this distinction explicit and to foreground the completeness proof as the grounding for the potential.

Revision: partial

Referee: [Section on the 'null' assumption] Section proposing the 'null' assumption: the argument that assuming non-uniqueness avoids the circularity identified in other assumptions is not accompanied by a concrete demonstration or worked example showing how the null setup produces non-circular, informative conclusions where the alternatives do not.

Authors: The manuscript presents the null assumption with conceptual examples illustrating how it structures experiments to test uniqueness rather than presuppose it. We acknowledge that an expanded, fully worked empirical-style example would improve clarity. We will add such an example in the revised manuscript, contrasting experimental designs under the null assumption versus existence or non-existence assumptions.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper's argument is conceptual rather than a formal derivation: it constructs and trains a neural network on Age of Empires II, proves the game is functionally and Turing complete, and concludes that anthropomorphic attributes are empirically non-unique because any sufficiently powerful substrate could present them. The proposal of a 'null' assumption of non-uniqueness is offered as a methodological alternative to assumptions of existence or non-existence. No load-bearing step reduces by construction to its inputs; there are no equations, fitted parameters renamed as predictions, self-citations, or self-definitional loops. The central claim rests on logical possibility of substrate dependence and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that behavioral attributes can be compared across different substrates without additional criteria.

axioms (1)

domain assumption: Anthropomorphic attributes can be identified in behavioral outputs without substrate-specific measurement criteria
This is invoked to argue that the NN in AoE II shows the attributes.

pith-pipeline@v0.9.1-grok · 5824 in / 1332 out tokens · 30915 ms · 2026-06-28T22:27:09.526776+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War
cs.AI 2026-06 unverdicted novelty 6.0

Introduces Age of LLM benchmark pitting LLMs in a 13x7 grid game with fog of war, diplomacy, and JSON reliability constraints, reporting nuclear rush dominance in 54 matches and a weak reliability-win link.

Reference graph

Works this paper leans on

20 extracted references · 10 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

How Value Induction Reshapes LLM Behaviour

URLhttps://arxiv.org/abs/2605.07925. Z. Ben-Zion, K. Witte, A. K. Jagadish, O. Duek, I. Harpaz-Rotem, M.-C. Khorsandian, A. Burrer, E. Seifritz, P. Homan, E. Schulz, and T. R. Spiller. Assessing and alleviating state anxiety in large language models.npj Digital Medicine, 8, 2025. J. Betley, X. Bao, M. Soto, A. Sztyber-Betley, J. Chua, and O. Evans. Tell m...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/bf00413692 2025
[2]

doi: 10.1007/s11023-018-9474-5. C. Duffy. Parents of 16-year-old sue OpenAI, claiming ChatGPT advised on his suicide,

work page doi:10.1007/s11023-018-9474-5
[3]

i’m not sure, but

URL https://www.cnn.com/2025/08/26/tech/openai-chatgpt-teen- suicide-lawsuit. P. Duhem.The Aim and Structure of Physical Theory. Princeton University Press, 1982. ISBN 9780691079059. URLhttp://www.jstor.org/stable/j.ctv1nj34vm. B. V . Eckhardt. Connectionism and the propositional attitudes. In D. M. Johnson and C. E. Erneling, editors,The Mind As a Scient...

work page doi:10.1016/0010-0277(88)90031-5 2025
[4]

doi: 10.1080/106351599260247. W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity.The Bulletin of Mathematical Biophysics, 5:115–133, 1943. doi: 10.1007/BF02478259. D. McDermott. Artificial intelligence meets natural stupidity.SIGART Bull., (57):4–9, Apr. 1976. ISSN 0163-5719. doi: 10 .1145/1045339.1045340. URL https...

work page doi:10.1080/106351599260247 1943
[5]

URL https://proceedings.neurips.cc/paper_files/paper/2024/ file/ca9567d8ef6b2ea2da0d7eed57b933ee-Paper-Conference.pdf. G. Piccinini.Physical Computation: A Mechanistic Account. Oxford University Press UK, Oxford, GB, 2015. A. Placani. Anthropomorphism in AI: hype and fallacy.AI and Ethics, 4, 2024. Politifact. The 50 largest cities in the United States, 2...

work page doi:10.2307/2266637 2024
[6]

doi: https://doi .org/10.1016/S0304-3975(96)00077-1

ISSN 0304-3975. doi: https://doi .org/10.1016/S0304-3975(96)00077-1. URL https: //www.sciencedirect.com/science/article/pii/S0304397596000771. F. Rosenblatt. The perceptron: A perceiving and recognizing automaton. Technical Report 85-460-1, Cornell Aeronautical Laboratory, 1957. F. Rosenblatt. The perceptron: A probabilistic model for information storage ...

work page doi:10.1016/s0304-3975(96)00077-1 1957
[7]

URLhttps://openreview.net/forum?id=aY9TAAQuFD. S. Shen, F. Jiang, P. Wang, Y . Feng, Y . Jiang, and C. Liu. Do LLMs know and understand domain conceptual knowledge? In C. Christodoulopoulos, T. Chakraborty, C. Rose, and V . Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5967–5976, Suzhou, China, Nov. 2025. Asso...

work page doi:10.1145/3571884.3597144 2025
[8]

URL https://www.brusselstimes.com/430098/belgian-man-commits- suicide-following-exchanges-with-chatgpt. N. Weinberger and C. Allen. Static-dynamic hybridity in dynamical models of cognition.Philosophy of Science, 89(2):283–301, 2022. doi: 10.1017/psa.2021.27. J. Weizenbaum.Computer Power and Human Reason: From Judgment to Calculation. W H Freeman & Co, 19...

work page doi:10.1017/psa.2021.27 2022
[9]

doi: https://doi .org/10.1016/j.chbah.2024.100072

ISSN 2949-8821. doi: https://doi .org/10.1016/j.chbah.2024.100072. URL https: //www.sciencedirect.com/science/article/pii/S294988212400032X. J. Woodward and L. Ross. 20th Century Theories of Scientific Explanation. In E. N. Zalta and U. Nodelman, editors,The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2025 ed...

work page doi:10.1016/j.chbah.2024.100072 2024
[10]

doi: 10 .18653/v1/2023.findings-acl.551

Association for Computational Linguistics. doi: 10 .18653/v1/2023.findings-acl.551. URL https://aclanthology.org/2023.findings-acl.551/. T. Yu, S. Pan, C. Fan, S. Luo, Y . Jin, and B. Zhao. Can large language models exhibit cognitive and affective empathy as humans?Computers in Human Behavior: Artificial Humans, 6:100233,

2023
[11]

doi: https://doi .org/10.1016/j.chbah.2025.100233

ISSN 2949-8821. doi: https://doi .org/10.1016/j.chbah.2025.100233. URL https:// www.sciencedirect.com/science/article/pii/S2949882125001173. Y . Yuan, M. Su, and X. Li. What makes people say thanks to AI. In H. Degen and S. Ntoa, editors, Artificial Intelligence in HCI, pages 131–149, Cham, 2024. Springer Nature Switzerland. ISBN 978-3-031-60606-9. J. Zho...

work page doi:10.1016/j.chbah.2025.100233 2025
[12]

Here we also count technical reports and benchmarks as long as they explicitly evaluate agentic workflows/LLMs

article: a scientific article, but not a book chapter, survey, or opinion piece (e.g., these with ‘Opinion’ in the title). Here we also count technical reports and benchmarks as long as they explicitly evaluate agentic workflows/LLMs
[13]

survey: a review of the field
[14]

Broadly, any text without a clear contribution and/or scientific rigour

book: a textbook, book chapter, how-to guide, or opinion piece. Broadly, any text without a clear contribution and/or scientific rigour
[15]

This could be, for example, a text describing a codebase supporting workflows

platform: a scientific article where an agentic workflow or LLM is NOT being evaluated. This could be, for example, a text describing a codebase supporting workflows. Additionally, I’d like you to tell me if an LLM/agentic workflow is the centre of study: - ‘llm_is_central’: whether an LLM or LLM-powered agentic workflow is the central object of study: 1 ...
[16]

0 if it didn’t, 1 if it did

human_like_assumptions: whether the paper assumed or attributed human-like attributes to the LLM or LLM-powered agentic workflow _except_ in the conclusion. 0 if it didn’t, 1 if it did
[17]

0 if it doesn’t, 1 if it did

human_like_study: whether the paper central study is human-like attributes in LLMs or LLM- powered agentic workflow. 0 if it doesn’t, 1 if it did
[18]

This must be 1 if there is _at least_ some part of the conclusion indicating this, and only 0 if they don’t present them at all

human_like_conclusion: whether the paper concludes that the LLM or LLM-powered agentic workflows have human-like attributes. This must be 1 if there is _at least_ some part of the conclusion indicating this, and only 0 if they don’t present them at all. 0 if it didn’t, 1 if it did
[19]

0 if it doesn’t, 1 if it does

emergent_assumptions: whether the paper assumes that these human-like attributes (in either the assumptions or the conclusion) are emergent, and not product of (say) training or memorisation. 0 if it doesn’t, 1 if it does
[20]

A few notes: - By ‘human-like attributes’ we mean various aspect of human cognition and behaviour, as well as those of general intelligence

which_ones: a list of which emergent properties are being assumed in the work. A few notes: - By ‘human-like attributes’ we mean various aspect of human cognition and behaviour, as well as those of general intelligence. Here are some examples (but not the only ones!): - Cooperation - Empathy - Emotions (anxiety, happiness, anger, etc.) - Deceit - Values, ...
