pith. machine review for the scientific record.

arxiv: 2604.22764 · v1 · submitted 2026-03-23 · 💻 cs.CY · cs.AI · cs.IR

Recognition: no theorem link

Implicit Humanization in Everyday LLM Moral Judgments

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:27 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.IR
keywords LLM · anthropomorphism · moral judgment · humanization · overreliance · AI trust · conversational AI · AI ethics

The pith

LLM answers to moral judgment requests in social conflicts reinforce assumptions that the AI thinks and acts like a human.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Users sometimes ask large language models to decide who was wrong in a personal dispute. The paper treats these queries as implicitly humanizing because they project human-like moral reasoning onto the model. The authors built a dataset of simulated queries of this kind and examined responses from four major general-purpose LLMs for linguistic, behavioral, and cognitive cues that treat the model as human. The responses consistently reinforced rather than corrected those cues. If the pattern holds, everyday use of LLMs for personal advice could increase overreliance and misplaced trust in the systems' actual capabilities.

Core claim

Requests for moral judgments on social conflicts are implicitly humanizing queries that carry anthropomorphic projections. Examination of four major LLMs shows their responses reinforce linguistic, behavioral, and cognitive anthropomorphic cues instead of correcting them, which may heighten risks of overreliance or misplaced trust.

What carries the argument

Measurement of reinforcement via linguistic, behavioral, and cognitive anthropomorphic cues in LLM responses to a new simulated dataset of moral judgment queries.
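The paper's operationalization of these cues is not spelled out here (a gap the referee flags below), so the following is only a hypothetical sketch of how surface-level linguistic cues might be counted. The cue lists, the per-100-words normalization, and the function name are invented for illustration, not drawn from the paper.

```python
# Hypothetical sketch of linguistic-cue scoring; the cue lists and the scoring
# scheme are assumptions for illustration, not the paper's published method.
import re

# Crude first-person markers (case-insensitive, word-bounded).
FIRST_PERSON = re.compile(r"\b(I|I'm|I've|me|my|myself)\b", re.IGNORECASE)
# Empathic-validation phrases often read as human-like.
EMPATHY_PHRASES = [
    "i understand", "it sounds like", "your feelings are valid",
    "i can see why", "that must have been",
]

def linguistic_cue_score(response: str) -> dict:
    """Count simple surface markers that read as human-like."""
    text = response.lower()
    words = text.split()
    first_person = len(FIRST_PERSON.findall(response))
    empathy = sum(text.count(p) for p in EMPATHY_PHRASES)
    return {
        "first_person_per_100_words": 100 * first_person / max(len(words), 1),
        "empathy_phrase_count": empathy,
    }

# Example: a response that validates feelings and speaks in the first person.
print(linguistic_cue_score(
    "I understand why you're upset. It sounds like your feelings are valid, "
    "and I think your roommate should have asked first."
))
```

Behavioral cues (degree of compliance with the judgment request) and cognitive cues (claims about the model's own beliefs or feelings) would presumably need separate, likely human-coded, schemes rather than simple string matching.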

If this is right

  • LLM responses may increase overreliance when users seek personal moral advice.
  • Reinforcement of human-like cues can produce misplaced trust in model capabilities.
  • System designs should correct misaligned expectations rather than reinforce them.
  • Future research must expand the concept of anthropomorphism to include implicit user-side projections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same reinforcement pattern could appear in other advice-seeking queries that involve emotional or relational content.
  • Repeated exposure might strengthen users' tendency to form attachments to conversational AI beyond moral topics.
  • Simple response interventions such as explicit capability disclaimers could be tested to reduce the reinforcement effect.

Load-bearing premise

The selected anthropomorphic cues validly capture implicit humanization and the simulated queries accurately stand in for real user requests without introducing bias.

What would settle it

A controlled user study in which participants interact with LLMs on moral judgment tasks and report no greater perceived human-likeness or trust than in a control condition using neutral queries.
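A hedged sketch of how the primary comparison in such a study might be analyzed, assuming a between-subjects design with Likert ratings of perceived human-likeness; the ratings, group sizes, and the choice of a one-sided Mann-Whitney U test are illustrative assumptions rather than anything specified by the paper.

```python
# Hypothetical analysis sketch for the settling experiment described above.
# Ratings, scale, and group sizes are invented for illustration.
from scipy.stats import mannwhitneyu

# Perceived human-likeness ratings (1-7 Likert), one per participant.
moral_judgment_condition = [6, 5, 7, 6, 5, 6, 4, 7, 6, 5]
neutral_query_control    = [4, 3, 5, 4, 4, 3, 5, 4, 3, 4]

stat, p = mannwhitneyu(moral_judgment_condition, neutral_query_control,
                       alternative="greater")
print(f"U = {stat}, one-sided p = {p:.4f}")
# A non-significant result here (p above the chosen alpha) would be the outcome
# that speaks against a practically meaningful reinforcement effect.
```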

Figures

Figures reproduced from arXiv: 2604.22764 by Hoda Ayad and Tanu Mitra.

Figure 1: Humanizing Linguistic Cues across LLM responses. All scores are positive, indicating a human-leaning response style for all models. GPT-4.1 Mini scored highest on style cues while Gemini 2.5 Flash used the first person most. Errors are calculated at a 95% CI.
Figure 2: Behavioral Cues measured by level of compliance with the moral judgment task. No models gave explicit refusals. Gemini 2.5 Flash is the only model with more full compliance than partial.
original abstract

Recent adoption of conversational information systems has expanded the scope of user queries to include complex tasks such as personal advice-seeking. However, we identify a specific type of sought advice, a request for a moral judgment (i.e. "who was wrong?") in a social conflict, as an implicitly humanizing query which carries potentially harmful anthropomorphic projections. In this study, we examine the reinforcement of these assumptions in the responses of four major general-purpose LLMs through the use of linguistic, behavioral, and cognitive anthropomorphic cues. We also contribute a novel dataset of simulated user queries for moral judgments. We find current LLM system responses reinforce implicit humanization in queries, potentially exacerbating risks like overreliance or misplaced trust. We call for future work to expand the understanding of anthropomorphism to include implicit user-side humanization and to design solutions that address user needs while correcting misaligned expectations of model capabilities.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an empirical study claiming that user queries requesting moral judgments on social conflicts implicitly humanize LLMs by projecting anthropomorphic assumptions. By examining responses from four major general-purpose LLMs to a novel simulated dataset of such queries, using linguistic, behavioral, and cognitive cues, the authors find that LLM responses reinforce these implicit humanizations, which may exacerbate risks like overreliance or misplaced trust. The paper calls for future work on implicit user-side humanization and design solutions.

Significance. Should the measurement approach prove robust, the paper's significance lies in shifting focus from explicit anthropomorphism in AI design to implicit humanization induced by query types in everyday use. The contribution of a simulated dataset supports reproducibility in this emerging area of AI ethics research. It provides a foundation for developing mitigations against misplaced trust in LLMs for moral advice.

major comments (3)
  1. [Methods] Methods section: The definitions, selection criteria, and operationalization of the linguistic, behavioral, and cognitive anthropomorphic cues are not provided (no examples of coded responses or validation against human judgments). This is load-bearing for the central claim, as it is unclear whether observed patterns reflect humanization reinforcement or generic conversational style.
  2. [Dataset] Dataset section: The generation protocol, sample size, and bias controls for the simulated moral judgment queries are not described. This directly affects whether the dataset represents real user queries, undermining generalization of the reinforcement finding.
  3. [Results] Results section: No quantitative metrics (e.g., cue frequencies, statistical tests, or inter-rater reliability) are reported to support the conclusion that responses 'reinforce implicit humanization'.
minor comments (2)
  1. [Abstract] Abstract: Naming the four specific LLMs and reporting the dataset size would provide necessary context without lengthening the paragraph.
  2. [Introduction] Introduction: The distinction between implicit user-side humanization and explicit model anthropomorphism could be illustrated with one concrete query example for clarity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving clarity and rigor. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our methods, dataset, and results.

point-by-point responses
  1. Referee: [Methods] Methods section: The definitions, selection criteria, and operationalization of the linguistic, behavioral, and cognitive anthropomorphic cues are not provided (no examples of coded responses or validation against human judgments). This is load-bearing for the central claim, as it is unclear whether observed patterns reflect humanization reinforcement or generic conversational style.

    Authors: We agree that expanded detail is essential. In the revised manuscript, we will add precise definitions for each cue category drawn from the anthropomorphism literature, along with explicit selection criteria and operationalization rules. We will include multiple concrete examples of coded LLM responses for each cue type. We will also add a limitations discussion noting the absence of formal human validation in the current study and propose such validation as valuable future work. These changes will better distinguish the targeted humanization patterns from generic conversational features. revision: yes

  2. Referee: [Dataset] Dataset section: The generation protocol, sample size, and bias controls for the simulated moral judgment queries are not described. This directly affects whether the dataset represents real user queries, undermining generalization of the reinforcement finding.

    Authors: We accept this critique and will substantially expand the Dataset section. The revision will describe the full generation protocol, including how queries were constructed to emulate real-world moral judgment requests on social conflicts. We will report the exact sample size and detail bias controls such as scenario diversity, phrasing variation, and demographic balance. These additions will directly support claims about the dataset's representativeness and the generalizability of the reinforcement findings. revision: yes

  3. Referee: [Results] Results section: No quantitative metrics (e.g., cue frequencies, statistical tests, or inter-rater reliability) are reported to support the conclusion that responses 'reinforce implicit humanization'.

    Authors: We agree that quantitative support would strengthen the results. In the revised manuscript, we will report cue frequencies across the four LLMs, include appropriate statistical tests to demonstrate reinforcement patterns, and describe any consistency measures used during coding. We will also clarify the primarily qualitative nature of the original analysis while adding these metrics to provide more robust empirical grounding for the central claim. revision: yes
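To make the third response concrete, here is a minimal sketch of the kind of reporting it promises: per-model compliance frequencies with a chi-square test, and coder agreement via Cohen's kappa. All counts, model labels, and the choice of tests are placeholders assumed for illustration; they are not taken from the paper or the rebuttal.

```python
# Illustrative sketch of quantitative reporting; counts and model names are
# placeholders, and the test choices are assumptions about what "appropriate
# statistical tests" and "consistency measures" could look like.
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

# Compliance counts per model: [full, partial] (placeholder numbers).
compliance = {
    "Model A": [62, 38],
    "Model B": [45, 55],
    "Model C": [51, 49],
    "Model D": [58, 42],
}
chi2, p, dof, _ = chi2_contingency(list(compliance.values()))
print(f"compliance vs. model: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Agreement between two coders labeling the same 12 responses for one cue (1 = present).
coder_1 = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
coder_2 = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
print(f"Cohen's kappa = {cohen_kappa_score(coder_1, coder_2):.2f}")
```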

Circularity Check

0 steps flagged

No significant circularity in empirical observation

full rationale

The paper conducts an empirical analysis of LLM outputs against a set of predefined linguistic, behavioral, and cognitive anthropomorphic cues applied to a contributed simulated dataset of moral-judgment queries. No equations, derivations, parameter fitting, or self-referential definitions appear in the described chain; the central finding is framed as direct observation of cue presence in model responses rather than any prediction or result that reduces to the inputs by construction. Self-citations, if present, are not load-bearing for the core claim, and the study does not invoke uniqueness theorems or smuggle ansatzes. This is a standard non-circular empirical setup.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the domain assumption that moral judgment queries carry implicit humanizing projections and that the selected cues reliably detect reinforcement of those projections in LLM outputs. No free parameters are involved, and the single invented entity is conceptual rather than physical.

axioms (2)
  • domain assumption: Requests for moral judgments in social conflicts constitute implicitly humanizing queries that project human-like qualities onto LLMs.
    This identification is the foundational premise stated in the abstract.
  • domain assumption: Linguistic, behavioral, and cognitive anthropomorphic cues can be used to measure reinforcement of humanization in model responses.
    The study relies on these cues without detailing validation in the abstract.
invented entities (1)
  • implicit humanization (no independent evidence)
    purpose: To name the hidden anthropomorphic projections carried by moral judgment queries.
    Introduced as a specific type of user-side assumption in the abstract.

pith-pipeline@v0.9.0 · 5446 in / 1379 out tokens · 44472 ms · 2026-05-15T01:27:27.744661+00:00 · methodology

