pith. machine review for the scientific record.

arxiv: 2604.22764 · v1 · submitted 2026-03-23 · 💻 cs.CY · cs.AI · cs.IR

Recognition: no theorem link

Implicit Humanization in Everyday LLM Moral Judgments

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:27 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.IR
keywords LLM · anthropomorphism · moral judgment · humanization · overreliance · AI trust · conversational AI · AI ethics

The pith

LLM answers to moral judgment requests in social conflicts reinforce assumptions that the AI thinks and acts like a human.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Users sometimes ask large language models to decide who was wrong in a personal dispute. The paper treats these queries as implicitly humanizing because they project human-like moral reasoning onto the model. The authors built a dataset of simulated queries of this kind and examined responses from four major general-purpose LLMs for linguistic, behavioral, and cognitive cues that treat the model as human. The responses consistently reinforced rather than corrected those cues. If the pattern holds, everyday use of LLMs for personal advice could increase overreliance and misplaced trust in the systems' actual capabilities.

Core claim

Requests for moral judgments on social conflicts are implicitly humanizing queries that carry anthropomorphic projections. Examination of four major LLMs shows their responses reinforce linguistic, behavioral, and cognitive anthropomorphic cues instead of correcting them, which may heighten risks of overreliance or misplaced trust.

What carries the argument

Measurement of reinforcement via linguistic, behavioral, and cognitive anthropomorphic cues in LLM responses to a new simulated dataset of moral judgment queries.
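The paper's operationalization of these cues is not spelled out here (a gap the referee flags below), so the following is only a hypothetical sketch of how surface-level linguistic cues might be counted. The cue lists, the per-100-words normalization, and the function name are invented for illustration, not drawn from the paper.

```python
# Hypothetical sketch of linguistic-cue scoring; the cue lists and the scoring
# scheme are assumptions for illustration, not the paper's published method.
import re

# Crude first-person markers (case-insensitive, word-bounded).
FIRST_PERSON = re.compile(r"\b(I|I'm|I've|me|my|myself)\b", re.IGNORECASE)
# Empathic-validation phrases often read as human-like.
EMPATHY_PHRASES = [
    "i understand", "it sounds like", "your feelings are valid",
    "i can see why", "that must have been",
]

def linguistic_cue_score(response: str) -> dict:
    """Count simple surface markers that read as human-like."""
    text = response.lower()
    words = text.split()
    first_person = len(FIRST_PERSON.findall(response))
    empathy = sum(text.count(p) for p in EMPATHY_PHRASES)
    return {
        "first_person_per_100_words": 100 * first_person / max(len(words), 1),
        "empathy_phrase_count": empathy,
    }

# Example: a response that validates feelings and speaks in the first person.
print(linguistic_cue_score(
    "I understand why you're upset. It sounds like your feelings are valid, "
    "and I think your roommate should have asked first."
))
```

Behavioral cues (degree of compliance with the judgment request) and cognitive cues (claims about the model's own beliefs or feelings) would presumably need separate, likely human-coded, schemes rather than simple string matching.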

If this is right

  • LLM responses may increase overreliance when users seek personal moral advice.
  • Reinforcement of human-like cues can produce misplaced trust in model capabilities.
  • System designs should correct misaligned expectations rather than reinforce them.
  • Future research must expand the concept of anthropomorphism to include implicit user-side projections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same reinforcement pattern could appear in other advice-seeking queries that involve emotional or relational content.
  • Repeated exposure might strengthen users' tendency to form attachments to conversational AI beyond moral topics.
  • Simple response interventions such as explicit capability disclaimers could be tested to reduce the reinforcement effect.

Load-bearing premise

The selected anthropomorphic cues validly capture implicit humanization and the simulated queries accurately stand in for real user requests without introducing bias.

What would settle it

A controlled user study in which participants interact with LLMs on moral judgment tasks and report no greater perceived human-likeness or trust than in a control condition using neutral queries.
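A hedged sketch of how the primary comparison in such a study might be analyzed, assuming a between-subjects design with Likert ratings of perceived human-likeness; the ratings, group sizes, and the choice of a one-sided Mann-Whitney U test are illustrative assumptions rather than anything specified by the paper.

```python
# Hypothetical analysis sketch for the settling experiment described above.
# Ratings, scale, and group sizes are invented for illustration.
from scipy.stats import mannwhitneyu

# Perceived human-likeness ratings (1-7 Likert), one per participant.
moral_judgment_condition = [6, 5, 7, 6, 5, 6, 4, 7, 6, 5]
neutral_query_control    = [4, 3, 5, 4, 4, 3, 5, 4, 3, 4]

stat, p = mannwhitneyu(moral_judgment_condition, neutral_query_control,
                       alternative="greater")
print(f"U = {stat}, one-sided p = {p:.4f}")
# A non-significant result here (p above the chosen alpha) would be the outcome
# that speaks against a practically meaningful reinforcement effect.
```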

Figures

Figures reproduced from arXiv: 2604.22764 by Hoda Ayad and Tanu Mitra.

Figure 1: Humanizing Linguistic Cues across LLM responses. All scores are positive, indicating a human-leaning response style for all models. GPT-4.1 Mini scored highest on style cues while Gemini 2.5 Flash used the first person most. Errors are calculated at a 95% CI.
Figure 2: Behavioral Cues measured by level of compliance with the moral judgment task. No models gave explicit refusals. Gemini 2.5 Flash is the only model with more full compliance than partial.
original abstract

Recent adoption of conversational information systems has expanded the scope of user queries to include complex tasks such as personal advice-seeking. However, we identify a specific type of sought advice, a request for a moral judgment (i.e. "who was wrong?") in a social conflict, as an implicitly humanizing query which carries potentially harmful anthropomorphic projections. In this study, we examine the reinforcement of these assumptions in the responses of four major general-purpose LLMs through the use of linguistic, behavioral, and cognitive anthropomorphic cues. We also contribute a novel dataset of simulated user queries for moral judgments. We find current LLM system responses reinforce implicit humanization in queries, potentially exacerbating risks like overreliance or misplaced trust. We call for future work to expand the understanding of anthropomorphism to include implicit user-side humanization and to design solutions that address user needs while correcting misaligned expectations of model capabilities.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an empirical study claiming that user queries requesting moral judgments on social conflicts implicitly humanize LLMs by projecting anthropomorphic assumptions. By examining responses from four major general-purpose LLMs to a novel simulated dataset of such queries, using linguistic, behavioral, and cognitive cues, the authors find that LLM responses reinforce these implicit humanizations, which may exacerbate risks like overreliance or misplaced trust. The paper calls for future work on implicit user-side humanization and design solutions.

Significance. Should the measurement approach prove robust, the paper's significance lies in shifting focus from explicit anthropomorphism in AI design to implicit humanization induced by query types in everyday use. The contribution of a simulated dataset supports reproducibility in this emerging area of AI ethics research. It provides a foundation for developing mitigations against misplaced trust in LLMs for moral advice.

major comments (3)
  1. [Methods] Methods section: The definitions, selection criteria, and operationalization of the linguistic, behavioral, and cognitive anthropomorphic cues are not provided (no examples of coded responses or validation against human judgments). This is load-bearing for the central claim, as it is unclear whether observed patterns reflect humanization reinforcement or generic conversational style.
  2. [Dataset] Dataset section: The generation protocol, sample size, and bias controls for the simulated moral judgment queries are not described. This directly affects whether the dataset represents real user queries, undermining generalization of the reinforcement finding.
  3. [Results] Results section: No quantitative metrics (e.g., cue frequencies, statistical tests, or inter-rater reliability) are reported to support the conclusion that responses 'reinforce implicit humanization'.
minor comments (2)
  1. [Abstract] Abstract: Naming the four specific LLMs and reporting the dataset size would provide necessary context without lengthening the paragraph.
  2. [Introduction] Introduction: The distinction between implicit user-side humanization and explicit model anthropomorphism could be illustrated with one concrete query example for clarity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving clarity and rigor. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our methods, dataset, and results.

point-by-point responses
  1. Referee: [Methods] Methods section: The definitions, selection criteria, and operationalization of the linguistic, behavioral, and cognitive anthropomorphic cues are not provided (no examples of coded responses or validation against human judgments). This is load-bearing for the central claim, as it is unclear whether observed patterns reflect humanization reinforcement or generic conversational style.

    Authors: We agree that expanded detail is essential. In the revised manuscript, we will add precise definitions for each cue category drawn from the anthropomorphism literature, along with explicit selection criteria and operationalization rules. We will include multiple concrete examples of coded LLM responses for each cue type. We will also add a limitations discussion noting the absence of formal human validation in the current study and propose such validation as valuable future work. These changes will better distinguish the targeted humanization patterns from generic conversational features. revision: yes

  2. Referee: [Dataset] Dataset section: The generation protocol, sample size, and bias controls for the simulated moral judgment queries are not described. This directly affects whether the dataset represents real user queries, undermining generalization of the reinforcement finding.

    Authors: We accept this critique and will substantially expand the Dataset section. The revision will describe the full generation protocol, including how queries were constructed to emulate real-world moral judgment requests on social conflicts. We will report the exact sample size and detail bias controls such as scenario diversity, phrasing variation, and demographic balance. These additions will directly support claims about the dataset's representativeness and the generalizability of the reinforcement findings. revision: yes

  3. Referee: [Results] Results section: No quantitative metrics (e.g., cue frequencies, statistical tests, or inter-rater reliability) are reported to support the conclusion that responses 'reinforce implicit humanization'.

    Authors: We agree that quantitative support would strengthen the results. In the revised manuscript, we will report cue frequencies across the four LLMs, include appropriate statistical tests to demonstrate reinforcement patterns, and describe any consistency measures used during coding. We will also clarify the primarily qualitative nature of the original analysis while adding these metrics to provide more robust empirical grounding for the central claim. revision: yes
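To make the third response concrete, here is a minimal sketch of the kind of reporting it promises: per-model compliance frequencies with a chi-square test, and coder agreement via Cohen's kappa. All counts, model labels, and the choice of tests are placeholders assumed for illustration; they are not taken from the paper or the rebuttal.

```python
# Illustrative sketch of quantitative reporting; counts and model names are
# placeholders, and the test choices are assumptions about what "appropriate
# statistical tests" and "consistency measures" could look like.
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

# Compliance counts per model: [full, partial] (placeholder numbers).
compliance = {
    "Model A": [62, 38],
    "Model B": [45, 55],
    "Model C": [51, 49],
    "Model D": [58, 42],
}
chi2, p, dof, _ = chi2_contingency(list(compliance.values()))
print(f"compliance vs. model: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Agreement between two coders labeling the same 12 responses for one cue (1 = present).
coder_1 = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
coder_2 = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
print(f"Cohen's kappa = {cohen_kappa_score(coder_1, coder_2):.2f}")
```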

Circularity Check

0 steps flagged

No significant circularity in empirical observation

full rationale

The paper conducts an empirical analysis of LLM outputs against a set of predefined linguistic, behavioral, and cognitive anthropomorphic cues applied to a contributed simulated dataset of moral-judgment queries. No equations, derivations, parameter fitting, or self-referential definitions appear in the described chain; the central finding is framed as direct observation of cue presence in model responses rather than any prediction or result that reduces to the inputs by construction. Self-citations, if present, are not load-bearing for the core claim, and the study does not invoke uniqueness theorems or smuggle ansatzes. This is a standard non-circular empirical setup.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the domain assumption that moral judgment queries carry implicit humanizing projections and that the selected cues reliably detect reinforcement of those projections in LLM outputs. No free parameters are involved, and the single invented entity is conceptual rather than physical.

axioms (2)
  • domain assumption: Requests for moral judgments in social conflicts constitute implicitly humanizing queries that project human-like qualities onto LLMs.
    This identification is the foundational premise stated in the abstract.
  • domain assumption: Linguistic, behavioral, and cognitive anthropomorphic cues can be used to measure reinforcement of humanization in model responses.
    The study relies on these cues without detailing validation in the abstract.
invented entities (1)
  • implicit humanization (no independent evidence)
    purpose: To name the hidden anthropomorphic projections carried by moral judgment queries.
    Introduced as a specific type of user-side assumption in the abstract.

pith-pipeline@v0.9.0 · 5446 in / 1379 out tokens · 44472 ms · 2026-05-15T01:27:27.744661+00:00 · methodology

