pith.

arxiv: 2504.15801 · v2 · submitted 2025-04-22 · 💻 cs.CL · cs.AI · cs.CY

A closer look at how large language models trust humans: patterns and biases

Pith reviewed 2026-05-22 18:16 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY
keywords large language models · trust dynamics · trustworthiness dimensions · demographic bias · AI decision making · human-AI interaction · financial scenarios · simulated experiments

The pith

Large language models develop trust toward humans in ways that mirror human patterns, guided by competence, benevolence, and integrity but also shaped by age, religion, and gender in financial contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs form effective trust in humans according to the same trustworthiness dimensions that shape human-to-human trust. It runs 43,200 simulated experiments with five popular models across five decision scenarios to measure how much trust depends on competence, benevolence, and integrity, and whether demographics such as age, religion, and gender introduce additional biases. The central finding is that trust aligns with human-like mechanisms in most cases, especially in scenarios common in the literature and in newer models, though results vary across models and the predictors are weak in some settings. This matters because LLMs increasingly participate in decisions like loan evaluations, where their implicit trust judgments can affect real people. If these patterns hold, they point to the need to monitor and adjust how models weigh trustworthiness and demographics to avoid unintended consequences.

Core claim

Using established behavioral theories, the study finds that LLM trust development shows an overall similarity to human trust development. In most but not all cases, LLM trust is strongly predicted by the trustworthiness dimensions of competence, benevolence, and integrity, and in some cases also biased by age, religion, and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. Different models exhibit variation in how they estimate trust, with trustworthiness and demographic factors serving as weak predictors in some instances.

What carries the argument

The three trustworthiness dimensions of competence, benevolence, and integrity, together with demographic variables, used to predict effective trust across simulated experiments.
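One way to make this concrete is a linear regression of a numeric trust rating on the three dimensions plus a demographic indicator. The sketch below is illustrative only, not the authors' analysis code: it assumes trust is elicited as a number and that the predictors enter linearly, and the helper `ols`, the 0/1 flag `older`, the rating grid, and the weights are all hypothetical. It solves the normal equations directly so that it needs only the standard library.

```python
from itertools import product
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column of ones is prepended, so the result is
    [intercept, b_1, ..., b_k] in the column order of X.
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical vignette grid: competence, benevolence, integrity rated 1/3/5,
# plus a 0/1 demographic flag ("older"). The planted weights are invented.
design = [list(v) for v in product([1, 3, 5], [1, 3, 5], [1, 3, 5], [0, 1])]
trust = [1.0 + 0.5 * c + 0.3 * b + 0.4 * i - 0.2 * g for c, b, i, g in design]
coef = ols(design, trust)
print([round(v, 3) for v in coef])  # → [1.0, 0.5, 0.3, 0.4, -0.2]
```

On noiseless synthetic data the fit recovers the planted weights; with real model outputs one would also want standard errors and checks for interactions, which this sketch omits.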

If this is right

  • In most scenarios, trust levels track the human subject's competence, benevolence, and integrity.
  • Demographic biases by age, religion, and gender appear most clearly in financial decision contexts.
  • Newer models tend to display stronger and more human-like reliance on trustworthiness dimensions.
  • Some models show weaker overall prediction by these factors, indicating inconsistent trust formation.
  • Unchecked patterns could produce unintended effects in applications where LLMs influence decisions about people.
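The distinction between strong and weak predictors can be quantified per model with the coefficient of determination, R² = 1 - SS_res / SS_tot, computed from observed and fitted trust scores. This is a sketch under stated assumptions: fitted values are taken as given, and the 0.3 cutoff, the helper names `r_squared` and `weak_models`, and the toy scores are hypothetical; the summary does not say what threshold, if any, the authors used.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    if ss_tot == 0:
        raise ValueError("observed trust scores have no variance")
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def weak_models(fits: Dict[str, Tuple[List[float], List[float]]],
                threshold: float = 0.3) -> List[str]:
    """Names of models whose fitted trust explains less than `threshold` of the variance."""
    return sorted(name for name, (y, y_hat) in fits.items()
                  if r_squared(y, y_hat) < threshold)


# Toy observed vs. fitted trust scores for two hypothetical models.
fits = {
    "model_a": ([2, 4, 6, 8], [2.1, 3.9, 6.2, 7.8]),
    "model_b": ([2, 4, 6, 8], [5, 4, 6, 5]),
}
print(weak_models(fits))  # → ['model_b']
```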

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the patterns persist outside simulations, AI agents might systematically favor or disfavor certain demographic groups in loan or hiring contexts without explicit intent.
  • Adjusting prompts or fine-tuning could reduce unwanted demographic influences on trust judgments in future models.
  • Similar trust mechanisms might surface in non-financial domains such as medical or legal advice if the same dimensions are at play.
  • Longitudinal testing with live users would help determine whether simulation results translate to deployed systems.
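Systematic favoritism of the kind raised in the first extension is commonly probed with counterfactual pairs: the same applicant description is scored twice with only the demographic attribute swapped, and the mean within-pair difference estimates the bias. A minimal sketch, assuming numeric trust scores; the record layout, `paired_gap`, and the example values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record layout: (profile_id, demographic_value, trust_score).
Record = Tuple[str, str, float]


def paired_gap(records: List[Record], group_a: str, group_b: str) -> float:
    """Mean trust difference (group_a minus group_b) over matched profiles.

    A profile_id identifies one description whose trustworthiness cues are
    held fixed while only the demographic value changes. Profiles scored under
    only one of the two values are skipped; if a profile repeats a value, the
    last score wins.
    """
    by_profile: Dict[str, Dict[str, float]] = {}
    for pid, group, score in records:
        by_profile.setdefault(pid, {})[group] = score
    diffs = [s[group_a] - s[group_b] for s in by_profile.values()
             if group_a in s and group_b in s]
    if not diffs:
        raise ValueError("no matched pairs for these two groups")
    return mean(diffs)


records = [
    ("p1", "age 30", 7.0), ("p1", "age 70", 6.0),
    ("p2", "age 30", 5.0), ("p2", "age 70", 5.0),
    ("p3", "age 30", 8.0),  # unmatched, ignored
]
print(paired_gap(records, "age 30", "age 70"))  # → 0.5
```

Holding the trustworthiness cues fixed within each pair is the design choice that lets the difference be attributed to the demographic attribute rather than to the applicant's described record.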

Load-bearing premise

The simulated experiments accurately capture how LLMs would form effective trust in real deployment contexts rather than merely reflecting patterns in training data or prompt phrasing.

What would settle it

Conducting parallel experiments where actual humans interact with the same LLMs on identical trust-sensitive tasks and finding no systematic link between the trustworthiness dimensions or demographics and the LLMs' trust-related outputs would undermine the central claim.

Original abstract

As large language models (LLMs) and LLM-based agents increasingly interact with humans in decision-making contexts, understanding the trust dynamics between humans and AI agents becomes a central concern. While considerable literature studies how humans trust AI agents, it is much less understood how LLM-based agents develop effective trust in humans. LLM-based agents likely rely on some sort of implicit effective trust in trust-related contexts (e.g., evaluating individual loan applications) to assist and affect decision making. Using established behavioral theories, we develop an approach that studies whether LLMs' trust depends on the three major trustworthiness dimensions: competence, benevolence and integrity of the human subject. We also study how demographic variables affect effective trust. Across 43,200 simulated experiments with five popular language models in five different scenarios, we find that LLM trust development shows an overall similarity to human trust development. We find that in most, but not all cases, LLM trust is strongly predicted by trustworthiness, and in some cases also biased by age, religion and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. While the overall patterns align with human-like mechanisms of effective trust formation, different models exhibit variation in how they estimate trust; in some cases, trustworthiness and demographic factors are weak predictors of effective trust. These findings call for a better understanding of AI-to-human trust dynamics and monitoring of biases and trust development patterns to prevent unintended and potentially harmful outcomes in trust-sensitive applications of AI.
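The abstract fixes only part of the design: 43,200 runs spread over five models and five scenarios. How the remaining factors (trustworthiness levels, demographics, repetitions) are crossed is not stated here, so the grid below uses invented levels purely to show how a full-factorial vignette design is enumerated. The one figure that follows from the abstract is arithmetic: if runs were split evenly, which is itself an assumption, each model-scenario pair would receive 43,200 / 25 = 1,728 runs.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical factor levels for illustration only; the paper's actual
# factors and levels are not given in this summary.
FACTORS: Dict[str, List[str]] = {
    "model": ["model_1", "model_2", "model_3", "model_4", "model_5"],
    "scenario": ["scenario_1", "scenario_2", "scenario_3", "scenario_4", "scenario_5"],
    "competence": ["low", "high"],
    "benevolence": ["low", "high"],
    "integrity": ["low", "high"],
    "gender": ["female", "male"],
}


def full_factorial(factors: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one dict per cell of the fully crossed design."""
    names = list(factors)
    for levels in product(*(factors[n] for n in names)):
        yield dict(zip(names, levels))


cells = list(full_factorial(FACTORS))
print(len(cells))          # → 400 (5 * 5 * 2 * 2 * 2 * 2)
print(43_200 // (5 * 5))   # → 1728 runs per model-scenario pair, if balanced
```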

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an empirical study examining how large language models (LLMs) form effective trust in humans. Through 43,200 simulated experiments involving five popular LLMs across five different scenarios, the authors investigate the role of trustworthiness dimensions (competence, benevolence, and integrity) and demographic variables (age, religion, gender) in predicting trust. They report that LLM trust development generally mirrors human patterns, with trustworthiness being a strong predictor in most cases and demographic biases appearing in some cases, especially in financial scenarios; these patterns are most pronounced for scenarios common in the literature and for newer models.

Significance. If the findings hold under rigorous scrutiny, this work would provide valuable insights into AI-to-human trust dynamics, which is increasingly relevant as LLMs are deployed in decision-making roles. The large scale of simulations and comparison across models are strengths. However, the significance is tempered by potential limitations in how the simulations capture real-world trust formation versus training data artifacts.

major comments (3)
  1. Abstract: The abstract reports 43,200 simulations and clear patterns but provides no details on prompt construction, response coding, statistical controls, or how 'effective trust' was operationalized. Without these, it is impossible to assess whether the central patterns are robust or artifactual, which is load-bearing for the similarity-to-human-trust claim.
  2. Methods (implied in experimental design description): The experimental design does not appear to include controls for prompt paraphrasing, framing effects, or out-of-distribution human descriptions. This leaves open that observed predictors (especially age/religion/gender in financial scenarios) are artifacts of common co-occurrences in the training corpus rather than emergent trust computation.
  3. Results: Model-to-model variation is noted but not tied to any architectural or training difference that would distinguish genuine mechanism from prompt sensitivity, undermining the ability to interpret differences across the five models as evidence of stable trust mechanisms.
minor comments (2)
  1. Abstract: The phrasing 'in most, but not all cases' and 'in some cases also biased' could be made more precise by referencing specific tables or figures that quantify the effect sizes.
  2. Discussion: Explicitly address how the five scenarios were selected and whether they include both common literature scenarios and novel ones to strengthen the generalizability argument.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and robustness of our work on LLM trust dynamics. We address each major comment below and have revised the manuscript to incorporate additional details, robustness checks, and discussion where feasible.

Point-by-point responses
  1. Referee: Abstract: The abstract reports 43,200 simulations and clear patterns but provides no details on prompt construction, response coding, statistical controls, or how 'effective trust' was operationalized. Without these, it is impossible to assess whether the central patterns are robust or artifactual, which is load-bearing for the similarity-to-human-trust claim.

    Authors: We agree that greater transparency in the abstract would strengthen the manuscript. We have revised the abstract to include a concise description of how effective trust was operationalized (drawing on the three trustworthiness dimensions from behavioral trust literature) and to note the use of structured prompts with statistical regression controls for demographics and trustworthiness factors. Detailed explanations of prompt templates, response coding procedures (e.g., parsing model outputs for trust indicators), and the full statistical models are now more prominently referenced in the abstract and expanded in the Methods section. revision: yes

  2. Referee: Methods (implied in experimental design description): The experimental design does not appear to include controls for prompt paraphrasing, framing effects, or out-of-distribution human descriptions. This leaves open that observed predictors (especially age/religion/gender in financial scenarios) are artifacts of common co-occurrences in the training corpus rather than emergent trust computation.

    Authors: This concern about potential training-data artifacts is well-taken. We have added a dedicated robustness subsection to the Methods that reports results from paraphrased prompt variants and alternative scenario framings; the core effects of trustworthiness dimensions remain stable. We also incorporated a small set of out-of-distribution human descriptions in supplementary analyses. While we cannot completely eliminate all influences from pre-training corpora, the scale of the design (43,200 simulations across five models and five scenarios) and the independent manipulation of trustworthiness cues provide evidence that the patterns reflect more than simple co-occurrence. We have expanded the Limitations section to discuss this issue explicitly. revision: yes

  3. Referee: Results: Model-to-model variation is noted but not tied to any architectural or training difference that would distinguish genuine mechanism from prompt sensitivity, undermining the ability to interpret differences across the five models as evidence of stable trust mechanisms.

    Authors: We acknowledge that the manuscript primarily documents rather than causally attributes model differences. Because several models are closed-source, we lack direct access to training details that would allow precise mapping of variation to architecture. We have added discussion in the Results and Limitations sections that relates observed differences to publicly known factors such as model recency and scale, while explicitly noting that prompt sensitivity remains a possible contributor. This framing preserves the main claim of overall human-like patterns while qualifying the interpretation of cross-model differences. revision: partial
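The response-coding step mentioned in the first response (parsing model outputs for trust indicators) might look like the following. This is a hypothetical convention, not the paper's: it assumes the prompt asks the model to report a line such as `Trust: 7/10`, and `parse_trust` and the regular expression are illustrative. Unparseable replies return `None` so they can be coded by hand or re-queried rather than silently dropped.

```python
import re
from typing import Optional

# Hypothetical reporting convention, e.g. "Trust: 7/10" or "trust = 4".
# \b keeps words like "distrust" from matching.
TRUST_PATTERN = re.compile(
    r"\btrust\s*[:=]\s*(\d+(?:\.\d+)?)\s*(?:/\s*(\d+))?", re.IGNORECASE)


def parse_trust(text: str, scale_max: float = 10.0) -> Optional[float]:
    """Return the last reported trust score rescaled to [0, 1], or None."""
    matches = TRUST_PATTERN.findall(text)
    if not matches:
        return None
    value, denom = matches[-1]  # denom is "" when no "/N" is given
    top = float(denom) if denom else scale_max
    if top <= 0:
        return None
    score = float(value) / top
    return score if 0.0 <= score <= 1.0 else None


print(parse_trust("Given the stable income history... Trust: 7/10"))  # → 0.7
print(parse_trust("trust = 4", scale_max=5))                          # → 0.8
print(parse_trust("I cannot assess this applicant."))                 # → None
```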

Circularity Check

0 steps flagged

No significant circularity in empirical simulation study

Full rationale

The paper conducts an empirical investigation using prompted simulations (43,200 simulated experiments with five LLMs across five scenarios) to measure trust formation based on trustworthiness dimensions and demographics. No mathematical derivations, equations, or self-citation chains are present that would reduce the central claims to their inputs by construction. The results derive directly from analysis of model outputs rather than from fitted parameters renamed as predictions or ansatzes smuggled in via prior work. This is a standard empirical design whose content comes independently from the experimental data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study relies on the assumption that prompting LLMs with scenario descriptions elicits stable, interpretable trust judgments that can be compared to human behavioral data. No free parameters, axioms, or invented entities are introduced in the abstract.
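The stability assumption can be probed by sampling each identical prompt several times and asking how much of the total variance in trust is explained by the experimental condition rather than by sampling noise (eta squared, SS_between / SS_total). A minimal sketch; `eta_squared`, the condition labels, and the scores are hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def eta_squared(groups: Dict[str, List[float]]) -> float:
    """Share of total variance in trust explained by the experimental condition.

    `groups` maps a condition label to repeated trust scores sampled for that
    identical prompt. Values near 1 mean repeats agree and conditions differ;
    values near 0 mean sampling noise swamps the manipulation.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand = fmean(all_scores)
    ss_total = sum((s - grand) ** 2 for s in all_scores)
    if ss_total == 0:
        raise ValueError("all trust scores are identical")
    ss_between = sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())
    return ss_between / ss_total


groups = {"high_integrity": [8, 8, 9], "low_integrity": [3, 4, 3]}
print(round(eta_squared(groups), 3))  # → 0.966
```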

pith-pipeline@v0.9.0 · 5798 in / 1159 out tokens · 39826 ms · 2026-05-22T18:16:11.264065+00:00 · methodology


