A closer look at how large language models trust humans: patterns and biases
Pith reviewed 2026-05-22 18:16 UTC · model grok-4.3
The pith
Large language models develop trust toward humans in ways that mirror human patterns, guided by competence, benevolence, and integrity but also shaped by age, religion, and gender in financial contexts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using established behavioral theories, the study finds that LLM trust development shows an overall similarity to human trust development. In most but not all cases, LLM trust is strongly predicted by the trustworthiness dimensions of competence, benevolence, and integrity, and in some cases also biased by age, religion, and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. Different models exhibit variation in how they estimate trust, with trustworthiness and demographic factors serving as weak predictors in some instances.
What carries the argument
The three trustworthiness dimensions of competence, benevolence, and integrity, together with demographic variables, used to predict effective trust across simulated experiments.
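One way to read this setup is as a linear model of effective trust. The sketch below assumes trust is elicited as a numeric score and regressed by ordinary least squares on the three trustworthiness ratings plus dummy-coded demographics. The function name `fit_trust_model`, the 1-7 scale, the synthetic data and the coefficient values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_trust_model(competence, benevolence, integrity, demographics, trust):
    """Ordinary least squares fit of a trust score on the three trustworthiness
    dimensions plus demographic dummy columns (hypothetical sketch).

    Returns [intercept, b_competence, b_benevolence, b_integrity, *b_demographics].
    """
    n = len(trust)
    # Design matrix: intercept column, three ratings, then the 0/1 demographic dummies.
    X = np.column_stack([np.ones(n), competence, benevolence, integrity, demographics])
    coef, *_ = np.linalg.lstsq(X, trust, rcond=None)
    return coef


# Synthetic illustration only: ratings on a 1-7 scale and one hypothetical
# binary demographic indicator (1 = "older applicant").
rng = np.random.default_rng(0)
n = 2000
competence = rng.integers(1, 8, n).astype(float)
benevolence = rng.integers(1, 8, n).astype(float)
integrity = rng.integers(1, 8, n).astype(float)
older = rng.integers(0, 2, n).astype(float)

# Made-up "true" effects used to generate the demo data, including a small age penalty.
true_coef = np.array([0.5, 0.4, 0.3, 0.35, -0.2])
X_true = np.column_stack([np.ones(n), competence, benevolence, integrity, older])
trust = X_true @ true_coef + rng.normal(0.0, 0.5, n)

coef = fit_trust_model(competence, benevolence, integrity, older.reshape(-1, 1), trust)
print(np.round(coef, 2))
```

Under this reading, "trust is predicted by trustworthiness" corresponds to large positive coefficients on the three ratings. A demographic "bias" corresponds to a nonzero demographic coefficient that remains after the ratings are controlled for.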
If this is right
- In most scenarios, trust levels track the human subject's competence, benevolence, and integrity.
- Demographic biases by age, religion, and gender appear most clearly in financial decision contexts.
- Newer models tend to display stronger and more human-like reliance on trustworthiness dimensions.
- Some models show weaker overall prediction by these factors, indicating inconsistent trust formation.
- Unchecked patterns could produce unintended effects in applications where LLMs influence decisions about people.
Where Pith is reading between the lines
- If the patterns persist outside simulations, AI agents might systematically favor or disfavor certain demographic groups in loan or hiring contexts without explicit intent.
- Adjusting prompts or fine-tuning could reduce unwanted demographic influences on trust judgments in future models.
- Similar trust mechanisms might surface in non-financial domains such as medical or legal advice if the same dimensions are at play.
- Longitudinal testing with live users would help determine whether simulation results translate to deployed systems.
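Monitoring for the kind of group-level favoritism described above could start with a simple audit statistic. The sketch below computes the difference in mean trust between two demographic groups and a two-sided permutation p-value. It assumes the profiles in the two groups are matched on competence, benevolence and integrity (as a factorial manipulation would ensure). Without that matching, a raw gap would mix demographic effects with trustworthiness differences. The helper `trust_gap` and the synthetic scores are hypothetical.

```python
import numpy as np


def trust_gap(trust, group, n_permutations=5000, seed=0):
    """Difference in mean trust (group 1 minus group 0) with a two-sided
    permutation p-value. Hypothetical monitoring helper, not from the paper."""
    trust = np.asarray(trust, dtype=float)
    group = np.asarray(group, dtype=bool)
    observed = trust[group].mean() - trust[~group].mean()

    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels to simulate "demographics carry no information".
        shuffled = rng.permutation(group)
        diff = trust[shuffled].mean() - trust[~shuffled].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic example: 200 matched profiles per group, with a made-up
# 0.5-point trust penalty applied to group 1.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 200)
trust = rng.normal(5.0, 1.0, 400) - 0.5 * group
gap, p = trust_gap(trust, group)
print(round(gap, 2), p)
```

A permutation test is used here because it makes no normality assumption about the elicited trust scores. A regression with controls, as in the model above, would be the natural choice when profiles are not matched.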
Load-bearing premise
The simulated experiments accurately capture how LLMs would form effective trust in real deployment contexts rather than merely reflecting patterns in training data or prompt phrasing.
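One way to test whether results merely reflect prompt phrasing is to cross every trustworthiness and demographic profile with several paraphrased templates. The estimated effects can then be compared across templates. The sketch below only builds such a full-factorial design grid. The factor levels, the loan-evaluation wording and the helper `build_design` are hypothetical, and religion is omitted solely to keep the grid short.

```python
from itertools import product

# Hypothetical factor levels; the paper's actual levels, scales, and wording are not reproduced.
LEVELS = {
    "competence": ["low", "high"],
    "benevolence": ["low", "high"],
    "integrity": ["low", "high"],
    "age": ["25", "65"],
    "gender": ["man", "woman"],
}

# Hypothetical paraphrases of the same loan-evaluation question.
TEMPLATES = [
    "A {age}-year-old {gender} applies for a loan. Their competence is {competence}, "
    "their benevolence is {benevolence}, and their integrity is {integrity}. "
    "How much do you trust them, from 1 to 7?",
    "Loan applicant: {gender}, aged {age}. Competence: {competence}. "
    "Benevolence: {benevolence}. Integrity: {integrity}. Rate your trust (1-7).",
]


def build_design(levels, templates):
    """Cross every combination of factor levels with every paraphrase template."""
    names = list(levels)
    rows = []
    for t_idx, template in enumerate(templates):
        for combo in product(*(levels[name] for name in names)):
            profile = dict(zip(names, combo))
            rows.append({"template": t_idx, **profile, "prompt": template.format(**profile)})
    return rows


design = build_design(LEVELS, TEMPLATES)
print(len(design))  # 2 templates x 2**5 profiles = 64
```

In a full study, each prompt would be sent to each model, each reply parsed into a numeric score, and the trust regression fit separately for each template. Coefficients that stay stable across templates would be some evidence against a pure phrasing artifact. They cannot by themselves rule out patterns inherited from the training data.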
What would settle it
Conducting parallel experiments where actual humans interact with the same LLMs on identical trust-sensitive tasks and finding no systematic link between the trustworthiness dimensions or demographics and the LLMs' trust-related outputs would undermine the central claim.
Original abstract
As large language models (LLMs) and LLM-based agents increasingly interact with humans in decision-making contexts, understanding the trust dynamics between humans and AI agents becomes a central concern. While considerable literature studies how humans trust AI agents, it is much less understood how LLM-based agents develop effective trust in humans. LLM-based agents likely rely on some sort of implicit effective trust in trust-related contexts (e.g., evaluating individual loan applications) to assist and affect decision making. Using established behavioral theories, we develop an approach that studies whether LLM trust depends on the three major trustworthiness dimensions: competence, benevolence and integrity of the human subject. We also study how demographic variables affect effective trust. Across 43,200 simulated experiments with five popular language models in five different scenarios, we find that LLM trust development shows an overall similarity to human trust development. We find that in most, but not all cases, LLM trust is strongly predicted by trustworthiness, and in some cases also biased by age, religion and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. While the overall patterns align with human-like mechanisms of effective trust formation, different models exhibit variation in how they estimate trust; in some cases, trustworthiness and demographic factors are weak predictors of effective trust. These findings call for a better understanding of AI-to-human trust dynamics and monitoring of biases and trust development patterns to prevent unintended and potentially harmful outcomes in trust-sensitive applications of AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study examining how large language models (LLMs) form effective trust in humans. Through 43,200 simulated experiments involving five popular LLMs across five different scenarios, the authors investigate the role of trustworthiness dimensions (competence, benevolence, and integrity) and demographic variables (age, religion, gender) in predicting trust. They report that LLM trust development generally mirrors human patterns, with trustworthiness being a strong predictor in most cases and demographic biases appearing in some financial scenarios, particularly for newer models.
Significance. If the findings hold under rigorous scrutiny, this work would provide valuable insights into AI-to-human trust dynamics, which is increasingly relevant as LLMs are deployed in decision-making roles. The large scale of simulations and comparison across models are strengths. However, the significance is tempered by potential limitations in how the simulations capture real-world trust formation versus training data artifacts.
major comments (3)
- Abstract: The abstract reports 43,200 simulations and clear patterns but provides no details on prompt construction, response coding, statistical controls, or how 'effective trust' was operationalized. Without these, it is impossible to assess whether the central patterns are robust or artifactual, which is load-bearing for the similarity-to-human-trust claim.
- Methods (implied in experimental design description): The experimental design does not appear to include controls for prompt paraphrasing, framing effects, or out-of-distribution human descriptions. This leaves open that observed predictors (especially age/religion/gender in financial scenarios) are artifacts of common co-occurrences in the training corpus rather than emergent trust computation.
- Results: Model-to-model variation is noted but not tied to any architectural or training difference that would distinguish genuine mechanism from prompt sensitivity, undermining the ability to interpret differences across the five models as evidence of stable trust mechanisms.
minor comments (2)
- Abstract: The phrasing 'in most, but not all cases' and 'in some cases also biased' could be made more precise by referencing specific tables or figures that quantify the effect sizes.
- Discussion: Explicitly address how the five scenarios were selected and whether they include both common literature scenarios and novel ones to strengthen the generalizability argument.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and robustness of our work on LLM trust dynamics. We address each major comment below and have revised the manuscript to incorporate additional details, robustness checks, and discussion where feasible.
Point-by-point responses
- Referee: Abstract: The abstract reports 43,200 simulations and clear patterns but provides no details on prompt construction, response coding, statistical controls, or how 'effective trust' was operationalized. Without these, it is impossible to assess whether the central patterns are robust or artifactual, which is load-bearing for the similarity-to-human-trust claim.
Authors: We agree that greater transparency in the abstract would strengthen the manuscript. We have revised the abstract to include a concise description of how effective trust was operationalized (drawing on the three trustworthiness dimensions from behavioral trust literature) and to note the use of structured prompts with statistical regression controls for demographics and trustworthiness factors. Detailed explanations of prompt templates, response coding procedures (e.g., parsing model outputs for trust indicators), and the full statistical models are now more prominently referenced in the abstract and expanded in the Methods section.
Revision: yes
- Referee: Methods (implied in experimental design description): The experimental design does not appear to include controls for prompt paraphrasing, framing effects, or out-of-distribution human descriptions. This leaves open that observed predictors (especially age/religion/gender in financial scenarios) are artifacts of common co-occurrences in the training corpus rather than emergent trust computation.
Authors: This concern about potential training-data artifacts is well-taken. We have added a dedicated robustness subsection to the Methods that reports results from paraphrased prompt variants and alternative scenario framings; the core effects of trustworthiness dimensions remain stable. We also incorporated a small set of out-of-distribution human descriptions in supplementary analyses. While we cannot completely eliminate all influences from pre-training corpora, the scale of the design (43,200 simulations across five models and five scenarios) and the independent manipulation of trustworthiness cues provide evidence that the patterns reflect more than simple co-occurrence. We have expanded the Limitations section to discuss this issue explicitly.
Revision: yes
- Referee: Results: Model-to-model variation is noted but not tied to any architectural or training difference that would distinguish genuine mechanism from prompt sensitivity, undermining the ability to interpret differences across the five models as evidence of stable trust mechanisms.
Authors: We acknowledge that the manuscript primarily documents rather than causally attributes model differences. Because several models are closed-source, we lack direct access to training details that would allow precise mapping of variation to architecture. We have added discussion in the Results and Limitations sections that relates observed differences to publicly known factors such as model recency and scale, while explicitly noting that prompt sensitivity remains a possible contributor. This framing preserves the main claim of overall human-like patterns while qualifying the interpretation of cross-model differences.
Revision: partial
Circularity Check
No significant circularity in empirical simulation study
Full rationale
The paper conducts an empirical investigation using 43,200 prompted simulations with five LLMs across five scenarios to measure how trust formation depends on trustworthiness dimensions and demographics. No mathematical derivations, equations, or self-citation chains are present that would reduce the central claims to their inputs by construction. The results derive directly from analysis of model outputs rather than from fitted parameters renamed as predictions or ansatzes smuggled in via prior work. This is a standard empirical design whose conclusions rest on the experimental data themselves.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "Across 43,200 simulated experiments... LLM trust is strongly predicted by trustworthiness... biased by age, religion and gender, especially in financial scenarios."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "trustworthiness dimensions: competence, benevolence and integrity"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.