
arXiv: 2605.20512 · v1 · pith:B4MK6PMN · submitted 2026-05-19 · 💻 cs.HC

Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks

Pith reviewed 2026-05-21 06:12 UTC · model grok-4.3

classification 💻 cs.HC
keywords AI reliance · value framing · LLM writing assistance · human-AI interaction · overreliance reduction · bias awareness · writing tasks · user personalization

The pith

Framing an AI with specific values reduces users' reliance on its text suggestions by an average of 20 percent in writing tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether making an AI's value biases explicit can decrease over-reliance on its outputs during writing. Large language models often generate text aligned with Western values, and users frequently accept large portions of these suggestions, which can homogenize styles across different cultural backgrounds. Through a between-subjects experiment involving Indian and American participants completing AI-supported writing tasks, the authors compare a control condition to two interventions: one showing an overview of the AI's framed values and another comparing those values to the user's own. Results demonstrate that exposure to the AI's values alone lowers the share of AI-generated content in final essays by about 20 percent and increases the amount of unique text produced. This points to a practical way to encourage more individualized writing by raising awareness of AI value alignments.

Core claim

In the experiment, participants wrote essays with AI assistance under three conditions: no intervention, viewing an overview of the AI's framed values, or viewing those values compared to their own. The proportion of the final essay generated by the AI dropped by an average of 20 percent when participants saw the AI's framed values. Essays also showed more unique text in the condition where values were shown without personal comparison, suggesting that simple value disclosures can prompt users to personalize their outputs rather than default to AI suggestions.

What carries the argument

The argument rests on a single intervention: displaying an overview of the AI's framed values. The overview raises users' awareness of potential biases and thereby lowers their acceptance of AI-generated text in the final essay.

If this is right

  • Users produce essays with a higher share of their own writing when informed about the AI's values.
  • Writing outputs become less homogenized and more reflective of individual perspectives.
  • A low-effort display of AI values can serve as an intervention to counter over-acceptance of suggestions.
  • The effect holds across participants from different cultural backgrounds in the tested groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar value-framing displays could be tested in other AI-assisted tasks such as summarization or idea generation to check if they also boost user originality.
  • Interface designers might consider making value disclosures a default feature to support user agency over time.
  • Widespread adoption could slow the convergence of global writing styles toward the AI's default value set.

Load-bearing premise

That participants notice the value overview and change their writing behavior because of it, rather than due to other unmeasured influences or experimental demand effects.

What would settle it

A follow-up study in which participants who see the AI value overview still generate final essays with the same proportion of AI text as those in the no-intervention control group.

Figures

Figures reproduced from arXiv: 2605.20512 by Alice Gao, Andrew N. Meltzoff, Katharina Reinecke, Maarten Sap.

Figure 1: A snapshot of our intervention, showing the AI’s framed values we showed participants with the AI’s answer to ...
Figure 2: AI reliance metrics across our different study conditions. We observe a decrease in the AI reliance metrics (a) AI ...
Figure 3: AI reliance metrics across our different intervention conditions within each country. Though the (b) AI acceptance ...
Figure 4: Quantitative writing metrics from our participants across different conditions. (a) Lexical diversity does not differ ...
Figure 5: Our interface for the writing tasks for all participants. AI suggestions were shown in light gray and could be accepted ...
Original abstract

Despite a global user base adopting large language models (LLMs) for daily writing tasks, model suggestions tend to align with Western values. Research has shown users commonly accept a high fraction of these AI suggestions, homogenizing writing styles and rendering outputs more “Western” than intended. While this suggests a need to reduce AI reliance, it remains unknown what kind of interventions could achieve this. Can framing the AI with specific values, and comparing it to one's own, make users less susceptible to overreliance and support more unique writing? We tested this hypothesis in a between-subjects online experiment with Indian and American participants (n=149) in which they were asked to perform AI-supported writing tasks, either 1) without an intervention, 2) after seeing an overview of the AI's framed values, or 3) after seeing an overview of the AI's framed values compared to their own. Our results show that seeing the AI's framed values reduces AI reliance, i.e., the proportion of the final essay generated by the AI, by an average of 20%. Additionally, when participants saw an overview of the AI's framed values (without comparison to their own values), the final essays contain more unique text than without intervention. Our findings emphasize the importance of educating users about potential value biases in AI, showing that raising awareness with a simple overview of values encourages users to personalize their writing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports a between-subjects online experiment (n=149 Indian and American participants) testing AI-supported writing tasks under three conditions: no intervention, overview of the AI's framed values, or overview of the AI's framed values compared to the participant's own values. The central claim is that seeing the AI's framed values reduces AI reliance—defined as the proportion of the final essay generated by the AI—by an average of 20%, with an additional finding that the non-comparison condition produces essays with more unique text.

Significance. If the measurement and statistical claims hold, the work offers a low-cost intervention for reducing overreliance on value-biased LLMs in writing tasks and for encouraging more personalized output. The between-subjects design with participants from two cultural groups provides a concrete empirical basis for HCI research on value transparency and AI literacy.

major comments (3)
  1. §3.2 (Measurement of AI Reliance): The proportion of final essay text attributed to the AI is treated as a direct behavioral measure of reliance, yet the manuscript does not specify whether this is computed via string overlap, edit distance, LLM-based attribution, or another method, nor whether it was validated against human-coded ground truth. This leaves open the possibility that heavy post-editing of copied AI blocks registers as reduced reliance even when the suggestion was initially accepted.
  2. §4 (Results): The headline 20% average reduction is stated without reported statistical tests, confidence intervals, effect sizes, or randomization checks for the between-subjects assignment. These omissions make it impossible to evaluate whether the observed difference is reliable or could be explained by unmeasured demand effects or condition-specific writing styles.
  3. §3.1 (Experimental Conditions): The manuscript does not report how participants' perception of the value-framing overview was measured or whether manipulation checks confirmed that the intervention was interpreted as intended and independent of demand characteristics.
minor comments (2)
  1. Abstract: The phrase 'more unique text' is used without a precise operational definition or reference to how uniqueness was quantified relative to the AI suggestions.
  2. Table 1 or equivalent demographics table: Clarify the exact distribution of Indian versus American participants across the three conditions to allow assessment of cultural balance.
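
Major comment 1 turns on how retained AI text is scored. As a minimal sketch of one option the referee lists (edit distance), the Python below scores each accepted suggestion by one minus its normalized Levenshtein distance to the matching segment of the final essay, then weights the scores by segment length. The function names, the pairing of suggestions with essay segments, and the length weighting are all assumptions made for illustration; the manuscript's actual computation is not specified in the material reviewed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def retained_fraction(suggestion: str, final_segment: str) -> float:
    """1.0 if the suggestion survives verbatim, near 0.0 if it was rewritten."""
    longest = max(len(suggestion), len(final_segment))
    if longest == 0:
        return 0.0  # nothing was suggested or kept
    return 1.0 - levenshtein(suggestion, final_segment) / longest


def ai_reliance(pairs, essay_length: int) -> float:
    """Hypothetical reliance score: retained AI characters over essay length.

    `pairs` holds (suggestion, final_segment) tuples; how segments are aligned
    to suggestions is assumed to be known from interaction logs.
    """
    if essay_length == 0:
        return 0.0
    retained = sum(retained_fraction(s, seg) * len(seg) for s, seg in pairs)
    return retained / essay_length


if __name__ == "__main__":
    kept = ("the quick brown fox", "the quick brown fox")
    rewritten = ("the quick brown fox", "a slow red hen")
    print(retained_fraction(*kept), retained_fraction(*rewritten))
```

Under this scoring, a suggestion that is accepted and then heavily rewritten contributes little to the total. That is the ambiguity the referee raises: the score tracks text retained in the final essay, not the initial act of acceptance.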

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help us improve the clarity and rigor of our manuscript. We address each major comment below and commit to revisions that strengthen the reporting of methods and results without altering the core findings.

Point-by-point responses
  1. Referee: §3.2 (Measurement of AI Reliance): The proportion of final essay text attributed to the AI is treated as a direct behavioral measure of reliance, yet the manuscript does not specify whether this is computed via string overlap, edit distance, LLM-based attribution, or another method, nor whether it was validated against human-coded ground truth. This leaves open the possibility that heavy post-editing of copied AI blocks registers as reduced reliance even when the suggestion was initially accepted.

    Authors: We agree that the exact computation method requires explicit description for reproducibility. In the revised manuscript we will expand §3.2 to state that AI reliance was quantified as the normalized Levenshtein edit distance between each AI suggestion and the corresponding segment of the final essay, yielding the proportion of retained AI-generated content. This approach intentionally accounts for post-editing rather than treating any modification as zero reliance. We did not conduct a separate human-coded validation study; we will acknowledge this as a limitation and note that the measure still reflects behavioral retention of AI text in the final output. revision: yes

  2. Referee: §4 (Results): The headline 20% average reduction is stated without reported statistical tests, confidence intervals, effect sizes, or randomization checks for the between-subjects assignment. These omissions make it impossible to evaluate whether the observed difference is reliable or could be explained by unmeasured demand effects or condition-specific writing styles.

    Authors: We concur that inferential statistics and supporting details are necessary. The revised §4 will report the results of a one-way ANOVA (or appropriate non-parametric test) comparing the three conditions on AI reliance, including F-statistic, p-value, confidence intervals around the mean difference, and effect size (Cohen’s d). We will also include randomization checks (balance tests on age, gender, and cultural background across conditions) and discuss potential demand effects as a limitation. The reported 20% figure represents the observed mean reduction; the added statistics will allow readers to assess its reliability. revision: yes

  3. Referee: §3.1 (Experimental Conditions): The manuscript does not report how participants' perception of the value-framing overview was measured or whether manipulation checks confirmed that the intervention was interpreted as intended and independent of demand characteristics.

    Authors: We recognize the value of explicit manipulation checks. In the revision we will add to §3.1 a description of the post-task questionnaire items that probed participants’ recall and perceived relevance of the value overview. Although formal manipulation checks were not part of the original protocol, we will report any available self-report data on value awareness and will discuss demand characteristics as a potential limitation of the online between-subjects design. If the data are insufficient, we will note this and suggest it for future studies. revision: partial
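
The statistics promised in response 2 can be sketched in a few lines. The Python below computes a one-way ANOVA F statistic and a pairwise Cohen's d using only the standard library. The three samples are made-up placeholder values, not data from the paper, and the function names are hypothetical. The p-value is left out because it requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) would supply.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Placeholder AI-reliance percentages per participant (NOT the study's data).
    control = [60, 70, 80]
    overview = [40, 50, 60]
    comparison = [50, 60, 70]
    f_stat, df_b, df_w = one_way_anova_f([control, overview, comparison])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
    print(f"Cohen's d, control vs. overview = {cohens_d(control, overview):.2f}")
```

Cohen's d compares two conditions at a time, so a three-condition design would report it per pair (for example, control versus value overview). An omnibus effect size such as eta squared is a common alternative alongside the ANOVA.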

Circularity Check

0 steps flagged

No circularity: result grounded in new between-subjects experiment


The paper's central claim—that framing an AI with values reduces AI reliance by ~20%—is presented as the direct outcome of a new between-subjects online experiment (n=149, three conditions) rather than any derivation, equation, or self-referential definition. The abstract and described methods report measured proportions of AI-generated text in submitted essays without invoking fitted parameters, prior self-citations as uniqueness theorems, or ansatzes that reduce the result to its inputs by construction. No load-bearing steps collapse the reported effect to the experimental design itself; the finding remains an independent empirical observation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on standard behavioral-experiment assumptions rather than mathematical derivations. No free parameters or invented entities are introduced. The main axioms are domain assumptions about participant comprehension and measurement validity.

axioms (2)
  • domain assumption: Participants understand and respond to the value overview as intended by the experimenters without significant demand characteristics or misinterpretation.
    Invoked implicitly when attributing the 20% reduction to the value-framing intervention in the abstract.
  • domain assumption: The proportion of final essay text generated by the AI can be reliably measured and attributed to user behavior rather than interface artifacts.
    Central to the reported outcome variable but not detailed in the abstract.

pith-pipeline@v0.9.0 · 5789 in / 1393 out tokens · 34377 ms · 2026-05-21T06:12:33.607293+00:00 · methodology


Reference graph

Works this paper leans on

131 extracted references · 131 canonical work pages · 2 internal anchors

  1. [1]

    Dhruv Agarwal, Mor Naaman, and Aditya Vashistha. 2025. AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). ACM, 1–21. doi:10.1145/3706598.3713564

  2. [2]

    Barrett R Anderson, Jash Hemant Shah, and Max Kreminski. 2024. Homogenization Effects of Large Language Models on Human Creative Ideation. In Proceedings of the 16th Conference on Creativity & Cognition (Chicago, IL, USA) (C&C ’24). Association for Computing Machinery, New York, NY, USA, 413–425. doi:10.1145/3635636.3656204

  3. [3]

    Kenneth C. Arnold, Krysta Chauncey, and Krzysztof Z. Gajos. 2018. Sentiment Bias in Predictive Text Recommendations Results in Biased Writing. In Proceedings of the 44th Graphics Interface Conference (Toronto, Canada) (GI ’18). Canadian Human-Computer Communications Society, Waterloo, CAN, 42–49. doi:10.20380/GI2018.07

  4. [4]

    Kenneth C. Arnold, Krysta Chauncey, and Krzysztof Z. Gajos. 2020. Predictive text encourages predictable writing. In Proceedings of the 25th International Conference on Intelligent User Interfaces (Cagliari, Italy) (IUI ’20). Association for Computing Machinery, New York, NY, USA, 128–138. doi:10.1145/3377325.3377523

  5. [5]

    Diego Aycinena, Lucas Rentschler, Benjamin Beranek, and Jonathan F. Schulz. 2022. Social norms and dishonesty across societies. Proceedings of the National Academy of Sciences 119, 31 (2022), e2120138119. https://www.pnas.org/doi/pdf/10.1073/pnas.2120138119 doi:10.1073/pnas.2120138119

  6. [6]

    Paul Baker. 2006. Glossary of corpus linguistics. Edinburgh University Press

  7. [7]

    Ritwik Banerjee. 2018. On the interpretation of World Values Survey trust question-global expectations vs. local beliefs. European Journal of Political Economy 55 (2018), 491–510

  8. [8]

    Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S Lasecki, Daniel S Weld, and Eric Horvitz. 2019. Beyond accuracy: The role of mental models in human-AI team performance. In Proceedings of the AAAI conference on human computation and crowdsourcing, Vol. 7. 2–11

  9. [9]

    Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel S. Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. arXiv:2006.14779 [cs.AI] https://arxiv.org/abs/2006.14779

  10. [10]

    Daniela Barni et al. 2009. Trasmettere valori. Tre generazioni familiari a confronto. Unicopli

  11. [11]

    Jeffrey Basoah, Daniel Chechelnitsky, Tao Long, Katharina Reinecke, Chrysoula Zerva, Kaitlyn Zhou, Mark Díaz, and Maarten Sap. 2025. Not Like Us, Hunty: Measuring Perceptions and Behavioral Effects of Minoritized Anthropomorphic Cues in LLMs. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association fo...

  12. [12]

    Jeffrey Basoah, Jay L. Cunningham, Erica Adams, Alisha Bose, Aditi Jain, Kaustubh Yadav, Zhengyang Yang, Katharina Reinecke, and Daniela Rosner. 2025. Should AI Mimic People? Understanding AI-Supported Writing Technology Among Black Users. Proc. ACM Hum.-Comput. Interact. 9, 7, Article CSCW242 (Oct. 2025), 51 pages. doi:10.1145/3757423

  13. [13]

    Marie Battiste and James (Sa’ke’j) Youngblood Henderson. 2000. Protecting Indigenous knowledge and heritage: A global challenge. University of British Columbia Press

  14. [14]

    Mohsen Bayati, Mark Braverman, Michael Gillam, Karen M Mack, George Ruiz, Mark S Smith, and Eric Horvitz. 2014. Data-driven decisions for reducing readmissions for heart failure: general methodology and case study. PLoS One 9, 10 (Oct. 2014), e109264

  15. [15]

    Gábor Bella, Paula Helm, Gertraud Koch, and Fausto Giunchiglia. 2024. Tackling Language Modelling Bias in Support of Linguistic Diversity. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Rio de Janeiro, Brazil) (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 562–572. doi:10.1145/3630106.3658925

  16. [16]

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 610–623. doi:10.114...

  17. [17]

    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, ...

  18. [18]

    Lea Boecker, David D Loschelder, and Sascha Topolinski. 2022. How individuals react emotionally to others’ (mis)fortunes: A social comparison framework. Journal of Personality and Social Psychology 123, 1 (2022), 55

  19. [19]

    Eduard J. Bomhoff and Mary Man-Li Gu. 2012. East Asia Remains Different: A Comment on the Index of “Self-Expression Values,” by Inglehart and Welzel. Journal of Cross-Cultural Psychology 43, 3 (2012), 373–383. https://doi.org/10.1177/0022022111435096 doi:10.1177/0022022111435096

  20. [20]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3 (01 2006), 77–101. doi:10.1191/1478088706qp063oa

    Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  21. [21]

    Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 188 (April 2021), 21 pages. doi:10.1145/3449287

  22. [22]

    Daniel Buschek, Martin Zürn, and Malin Eiband. 2021. The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article ...

  23. [23]

    Allison Chen, Sunnie S. Y. Kim, Angel Franyutti, Amaya Dharmasiri, Kushin Mukherjee, Olga Russakovsky, and Judith E. Fan. 2026. Presenting Large Language Models as Companions Affects What Mental Capacities People Attribute to Them. arXiv:2510.18039 [cs.HC] https://arxiv.org/abs/2510.18039

  24. [24]

    Kaiping Chen, Anqi Shao, Jirayu Burapacheep, and Yixuan Li. 2024. Conversational AI and equity through assessing GPT-3’s communication with diverse social groups on contentious topics. Scientific Reports 14, 1 (18 Jan 2024), 1561. doi:10.1038/s41598-024-51969-w

  25. [25]

    Paramveer S. Dhillon, Somayeh Molaei, Jiaqi Li, Maximilian Golub, Shaochun Zheng, and Lionel Peter Robert. 2024. Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New Y...

  26. [26]

    Leon Festinger. 1954. A theory of social comparison processes. Human relations 7, 2 (1954), 117–140

  27. [27]

    Alexandra Fleischmann, Joris Lammers, Kathi Diel, Wilhelm Hofmann, and Adam D Galinsky. 2021. More threatening and more diagnostic: How moral comparisons differ from social comparisons. Journal of Personality and Social Psychology 121, 5 (2021), 1057

  28. [28]

    Riccardo Fogliato, Shreya Chappidi, Matthew Lungren, Paul Fisher, Diane Wilson, Michael Fitzke, Mark Parkinson, Eric Horvitz, Kori Inkpen, and Besmira Nushi. 2022. Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (Seoul, Republic o...

  29. [29]

    Michael C Frank. 2023. Baby steps in evaluating the capacities of large language models. Nature Reviews Psychology 2, 8 (2023), 451–452

  30. [30]

    Vinitha Gadiraju, Shaun Kane, Sunipa Dev, Alex Taylor, Ding Wang, Remi Denton, and Robin Brewer. 2023. "I wouldn’t say offensive but...": Disability-Centered Perspectives on Large Language Models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New Y...

  31. [31]

    Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A Smith. 2020. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. arXiv preprint arXiv:2009.11462 (2020)

  32. [32]

    Katy Ilonka Gero, Vivian Liu, and Lydia B. Chilton. 2021. Sparks: Inspiration for Science Writing using Language Models. arXiv:2110.07640 [cs.HC] https://arxiv.org/abs/2110.07640

  33. [33]

    Sourojit Ghosh and Aylin Caliskan. 2023. ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (Montréal, QC, Canada) (AIES ’23). Association for Computing Machinery, New York, NY, USA, ...

  34. [34]

    Michael B. Gill and Shaun Nichols. 2008. Sentimentalist Pluralism: Moral Psychology and Philosophical Ethics. Philosophical Issues 18 (2008), 143–163. http://www.jstor.org/stable/27749904

  35. [35]

    Nicole Gillespie, Steven Lockey, Tabi Ward, A Macdade, and G Hassed. 2025. Trust, attitudes and use of artificial intelligence. (2025)

  36. [36]

    Ben Green and Yiling Chen. 2019. The Principles and Limits of Algorithm-in-the-Loop Decision Making. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 50 (Nov. 2019), 24 pages. doi:10.1145/3359152

  37. [37]

    Jessica Guynn. 2015. Google’s “bias busting” workshops target hidden prejudices. USA Today 12 (2015)

  38. [38]

    C. Haerpfer, R. Inglehart, A. Moreno, C. Welzel, K. Kizilova, J. Diez-Medrano, M. Lagos, P. Norris, E. Ponarin, and B. Puranen (eds.). 2024. World Values Survey Wave 7 (2017-2022) Cross-National Data-Set. doi:10.14281/18241.24

  39. [39]

    Jess Hohenstein, Rene F. Kizilcec, Dominic DiFranzo, Zhila Aghajari, Hannah Mieczkowski, Karen Levy, Mor Naaman, Jeffrey Hancock, and Malte F. Jung. 2023. Artificial intelligence in communication impacts language and social relationships. Scientific Reports 13, 1 (04 Apr 2023), 5487. doi:10.1038/s41598-023-30938-9

  40. [40]

    Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. 2023. Co-Writing with Opinionated Language Models Affects Users’ Views. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 111, 15 pages. doi:10.1145/3544548.3581196

  41. [41]

    Rebecca L Johnson, Giada Pistilli, Natalia Menédez-González, Leslye Denisse Dias Duran, Enrico Panai, Julija Kalpokiene, and Donald Jay Bertulfo. 2022. The Ghost in the Machine has an American accent: value conflict in GPT-3. arXiv:2203.07785 [cs.CL] https://arxiv.org/abs/2203.07785

  42. [42]

    Kowe Kadoma, Marianne Aubin Le Quere, Xiyu Jenny Fu, Christin Munsch, Danaë Metaxa, and Mor Naaman. 2024. The Role of Inclusion, Control, and Ownership in Workplace AI-Mediated Communication. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, A...

  43. [43]

    Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. 2016. Smart Reply: Automated Response Suggestion for Email. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, ...

  44. [44]

    Markelle Kelly, Aakriti Kumar, Padhraic Smyth, and Mark Steyvers. 2023. Capturing Humans’ Mental Models of AI: An Item Response Theory Approach. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 1723–1734. doi:10.1145/3593013.3594111

  45. [46]

    Ariba Khan, Stephen Casper, and Dylan Hadfield-Menell. 2025. Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 2151–2165. doi:10.1145/3715275.3732147

  46. [47]

    Oliver Klingefjord, Ryan Lowe, and Joe Edelman. 2024. What are human values, and how do we align AI to them? arXiv:2404.10636 [cs.CY] https://arxiv.org/abs/2404.10636

  47. [48]

    Stephen M. Kosslyn. 1989. Understanding charts and graphs. Applied Cognitive Psychology 3, 3 (1989), 185–225. https://onlinelibrary.wiley.com/doi/pdf/10.1002/acp.2350030302 doi:10.1002/acp.2350030302

  48. [49]

    Todd Kulesza, Simone Stumpf, Margaret Burnett, and Irwin Kwan. 2012. Tell me more? the effects of mental model soundness on personalizing an intelligent agent. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI ’12). Association for Computing Machinery, New York, NY, USA, 1–10. doi:10.1145/2207676.2207678

  49. [50]

    Vivian Lai, Han Liu, and Chenhao Tan. 2020. "Why is ’Chicago’ deceptive?" Towards Building Model-Driven Tutorials for Humans. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3313831.3376873

  50. [51]

    Cynthia Lee. 2017. Awareness as a first step toward overcoming implicit bias. Enhancing justice: Reducing bias 289 (2017)

  51. [52]

    Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 388, 19 pages. doi:10.1145/3491102.3502030

  52. [53]

    Messi H.J. Lee, Jacob M. Montgomery, and Calvin K. Lai. 2024. Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Rio de Janeiro, Brazil) (FAccT ’24). Association for Computing Machinery, New York, NY,...

  53. [54]

    Yuxuan Li, Hirokazu Shirado, and Sauvik Das. 2025. Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 3303–3325. doi:10.1145/3715275.3732212

  54. [55]

    Marjaana Lindeman and Markku Verkasalo. 2005. Measuring Values With the Short Schwartz’s Value Survey. Journal of Personality Assessment 85, 2 (2005), 170–178. doi:10.1207/s15327752jpa8502_09 PMID: 16171417

  55. [56]

    Thomas Mejtoft, Sarah Hale, and Ulrik Söderström. 2019. Design Friction. In Proceedings of the 31st European Conference on Cognitive Ergonomics (BELFAST, United Kingdom) (ECCE ’19). Association for Computing Machinery, New York, NY, USA, 41–44. doi:10.1145/3335082.3335106

  56. [57]

    Jared Moore, Tanvi Deshpande, and Diyi Yang. 2024. Are Large Language Models Consistent over Value-laden Questions? arXiv:2407.02996 [cs.CL] https://arxiv.org/abs/2407.02996

  57. [58]

    Jimin Mun, Wei Bin Au Yeong, Wesley Hanwen Deng, Jana Schaich Borg, and Maarten Sap. 2025. Why (not) use AI? Analyzing People’s Reasoning and Conditions for AI Acceptability. In AIES. https://arxiv.org/abs/2502.07287

  58. [59]

    Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, and Maarten Sap. 2024. Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits. In AIES. https://arxiv.org/abs/2403.14791

  59. [60]

    Deepa Muralidhar, Rafik Belloum, and Ashwin Ashok. 2025. Operationalizing selective transparency using progressive disclosure in artificial intelligence clinical diagnosis systems. International Journal of Human-Computer Studies 204 (2025), 103591. doi:10.1016/j.ijhcs.2025.103591

  60. [61]

    Donald A Norman. 1988. The psychology of everyday things. Basic books

  61. [62]

    Gregory B Northcraft and Margaret Ann Neale. 1990. Organizational behavior: A management challenge. (1990)

  62. [63]

    Stefan Palan and Christian Schitter. 2018. Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance 17 (2018), 22–27. doi:10.1016/j.jbef.2017.12.004

  63. [64]

    Joon Sung Park, Rick Barber, Alex Kirlik, and Karrie Karahalios. 2019. A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 102 (Nov. 2019), 15 pages. doi:10.1145/3359204

  64. [65]

    Savvas Petridis, Nicholas Diakopoulos, Kevin Crowston, Mark Hansen, Keren Henderson, Stan Jastrzebski, Jeffrey V Nickerson, and Lydia B Chilton. 2023. AngleKindling: Supporting Journalistic Angle Ideation with Large Language Models. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for C...

  65. [66]

    Ritika Poddar, Rashmi Sinha, Mor Naaman, and Maurice Jakesch. 2023. AI Writing Assistants Influence Topic Choice in Self-Presentation. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI EA ’23). Association for Computing Machinery, New York, NY, USA, Article 29, 6 pages. doi:10.1145/3544549.3585893

  66. [67]

    Devin G Pope, Joseph Price, and Justin Wolfers. 2018. Awareness reduces racial bias. Management Science 64, 11 (2018), 4988–4995

  67. [68]

    Neil Rathi, Dan Jurafsky, and Kaitlyn Zhou. 2025. Humans overrely on overconfident language models, across languages. arXiv:2507.06306 [cs.CL] https://arxiv.org/abs/2507.06306

  68. [69]

    Claudia Russo, Francesca Danioni, Ioana Zagrean, and Daniela Barni. 2022. Changing Personal Values through Value-Manipulation Tasks: A Systematic Literature Review Based on Schwartz’s Theory of Basic Human Values. Eur J Investig Health Psychol Educ 12, 7 (June 2022), 692–715

  69. [70]

    Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, and Dirk Hovy. 2024. Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models. arXiv:2402.16786 [cs.CL] https://arxiv.org/abs/2402.16786

  70. [71]

    Sebastin Santy, Jenny T. Liang, Ronan Le Bras, Katharina Reinecke, and Maarten Sap. 2023. NLPositionality: Characterizing Design Biases of Datasets and Models. arXiv:2306.01943 [cs.CL] https://arxiv.org/abs/2306.01943

  71. [72]

    Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, and James Pennebaker. 2020. Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for...

  72. [73]

    Shalom Schwartz. 2006. A Theory of Cultural Value Orientations: Explication and Applications. Comparative Sociology 5, 2-3 (2006), 137–182. doi:10.1163/156913306778667357

  73. [74]

    Richard Shiffrin and Melanie Mitchell. 2023. Probing the psychology of AI models. Proceedings of the National Academy of Sciences 120, 10 (2023), e2300963120. arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.2300963120 doi:10.1073/pnas.2300963120

  74. [75]

    Herbert A Simon and Allen Newell. 1971. Human problem solving: The state of the theory in 1970. American psychologist 26, 2 (1971), 145

  75. [76]

    Nikhil Singh, Guillermo Bernal, Daria Savchenko, and Elena L. Glassman. 2023. Where to Hide a Stolen Elephant: Leaps in Creative Writing with Multimodal Machine Intelligence. ACM Trans. Comput.-Hum. Interact. 30, 5, Article 68 (Sept. 2023), 57 pages. doi:10.1145/3511599

  76. [77]

    Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, and Yejin Choi. 2024. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. Proceedings of the AAAI Conference on Artificial Intelligence 38, 18 (March 20...

  77. [78]

    Aaron Springer and Steve Whittaker. 2020. Progressive Disclosure: When, Why, and How Do Users Want Algorithmic Transparency Information? ACM Trans. Interact. Intell. Syst. 10, 4, Article 29 (Oct. 2020), 32 pages. doi:10.1145/3374218

  78. [79]

    Kate Sweeny, James A Shepperd, and Jennifer L Howell. 2012. Do as I say (not as I do): Inconsistency between behavior and values. Basic and applied social psychology 34, 2 (2012), 128–135

  79. [80]

    Yan Tao, Olga Viberg, Ryan S Baker, and René F Kizilcec. 2024. Cultural bias and cultural alignment of large language models. PNAS Nexus 3, 9 (09 2024), pgae346. arXiv:https://academic.oup.com/pnasnexus/article-pdf/3/9/pgae346/59151559/pgae346.pdf doi:10.1093/pnasnexus/pgae346

  80. [81]

    Peter Todd and Izak Benbasat. 1994. The Influence of Decision Aids on Choice Strategies: An Experimental Analysis of the Role of Cognitive Effort. Organizational Behavior and Human Decision Processes 60, 1 (1994), 36–74. doi:10.1006/obhd.1994.1074
