Beliefs and Misconceptions around Integrated Conversational AI

Adam Jenkins; Jose Such; Mark Cote; William Seymour

arxiv: 2605.14849 · v1 · pith:35MPBQPZnew · submitted 2026-05-14 · 💻 cs.HC

Pith reviewed 2026-06-30 20:11 UTC · model grok-4.3

classification 💻 cs.HC

keywords: conversational AI, user trust, citations, fact-checking, integrated AI, prompting strategies, LLM perceptions, browser extension

The pith

Citations in integrated conversational AI raise perceived trustworthiness without prompting users to verify the sources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how users build understanding and decide on trust when conversational AI is embedded directly into a web browser. In a study of 20 participants completing information retrieval and planning tasks with Copilot in Microsoft Edge, users combined prior views of LLMs with search engine habits to shape their prompting and evaluation. A central finding is that visible citations made answers seem more reliable, yet participants rarely felt compelled to inspect those citations themselves. When they did check facts, they frequently turned to the exact sources the AI had already referenced. This pattern shows how integration can quietly align user verification behavior with the system's own outputs.

Core claim

Participants relied on a combination of existing perceptions of LLMs and internet search, tracing the effect of beliefs about how Copilot generated answers on prompting strategies. The inclusion of citations increased the trustworthiness of answers without participants feeling the need to check them, with participants often reaching for the same information sources as the CAI when fact-checking.

What carries the argument

The controlled user study of 20 participants performing information-retrieval and planning tasks inside a browser extension, which links beliefs about answer generation to prompting choices and citation-driven trust.

If this is right

Including citations in AI responses can raise user acceptance of outputs even when verification does not occur.
Users may mirror the source selection of the integrated AI during their own fact-checking.
Pre-existing beliefs about how LLMs work directly shape how people phrase prompts to the system.
Trust mechanisms in integrated AI rest on surface markers such as citations rather than on active source inspection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers of other embedded AI tools may see similar citation effects if the integration hides the AI's origins as effectively as a browser extension does.
Over-reliance on AI-listed sources could narrow the range of information people encounter when double-checking answers.
Future interfaces might need explicit prompts or training to encourage verification outside the AI's cited set.
The pattern could intensify as conversational features spread across more productivity applications beyond browsers.

Load-bearing premise

Observations from 20 participants in controlled tasks inside one browser extension will generalize to everyday real-world use of integrated conversational AI.

What would settle it

A follow-up observation in which users in uncontrolled settings independently verify citations or select different sources than those listed by the AI would undermine the reported trust and fact-checking pattern.
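One way to make such a follow-up concrete is to measure how many of the sources a participant opens while fact-checking were already cited by the assistant. The Python sketch below is only an illustration under stated assumptions: the function names, the domain-level matching, and the example URLs are all hypothetical and do not come from the paper, which reports this pattern qualitatively.

```python
from urllib.parse import urlparse


def domain(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def cited_share(ai_cited, user_checked):
    """Share of the user's fact-checking domains that the AI had already cited.

    Hypothetical measure: 1.0 means every source the user consulted was one
    of the AI's citations; 0.0 means none of them were.
    """
    ai_domains = {domain(u) for u in ai_cited}
    user_domains = {domain(u) for u in user_checked}
    if not user_domains:
        return 0.0  # the user did no checking, so there is nothing to compare
    return len(ai_domains & user_domains) / len(user_domains)


# Invented example data for a single task.
ai_cited = ["https://www.example.org/guide", "https://news.example.com/item"]
user_checked = ["https://example.org/other", "https://reference.example.net/x"]
share = cited_share(ai_cited, user_checked)
```

A share close to 1.0 across participants would match the reported pattern, while consistently low shares in uncontrolled settings would count against it. Matching on domains is a coarse choice; a page-level comparison may be more appropriate for some tasks.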

Original abstract

LLM-driven conversational AI is beginning to disappear into the background, shifting from something used directly towards something increasingly integrated into existing workflows. In the process, markers of origin and training are smoothed away as LLMs become commodified in the eyes of users. We explore how people approach using a web browser with conversational AI built in, focusing on how they develop their understanding and determine whether to trust its outputs. We conducted a study where 20 participants used the Copilot AI features in Microsoft Edge to conduct information retrieval and planning tasks. Participants relied on a combination of existing perceptions of LLMs and internet search, tracing the effect of beliefs about how Copilot generated answers on prompting strategies. The inclusion of citations increased the trustworthiness of answers without participants feeling the need to check them, with participants often reaching for the same information sources as the CAI when fact-checking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Small qualitative study on Edge Copilot finds citations boost trust without much checking, but n=20 in controlled tasks limits how far the results travel.

The paper's main observation is that citations made Copilot answers seem more trustworthy to the 20 participants, who then often skipped verification or reached for the same sources the AI had cited. This came from a study of information retrieval and planning tasks inside Microsoft Edge.

The work is new in its focus on integrated browser AI rather than standalone chat interfaces. It links existing beliefs about LLMs and search to prompting strategies and fact-checking habits, and it notes the specific effect of source overlap. That combination is not covered in the prior standalone-LLM work referenced.

It does a reasonable job of capturing real behaviors in a commercial tool that people already use. The qualitative data shows how the AI disappearing into the workflow changes how trust forms compared with explicit chatbots.

The soft spot is generalizability. The stress-test note is correct: controlled tasks in one extension with a small sample leave open whether the citation effect and source-matching behavior are stable or shaped by the experimental setup, tool novelty, and task type. Everyday multi-tool use over time could differ. The abstract also gives no detail on interview protocol, coding scheme, or analysis, so the strength of the claims depends on the full methods section.

This is for HCI researchers working on trust in embedded AI systems. Someone collecting early observations on integrated tools could find the concrete examples useful, but the paper does not support broad claims about how people develop trust in conversational AI overall.

I would send it for peer review. The empirical angle on an emerging integration pattern is worth referee time, even if the authors need to tighten the scope and methods reporting.

Referee Report

2 major / 1 minor

Summary. The paper reports a qualitative study in which 20 participants performed information-retrieval and planning tasks inside the Microsoft Edge browser using its integrated Copilot feature. It claims that participants drew on prior beliefs about LLMs and web search to shape their prompting, that the presence of citations increased perceived trustworthiness without prompting verification, and that participants tended to consult the same external sources that Copilot itself referenced when they did check facts.

Significance. If the reported patterns hold beyond the specific experimental context, the work would supply useful empirical grounding for how trust and verification behaviors emerge when conversational AI is embedded in everyday tools rather than used as a standalone interface. The study design itself, however, supplies no evidence that the observed effects are stable properties of integrated CAI rather than artifacts of the single-extension, short-session, controlled-task setting.

major comments (2)

[Methods / Results] The central claims about citation effects on trustworthiness and source-matching in fact-checking rest on an analysis whose protocol, coding scheme, and reliability metrics are not described. The abstract states the findings but the Methods and Results sections supply no detail on interview protocol, coding scheme, inter-rater reliability, or limitations; without these the reader cannot assess whether the reported patterns are reproducible or whether they are shaped by the particular task framing.
[Discussion / Limitations] The generalizability concern is load-bearing: the study uses a single browser extension, 20 participants, and controlled information-retrieval/planning tasks. The manuscript does not discuss how the novelty of the tool, the short session length, or the participant pool might have produced prompting and verification habits that would not appear in everyday, multi-tool, long-term use of integrated conversational AI.
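For readers unfamiliar with the reliability metrics mentioned in the first comment, the sketch below computes Cohen's kappa, one common agreement statistic for two coders labelling the same excerpts. It is an assumption that such a statistic would suit this study: nothing here says which measure, if any, the authors used, and the code labels and data are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's label proportions, summed over labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six transcript excerpts.
coder_a = ["trust", "trust", "verify", "prompting", "trust", "verify"]
coder_b = ["trust", "verify", "verify", "prompting", "trust", "verify"]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on five of six excerpts, and kappa discounts the agreement that would be expected by chance given how often each coder used each label.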

minor comments (1)

[Introduction] The abstract and introduction use the term 'integrated conversational AI' without a precise operational definition or comparison to standalone chat interfaces; a short clarifying paragraph would help readers map the claims onto existing literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We find the feedback constructive and will revise the manuscript to address the concerns raised regarding methodological transparency and generalizability.

Referee: [Methods / Results] The central claims about citation effects on trustworthiness and source-matching in fact-checking rest on an analysis whose protocol, coding scheme, and reliability metrics are not described. The abstract states the findings but the Methods and Results sections supply no detail on interview protocol, coding scheme, inter-rater reliability, or limitations; without these the reader cannot assess whether the reported patterns are reproducible or whether they are shaped by the particular task framing.

Authors: We agree with this assessment. The initial manuscript did not provide adequate detail on the qualitative analysis process. In the revised version, we will include a detailed description of the interview protocol (including the semi-structured questions used), the inductive thematic analysis approach, the coding scheme with examples, and any steps taken for reliability such as multiple coders reviewing transcripts. We will also explicitly discuss limitations related to task framing in a new Limitations section. This will allow readers to better evaluate the reproducibility and context of our findings.

Revision: yes

Referee: [Discussion / Limitations] The generalizability concern is load-bearing: the study uses a single browser extension, 20 participants, and controlled information-retrieval/planning tasks. The manuscript does not discuss how the novelty of the tool, the short session length, or the participant pool might have produced prompting and verification habits that would not appear in everyday, multi-tool, long-term use of integrated conversational AI.

Authors: We acknowledge that the manuscript's limitations section is underdeveloped on these points. While our study is positioned as an initial exploration of integrated CAI use, we will expand the Discussion to address how the specific context (single tool, short sessions, controlled tasks, participant demographics) may influence the observed behaviors. We will discuss potential novelty effects, the difference between lab-like sessions and naturalistic long-term use, and suggest directions for future research to test generalizability across tools and over time. This revision will better frame the scope of our claims without overstating them.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical qualitative study with direct observations

The paper reports results from a controlled user study with 20 participants performing information-retrieval and planning tasks in Microsoft Edge Copilot. No equations, fitted parameters, derivations, or mathematical predictions appear in the abstract or described content. Central claims rest on direct participant observations rather than any self-referential reduction, self-citation chain, or renamed known result. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on standard HCI assumptions about the validity of think-aloud and interview data for inferring beliefs; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Qualitative analysis of participant behavior and statements can reliably reveal mental models of AI systems.
Invoked implicitly by the choice to conduct and interpret a user study.


[1] [1]

2024.Influencer Ad Disclosure on Social Media: Instagram and TikTok

Advertising Standards Agency. 2024.Influencer Ad Disclosure on Social Media: Instagram and TikTok. Technical Report. https://www.asa.org.uk/ resource/influencer-ad-disclosure-on-social-media-instagram-and-tiktok-2024.html

2024

[2] [2]

Frank Bentley, Chris Luvogt, Max Silverman, Rushani Wirasinghe, Brooke White, and Danielle Lottridge. 2018. Understanding the Long-Term Use of Smart Speaker Assistants.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.2, 3, Article 91 (Sept. 2018), 24 pages. doi:10.1145/3264901

work page doi:10.1145/3264901 2018

[3] [3]

Johnson, Priyanshu Rai, Tathagata Chakraborti, Thomas Gschwind, Jim A Laredo, Christoph Miksovic, Paolo Scotton, Kartik Talamadupula, and Gegi Thomas

Michelle Brachman, Qian Pan, Hyo Jin Do, Casey Dugan, Arunima Chaudhary, James M. Johnson, Priyanshu Rai, Tathagata Chakraborti, Thomas Gschwind, Jim A Laredo, Christoph Miksovic, Paolo Scotton, Kartik Talamadupula, and Gegi Thomas. 2023. Follow the Successful Herd: Towards Explanations for Improved Use and Mental Models of Natural Language Systems. InPro...

work page doi:10.1145/3581641.3584088 2023

[4] [4]

2012.Thematic analysis.American Psychological Association

Virginia Braun and Victoria Clarke. 2012.Thematic analysis.American Psychological Association

2012

[5] [5]

2021.Thematic analysis: A practical guide

Virginia Braun and Victoria Clarke. 2021.Thematic analysis: A practical guide. SAGE publications Ltd

2021

[6] [6]

Goran Bubaš, Snježana Babić, and Antonela Čižmešija. 2023. Usability and User Experience Related Perceptions of University Students Regarding the Use of Bing Chat Search Engine and AI Chatbot: Preliminary Evaluation of Assessment Scales. In2023 IEEE 21st Jubilee International Symposium on Intelligent Systems and Informatics (SISY). 000607–000612. doi:10.1...

work page doi:10.1109/sisy60376.2023.10417910 2023

[7] [7]

Sara Cannizzaro, Rob Procter, Sinong Ma, and Carsten Maple. 2020. Trust in the smart home: Findings from a nationally representative survey in the UK.Plos one15, 5 (2020), e0231615

2020

[8] Avishek Choudhury and Hamid Shamszare. 2023. Investigating the Impact of User Trust on the Adoption and Use of ChatGPT: Survey Analysis. J Med Internet Res 25 (14 Jun 2023), e47184. doi:10.2196/47184

[9] Meghan Clark, Mark W. Newman, and Prabal Dutta. 2017. Devices and Data and Agents, Oh My: How Smart Home Abstractions Prime End-User Mental Models. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3, Article 44 (Sept. 2017), 26 pages. doi:10.1145/3132031

[10] Benjamin R. Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. 2017. "What can i help you with?": infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services (Vienna, Au... doi:10.1145/3098279.3098539

[11] Peter J. Denning. 2025. In Large Language Models We Trust? Commun. ACM 68, 6 (June 2025), 23–25. doi:10.1145/3726009

[12] Hyo Jin Do, Michelle Brachman, Casey Dugan, Qian Pan, Priyanshu Rai, James M. Johnson, and Roshni Thawani. 2024. Evaluating What Others Say: The Effect of Accuracy Assessment in Shaping Mental Models of AI Systems. Proc. ACM Hum.-Comput. Interact. 8, CSCW2, Article 373 (Nov. 2024), 26 pages. doi:10.1145/3686912

[13] Josh Freeman. 2025. Student generative AI survey 2025. Higher Education Policy Institute: London, UK (2025).

[14] Katy Ilonka Gero, Zahra Ashktorab, Casey Dugan, Qian Pan, James Johnson, Werner Geyer, Maria Ruiz, Sarah Miller, David R. Millen, Murray Campbell, Sadhana Kumaravel, and Wei Zhang. 2020. Mental Models of AI Agents in a Cooperative Game Setting. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Asso... doi:10.1145/3313831.3376316

[15] G. Mark Grimes, Ryan M. Schuetzler, and Justin Scott Giboney. 2021. Mental models and expectation violations in conversational AI interactions. Decision Support Systems 144 (2021), 113515. doi:10.1016/j.dss.2021.113515

[16] Ellie Harmon and Melissa Mazmanian. 2013. Stories of the Smartphone in everyday discourse: conflict, tension & instability. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13). Association for Computing Machinery, New York, NY, USA, 1051–1060. doi:10.1145/2470654.2466134

[17] Aike C. Horstmann, Clara Strathmann, Lea Lambrich, and Nicole C. Krämer. 2023. Alexa, What’s Inside of You: A Qualitative Study to Explore Users’ Mental Models of Intelligent Voice Assistants. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (Würzburg, Germany) (IVA ’23). Association for Computing Machinery, New York, NY... doi:10.1145/3570945.3607335

[18] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12, Article 248 (March 2023), 38 pages. doi:10.1145/3571730

[19] Prerna Juneja, Wenjuan Zhang, Alison Marie Smith-Renner, Hemank Lamba, Joel Tetreault, and Alex Jaimes. 2024. Dissecting users’ needs for search result explanations. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 841, 17 pages. doi:10.1145/3613904.3642059

[20] Yongnam Jung, Cheng Chen, Eunchae Jang, and S. Shyam Sundar. 2024. Do We Trust ChatGPT as much as Google Search and Wikipedia?. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, Article 111, 9 pages. doi:10.1145/3613905.3650862

[21] Ilkka Kaate, Joni Salminen, Soon-Gyo Jung, Trang Thi Thu Xuan, Essi Häyhänen, Jinan Y. Azem, and Bernard J. Jansen. 2025. “You Always Get an Answer”: Analyzing Users’ Interaction with AI-Generated Personas Given Unanswerable Questions and Risk of Hallucination. In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). As... doi:10.1145/3708359.3712160

[22] Markelle Kelly, Aakriti Kumar, Padhraic Smyth, and Mark Steyvers. 2023. Capturing Humans’ Mental Models of AI: An Item Response Theory Approach. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for... doi:10.1145/3593013.3594111

[23] Changhyun Lee and Kyungjin Cha. 2024. Toward the Dynamic Relationship Between AI Transparency and Trust in AI: A Case Study on ChatGPT. International Journal of Human–Computer Interaction 0, 0 (2024), 1–18. doi:10.1080/10447318.2024.2405266

[24] Sunok Lee, Minji Cho, and Sangsu Lee. 2020. What If Conversational Agents Became Invisible? Comparing Users’ Mental Models According to Physical Entity of AI Speaker. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4, 3, Article 88 (Sept. 2020), 24 pages. doi:10.1145/3411840

[25] Houjiang Liu, Anubrata Das, Alexander Boltz, Didi Zhou, Daisy Pinaroc, Matthew Lease, and Min Kyung Lee. 2024. Human-centered NLP Fact-checking: Co-Designing with Fact-checkers using Matchmaking for AI. Proc. ACM Hum.-Comput. Interact. 8, CSCW2, Article 423 (Nov. 2024), 44 pages. doi:10.1145/3686962

[26] Ewa Luger and Abigail Sellen. 2016. "Like Having a Really Bad PA": The Gulf between User Expectation and Experience of Conversational Agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 5286–5297. doi:10.1145/2858036.2858288

[27] David Lyell and Enrico Coiera. 2017. Automation bias and verification complexity: a systematic review. Journal of the American Medical Informatics Association 24, 2 (2017), 423–431.

[28] Maria Madsen and Shirley Gregor. 2000. Measuring human-computer trust. In 11th Australasian conference on information systems, Vol. 53. Citeseer, 6–8.

[29] Kirsti Malterud, Volkert Dirk Siersma, and Ann Dorrit Guassora. 2016. Sample size in qualitative interview studies: guided by information power. Qualitative health research 26, 13 (2016), 1753–1760.

[30] Mesut Cicek, Dogan Gursoy, and Lu Lu. 2024. Adverse impacts of revealing the presence of “Artificial Intelligence (AI)” technology in product and service descriptions on purchase intentions: the mediating role of emotional trust and the moderating role of perceived risk. Journal of Hospitality Marketing & Management 0, 0 (2024), 1–23. doi:10.1080/19368623.2024.2368040

[31] Brent Daniel Mittelstadt, Patrick Allo, Mariarosaria Taddeo, Sandra Wachter, and Luciano Floridi. 2016. The ethics of algorithms: Mapping the debate. Big Data & Society 3, 2 (2016), 2053951716679679.

[32] Vikram Mohanty, Jude Lim, and Kurt Luther. 2025. What Lies Beneath? Exploring the Impact of Underlying AI Model Updates in AI-Infused Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 539, 21 pages. doi:10.1145/3706598.3713751

[33] Kathleen L Mosier and Linda J Skitka. 2018. Human decision makers and automated decision aids: Made for each other? In Automation and human performance. CRC Press, 201–220.

[34] Mahsan Nourani, Chiradeep Roy, Jeremy E Block, Donald R Honeycutt, Tahrima Rahman, Eric Ragan, and Vibhav Gogate. 2021. Anchoring Bias Affects Mental Model Formation and User Reliance in Explainable AI Systems. In Proceedings of the 26th International Conference on Intelligent User Interfaces (College Station, TX, USA) (IUI ’21). Association for Computing Ma... doi:10.1145/3397481.3450639

[35] Saumya Pareek, Niels van Berkel, Eduardo Velloso, and Jorge Goncalves. 2024. Effect of Explanation Conceptualisations on Trust in AI-assisted Credibility Assessment. Proc. ACM Hum.-Comput. Interact. 8, CSCW2, Article 383 (Nov. 2024), 31 pages. doi:10.1145/3686922

[36] Sohyun Park, Someen Park, Jaehoon Kim, and Kyungsik Han. 2024. Exploring the Impact of AI-Generated Images on Political News Perception and Understanding. In Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing (San Jose, Costa Rica) (CSCW Companion ’24). Association for Computing Machinery, New York, NY, U... doi:10.1145/3678884.3681907

[37] Joanna Partridge. 2026. Accenture ‘links staff promotions to use of AI tools’. https://www.theguardian.com/accenture/2026/feb/19/accenture-links-staff-promotions-to-use-of-ai-tools

[38] James Prather, Brent N. Reeves, Paul Denny, Brett A. Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, and Eddie Antonio Santos. 2023. “It’s Weird That it Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Trans. Comput.-Hum. Interact. 31, 1, Article 4 (Nov. 2023), 31 pages. doi:10.1145/3617367

[39] William Seymour and Jose Such. 2023. Ignorance is Bliss? The Effect of Explanations on Perceptions of Voice Assistants. Proc. ACM Hum.-Comput. Interact. 7, CSCW1, Article 64 (April 2023), 24 pages. doi:10.1145/3579497

[40] William Seymour and Max Van Kleek. 2021. Exploring Interactions Between Trust, Anthropomorphism, and Relationship Development in Voice Assistants. Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 371 (Oct. 2021), 16 pages. doi:10.1145/3479515

[41] Nikhil Sharma, Q. Vera Liao, and Ziang Xiao. 2024. Generative Echo Chamber? Effect of LLM-Powered Search Systems on Diverse Information Seeking. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 1033, 17 pages. doi:10.1145/3613904.3642459

[42] Stephen P Stich and Shaun Nichols. 2003. Folk psychology. The Blackwell guide to philosophy of mind (2003), 235–255.

[43] Ana Stojanov, Qian Liu, and Joyce Hwee Ling Koh. 2024. University students’ self-reported reliance on ChatGPT for learning: A latent profile analysis. Computers and Education: Artificial Intelligence 6, 4 (2024), 100243.

[44] Haoheng Tang and Mrinalini Singha. 2024. A Mystery for You: A fact-checking game enhanced by large language models (LLMs) and a tangible interface. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, Article 631, 5 pages. doi:10.1145/3613905.3648110

[45] Paul Thomas, Bodo Billerbeck, Nick Craswell, and Ryen W. White. 2019. Investigating Searchers’ Mental Models to Inform Search Explanations. ACM Trans. Inf. Syst. 38, 1, Article 10 (Dec. 2019), 25 pages. doi:10.1145/3371390

[46] Irene Weber. 2024. Large language models as software components: A taxonomy for LLM-integrated applications. arXiv preprint arXiv:2406.10300 (2024).

[47] J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 437, 21 pages. doi:10.1145/3544548.3581388

[48] Xia Zeng, David La Barbera, Kevin Roitero, Arkaitz Zubiaga, and Stefano Mizzaro. 2024. Combining Large Language Models and Crowdsourcing for Hybrid Human-AI Misinformation Detection. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR ’24). Association for Computing Ma... doi:10.1145/3626772.3657965

[49] Xiao Zhan, Juan-Carlos Carrillo, William Seymour, and Jose Such. 2025. Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information. In 34th USENIX Security Symposium. USENIX Association.

[50] Amy X. Zhang, Aditya Ranganathan, Sarah Emlen Metz, Scott Appling, Connie Moon Sehat, Norman Gilmore, Nick B. Adams, Emmanuel Vincent, Jennifer Lee, Martin Robbins, Ed Bice, Sandro Hawke, David Karger, and An Xiao Mina. 2018. A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles. In Companion Proceedings o... doi:10.1145/3184558.3188731

[51] Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. 2023. Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, ... doi:10.1145/3544548.3581318