VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

Albert Calzaretto; Alice Ferng; Jingya Chen; Mihaela Vorvoreanu; Richard Lee; Samir Passi; Sam Yu-Te Lee

arxiv: 2510.11954 · v2 · submitted 2025-10-13 · 💻 cs.HC

Sam Yu-Te Lee, Jingya Chen, Albert Calzaretto, Richard Lee, Samir Passi, Alice Ferng, Mihaela Vorvoreanu

Pith reviewed 2026-05-18 07:05 UTC · model grok-4.3


keywords: enterprise chatbots · context visualization · topic modeling · context alignment · prompting strategies · human oversight · appropriate reliance · research through design


The pith

Visualization enables users to detect and correct misaligned context in enterprise chatbots while adapting their prompting strategies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enterprise chatbots often retrieve context from large databases that does not match user intentions, leading to responses that are technically correct but irrelevant. VizCopilot introduces visualization by combining topic modeling with document views so users can inspect, adjust, and align the context before generation. A Research-through-Design study using the prototype found that these visuals help users spot and fix problems and also lead them to change how they prompt the system for better context from the start. The approach aims to keep added mental effort low enough for everyday work. The work also identifies remaining gaps in support for detailed reading and trust in AI summaries.

Core claim

By combining topic modeling with document visualization, VizCopilot enables human oversight and modification of retrieved context. The study shows that visualization not only helps users detect and correct misaligned context but also encourages them to adapt their prompting strategies, enabling the system to retrieve more relevant context from the outset.

What carries the argument

VizCopilot prototype that combines topic modeling with document visualization to support human oversight and modification of retrieved context in enterprise chatbots.
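The inspect-then-modify loop described here can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-overlap grouping is a deliberately simple stand-in for a real topic model, and `TOPIC_KEYWORDS`, `assign_topic`, `group_by_topic`, and `align_context` are hypothetical names introduced only for this example.

```python
from collections import defaultdict

# Hypothetical topic vocabularies (assumption). A real system would learn
# topics from the corpus with a topic model instead of a fixed keyword list.
TOPIC_KEYWORDS = {
    "budget": {"budget", "cost", "forecast"},
    "hiring": {"hiring", "candidate", "interview"},
}


def assign_topic(text: str) -> str:
    """Pick the topic whose keyword set overlaps the document the most."""
    words = set(text.lower().split())
    best = max(TOPIC_KEYWORDS, key=lambda t: len(words & TOPIC_KEYWORDS[t]))
    # Fall back to "other" when no keyword matches at all.
    return best if words & TOPIC_KEYWORDS[best] else "other"


def group_by_topic(docs: list[str]) -> dict[str, list[str]]:
    """Group retrieved documents by topic so they can be shown as clusters."""
    groups = defaultdict(list)
    for doc in docs:
        groups[assign_topic(doc)].append(doc)
    return dict(groups)


def align_context(groups: dict[str, list[str]], excluded_topics: set[str]) -> list[str]:
    """Return the documents left after the user removes whole topics."""
    return [
        doc
        for topic, docs in groups.items()
        if topic not in excluded_topics
        for doc in docs
    ]


# Made-up retrieved context for illustration only.
retrieved = [
    "Q3 budget forecast for the platform team",
    "Interview notes for the senior candidate",
    "Cost overview and budget risks",
]
groups = group_by_topic(retrieved)
# The user sees an off-target "hiring" cluster and removes it before generation.
context = align_context(groups, excluded_topics={"hiring"})
```

The point of the sketch is the shape of the interaction: retrieved documents are summarized as topic clusters, and the user edits the context at the cluster level rather than reading every document.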

If this is right

Users can detect when retrieved context diverges from their intentions and correct it directly.
Users adapt prompting strategies after seeing visual feedback, improving initial context retrieval.
Cognitive overhead stays low enough for users to oversee and modify retrieved context during real tasks.
Gaps appear in verification support for close reading and building trust in AI summaries.
Future chatbot designs should explore personalization, proactivity, and sustainable human-AI collaboration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar visualization layers could be added to other retrieval-augmented AI tools to increase user control over data sources.
Making context visible may reduce cases where the chatbot produces correct but off-target answers across different domains.
Longer-term deployments could test whether improved prompting habits transfer beyond the study setting.
The work connects to broader transparency needs in retrieval-based systems by showing visualization as one concrete method.

Load-bearing premise

The chosen visualization techniques keep cognitive overhead manageable while enabling effective human oversight and modification of retrieved context in realistic enterprise use.

What would settle it

A user study finding that participants with VizCopilot show no higher rates of context correction or prompting adaptation than participants using a standard text-only chatbot interface.
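One way such a comparison could be analyzed is a two-proportion z-test on correction rates. The sketch below is an assumption about how a follow-up study might be scored, not something the paper reports; the function name `two_proportion_z` and the counts are placeholders, and the normal approximation only makes sense for reasonably large groups.

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). Assumes the pooled proportion is strictly
    between 0 and 1, otherwise the standard error is zero.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts (not data from the paper): participants who corrected
# misaligned context at least once, out of the participants in each condition.
z, p_value = two_proportion_z(successes_a=18, n_a=30, successes_b=9, n_b=30)
```

A result with `p_value` above the chosen significance threshold would be the "no higher rates" outcome described above. With small samples, an exact test would be a more appropriate choice than this approximation.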

Figures

Figures reproduced from arXiv: 2510.11954 by Albert Calzaretto, Alice Ferng, Jingya Chen, Mihaela Vorvoreanu, Richard Lee, Samir Passi, Sam Yu-Te Lee.

**Figure 1.** Overview of VizCopilot before entering a prompt. (a) The visualization panel shows topic structures of the context in …

**Figure 2.** Interactions supported by VizCopilot. (a) Users can enter their prompt in the chat panel to initiate a conversation.

**Figure 3.** The highlight feature allows users to quickly check …

Original abstract

Enterprise chatbots show promise in supporting knowledge workers in information synthesis tasks by retrieving context from large, heterogeneous databases before generating answers. However, when the retrieved context misaligns with user intentions, the chatbot often produces "irrelevantly right" responses that provide little value. In this work, we introduce VizCopilot, a prototype that incorporates visualization techniques to actively involve end-users in context alignment. By combining topic modeling with document visualization, VizCopilot enables human oversight and modification of retrieved context while keeping cognitive overhead manageable. We used VizCopilot as a design probe in a Research-through-Design study to evaluate the role of visualization in context alignment and to surface future design opportunities. Our findings show that visualization not only helps users detect and correct misaligned context but also encourages them to adapt their prompting strategies, enabling the system to retrieve more relevant context from the outset. At the same time, the study reveals limitations in verification support regarding close-reading and trust in AI summaries. We outline future directions for visualization-enhanced chatbots, focusing on personalization, proactivity, and sustainable human-AI collaboration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VizCopilot adds a visualization layer for context oversight in enterprise chatbots and reports that it prompts users to change their prompting habits, but the adaptation benefit rests on qualitative observations without objective retrieval metrics.


The core takeaway is that this paper builds a prototype called VizCopilot that layers topic modeling and document visualization on top of an enterprise chatbot so users can inspect and edit the retrieved context. The study claims this both helps users fix misaligned context on the fly and pushes them toward better initial prompts that pull more relevant material from the start.

Referee Report

2 major / 2 minor

Summary. The paper introduces VizCopilot, a prototype that combines topic modeling with document visualization to support end-user oversight and modification of retrieved context in enterprise chatbots. Through a Research-through-Design study using the system as a design probe, the authors report that visualization aids detection and correction of misaligned context, encourages users to adapt prompting strategies for improved initial retrieval, and surfaces limitations in close-reading support and trust in AI summaries, while outlining future directions for personalization and proactivity.

Significance. If the empirical observations hold, the work contributes to HCI by illustrating how visualization can foster appropriate reliance on retrieval-augmented chatbots in knowledge work settings. The dual finding that visualization supports both reactive correction and proactive strategy change is a useful distinction for designing sustainable human-AI collaboration, and the explicit discussion of verification limitations provides concrete guidance for follow-on systems.

major comments (2)

[Study / Evaluation] The evaluation section provides no participant count, study protocol details, analysis method (e.g., thematic analysis procedure), or comparison baseline. Without these, the central claim that visualization encourages prompting adaptation yielding more relevant context from the outset cannot be rigorously assessed and may rest on post-hoc interpretation of selected observations.
[Findings on prompting adaptation] The finding that users adapt prompting strategies is supported only by qualitative think-aloud and interview data; no logged retrieval metrics, relevance judgments against ground truth, or controlled pre/post comparisons are reported. This leaves open the possibility that observed adaptations reflect novelty or social-desirability effects rather than causally improved context quality.
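As an illustration of the kind of logged retrieval metric this comment asks for, the sketch below computes precision@k for the context retrieved by an initial prompt and by an adapted prompt. It is a hypothetical example: the document IDs and relevance judgments are placeholders, not data from the paper, and `precision_at_k` is a name introduced here.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant.

    Divides by k, so a retriever that returns fewer than k documents
    is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Placeholder ground-truth judgments and retrieval logs (assumptions).
relevant = {"d1", "d2", "d3"}
before_adaptation = ["d1", "d7", "d3", "d9"]  # context retrieved by the initial prompt
after_adaptation = ["d1", "d2", "d3", "d9"]   # context retrieved by the adapted prompt

p_before = precision_at_k(before_adaptation, relevant, k=4)
p_after = precision_at_k(after_adaptation, relevant, k=4)
```

Comparing `p_before` and `p_after` across participants would give a pre/post measure that does not depend on self-report, which is the gap the comment identifies.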

minor comments (2)

[Abstract] The abstract states that topic modeling combined with document visualization keeps cognitive overhead manageable, yet no specific user feedback or workload measures are referenced to support this assertion.
[VizCopilot prototype description] Figure captions and axis labels in the visualization examples could be expanded to clarify how topic clusters and document highlights map to user-modifiable context elements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback on our manuscript. We appreciate the emphasis on methodological transparency and the strength of evidence for our claims. We address each major comment in turn below, outlining how we plan to revise the paper.


Referee: [Study / Evaluation] The evaluation section provides no participant count, study protocol details, analysis method (e.g., thematic analysis procedure), or comparison baseline. Without these, the central claim that visualization encourages prompting adaptation yielding more relevant context from the outset cannot be rigorously assessed and may rest on post-hoc interpretation of selected observations.

Authors: We acknowledge the validity of this observation. The current manuscript provides a concise description of the Research-through-Design study but omits specific details such as the exact number of participants, the full study protocol, the thematic analysis method, and any baseline comparison. To address this, we will substantially revise the Evaluation section to include these elements: we will report the participant count and selection criteria, provide a detailed protocol including task descriptions and session structure, describe the analysis method (thematic analysis following established guidelines), and clarify that the study was not designed with a comparison baseline as it focused on probing design opportunities with the prototype. These additions will enable a more rigorous assessment of the findings on prompting adaptation.

Revision: yes

Referee: [Findings on prompting adaptation] The finding that users adapt prompting strategies is supported only by qualitative think-aloud and interview data; no logged retrieval metrics, relevance judgments against ground truth, or controlled pre/post comparisons are reported. This leaves open the possibility that observed adaptations reflect novelty or social-desirability effects rather than causally improved context quality.

Authors: We agree that the support for the prompting adaptation finding is qualitative in nature. Given the Research-through-Design methodology, our primary aim was to generate insights into how visualization can support context oversight rather than to demonstrate causal effects through quantitative measures. The think-aloud and interview data revealed users' reasoning processes and strategy changes. However, we recognize the limitations pointed out, including potential novelty and social-desirability biases. In the revised manuscript, we will add an explicit discussion of these alternative explanations in the Findings and Limitations sections, and we will moderate the language around the claim to reflect the exploratory nature of the evidence. While we cannot retroactively add logged metrics or controlled comparisons without conducting additional studies, we believe the qualitative data offers substantive value for the HCI community in this emerging area.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claims rest on independent user study observations


The paper reports findings from a Research-through-Design probe study using thematic analysis of think-aloud sessions and interviews. No equations, parameter fittings, derivations, or predictions appear anywhere in the manuscript. Central claims about visualization aiding context alignment and prompting strategy adaptation are presented as direct outcomes of participant data rather than reductions to self-defined inputs or self-citations. Any self-citations are incidental and non-load-bearing for the empirical results, which remain externally falsifiable via replication of the described study protocol.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on domain assumptions about user interaction with visualizations rather than on fitted parameters or newly invented entities; the full paper would be needed to audit any unstated design choices.

axioms (1)

Domain assumption: Visualization techniques can keep cognitive overhead manageable while enabling effective oversight and modification of retrieved context.
This premise underpins the reported benefits of detection, correction, and prompting adaptation.


Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 5 internal anchors

[1]

Eric Alexander, Joe Kohlmann, Robin Valenza, Michael Witmore, and Michael Gleicher. 2014. Serendip: Topic model-driven visual exploration of text corpora. In 2014 IEEE Conference on Visual Analytics Science and Technology (V AST). 173–182. doi:10.1109/VAST.2014.7042493

work page doi:10.1109/vast.2014.7042493 2014
[2]

Bennett, Kori Inkpen, and Jaime Teevan

Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human- AI Interaction. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13. doi:10.1145/3290605.3300233

work page doi:10.1145/3290605.3300233 2019
[3]

Anthropic. 2025. Introduction to the Model Context Protocol. https:// modelcontextprotocol.io/docs/getting-started/intro. Accessed: 2025-08-27

work page 2025
[4]

Muneera Bano, Didar Zowghi, Jon Whittle, Liming Zhu, Andrew Reeson, Rob Martin, and Jen Parsons. 2025. A Qualitative Study of User Perception of M365 AI Copilot. arXiv:2503.17661 [cs.CY] https://arxiv.org/abs/2503.17661

work page arXiv 2025
[5]

Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, Jianfeng He, Feng Chen, Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, and Lifu Huang. 2024. Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models. arXiv:2410.20199 [cs.AI] https://arxiv.org/abs/ 2410.20199

work page arXiv 2024
[6]

Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis. Qualitative research in sport, exercise and health11, 4 (2019), 589–597

work page 2019
[7]

Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI- assisted Decision-making.Proc. ACM Hum.-Comput. Interact.5, CSCW1, Article 188 (2021), 21 pages. doi:10.1145/3449287

work page internal anchor Pith review doi:10.1145/3449287 2021
[8]

Nan Cao, David Gotz, Jimeng Sun, Yu-Ru Lin, and Huamin Qu. 2011. SolarMap: Multifaceted Visual Analytics for Topic Exploration. In2011 IEEE 11th Interna- tional Conference on Data Mining. 101–110. doi:10.1109/ICDM.2011.135

work page doi:10.1109/icdm.2011.135 2011
[9]

Nan Cao, Jimeng Sun, Yu-Ru Lin, David Gotz, Shixia Liu, and Huamin Qu. 2010. FacetAtlas: Multifaceted Visualization for Rich Text Corpora.IEEE Transactions on Visualization and Computer Graphics16, 6 (2010), 1172–1181. doi:10.1109/ TVCG.2010.154

work page 2010
[10]

Angelos Chatzimparmpas. 2025. Visual Analytics for Explainable and Trustwor- thy Artificial Intelligence.IEEE Computer Graphics and Applications45, 2 (2025), 100–111. doi:10.1109/MCG.2025.3533806

work page doi:10.1109/mcg.2025.3533806 2025
[11]

Reddy, and Haesun Park

Jaegul Choo, Changhyun Lee, Chandan K. Reddy, and Haesun Park. 2013. UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Ma- trix Factorization.IEEE Transactions on Visualization and Computer Graphics19, 12 (2013), 1992–2001. doi:10.1109/TVCG.2013.212

work page doi:10.1109/tvcg.2013.212 2013
[12]

Nazli Cila. 2022. Designing Human-Agent Collaborations: Commitment, respon- siveness, and support. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems. Article 420, 18 pages. doi:10.1145/3491102.3517500

work page doi:10.1145/3491102.3517500 2022
[13]

Cover and P

T. Cover and P. Hart. 1967. Nearest neighbor pattern classification.IEEE Trans- actions on Information Theory13, 1 (1967), 21–27. doi:10.1109/TIT.1967.1053964

work page doi:10.1109/tit.1967.1053964 1967
[14]

Wenwen Dou, Li Yu, Xiaoyu Wang, Zhiqiang Ma, and William Ribarsky. 2013. HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hier- archies.IEEE Transactions on Visualization and Computer Graphics19, 12 (2013), 2002–2011. doi:10.1109/TVCG.2013.162

work page doi:10.1109/tvcg.2013.162 2013
[15]

It makes you think

Ian Drosos, Advait Sarkar, Xiaotong, Xu, and Neil Toronto. 2025. "It makes you think": Provocations Help Restore Critical Thinking to AI-Assisted Knowledge Work. arXiv:2501.17247 [cs.HC]

work page arXiv 2025
[16]

Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure.arXiv preprint arXiv:2203.05794(2022). doi:10.48550/arXiv. 2203.05794

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2022
[17]

Jeffrey Heer. 2019. Agency plus automation: Designing artificial intelligence into interactive systems.Proceedings of the National Academy of Sciences116, 6 (2019), 1844–1850. doi:10.1073/pnas.1807184115

work page doi:10.1073/pnas.1807184115 2019
[18]

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2025. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions.ACM Trans. Inf. Syst.43, 2, Article 42 (2025), 55 pages. doi:10.1145/3703155

work page doi:10.1145/3703155 2025
[19]

Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, and Lucas Dixon. 2025. LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models.IEEE Transactions on Visualization and Computer Graphics31, 1 (2025), 503–513. doi:10.1109/TVCG.2024.3456354

work page doi:10.1109/tvcg.2024.3456354 2025
[20]

Hyeonsu Kang, Joseph Chee Chang, Yongsung Kim, and Aniket Kittur. 2022. Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. Article 94, 15 pages. doi:10. 1145/3526113.3545660

work page arXiv 2022
[21]

Andrej Karpathy. 2025.In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.https://x.com/karpathy/status/1937902205765607626 Tweet. Accessed: 2025-09-15

work page arXiv 2025
[22]

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen tau Yih. 2020. Dense Passage Retrieval for Open- Domain Question Answering. arXiv:2004.04906 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2020
[23]

Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. 2025. Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv:2506.08872 [cs.AI] https://arxiv.org/abs/2506.08872

work page internal anchor Pith review Pith/arXiv arXiv 2025
[24]

Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Ar...

work page arXiv 2025
[25]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. InProceedings of the 34th International Conference on Neural Information Processing Systems (NI...

work page doi:10.5555/3495724.3496517 2020
[26]

Haihan Lin, Derya Akbaba, Miriah Meyer, and Alexander Lex. 2023. Data Hunches: Incorporating Personal Knowledge into Visualizations.IEEE Transac- tions on Visualization and Computer Graphics29, 1 (2023), 504–514. doi:10.1109/ TVCG.2022.3209451

work page arXiv 2023
[27]

Shixia Liu, Xiting Wang, Christopher Collins, Wenwen Dou, Fangxin Ouyang, Mennatallah El-Assady, Liu Jiang, and Daniel A. Keim. 2019. Bridging Text Visu- alization and Mining: A Task-Driven Survey.IEEE Transactions on Visualization and Computer Graphics25, 7 (2019), 2482–2504. doi:10.1109/TVCG.2018.2834341

work page doi:10.1109/tvcg.2018.2834341 2019
[28]

Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering.Journal of Open Source Software2, 11 (2017), 205. doi:10.21105/joss.00205

work page doi:10.21105/joss.00205 2017
[29]

Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, and Shenghua Liu. 2025. A Survey of Context Engineering for Large Language Models. arXiv:2507.13334 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Microsoft. 2023. Microsoft 365 Copilot. AI-powered productivity tool. https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365- copilot-overview Launched as part of Microsoft’s generative AI productivity suite

work page 2023
[31]

2025.Overreliance on AI: Risk Identification and Mitigation Frame- work

Microsoft. 2025.Overreliance on AI: Risk Identification and Mitigation Frame- work. https://learn.microsoft.com/en-us/ai/playbook/technology-guidance/ overreliance-on-ai/overreliance-on-ai Microsoft Learn. Accessed: 2025-09-15

work page 2025
[32]

Rajat Mukherjee and Jianchang Mao. 2004. Enterprise Search: Tough Stuff: Why is it that searching an intranet is so much harder than searching the Web?Queue 2, 2 (2004), 36–46

work page 2004
[33]

Coux: Collaborative visual 20 analysis of think-aloud usability test videos for digital interfaces.IEEE Transactions on Visualization and Computer Graphics, 28(1):643–653, 2022

A. Narechania, A. Karduni, R. Wesslen, and E. Wall. 2022. VITALITY: Promoting Serendipitous Discovery of Academic Literature with Transformers &; Visual Analytics.IEEE Transactions on Visualization and Computer Graphics28, 01 (2022), 486–496. doi:10.1109/TVCG.2021.3114820

work page doi:10.1109/tvcg.2021.3114820 2022
[34]

2021.Uncertainty Visual- ization

Lace Padilla, Matthew Kay, and Jessica Hullman. 2021.Uncertainty Visual- ization. John Wiley & Sons, Ltd, 1–18. doi:10.1002/9781118445112.stat08296 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118445112.stat08296

work page doi:10.1002/9781118445112.stat08296 2021
[35]

Drucker, Roland Fernandez, and Niklas Elmqvist

Deokgun Park, Steven M. Drucker, Roland Fernandez, and Niklas Elmqvist. 2018. Atom: A Grammar for Unit Visualizations.IEEE Transactions on Visualization and Computer Graphics24, 12 (2018), 3032–3043. doi:10.1109/TVCG.2017.2785807

work page doi:10.1109/tvcg.2017.2785807 2018
[36]

Deokgun Park, Seungyeon Kim, Jurim Lee, Jaegul Choo, Nicholas Diakopoulos, and Niklas Elmqvist. 2018. ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding.IEEE Transactions on Visualization and Computer Graphics24, 1 (2018), 361–370. doi:10.1109/TVCG.2017.2744478 VizCopilot: Fostering Appropriate Reliance on Enterpris...

work page doi:10.1109/tvcg.2017.2744478 2018
[37]

2024.Appropriate re- liance on Generative AI: Research synthesis

Samir Passi, Shipi Dhanorkar, and Mihaela Vorvoreanu. 2024.Appropriate re- liance on Generative AI: Research synthesis. Technical Report MSR-TR-2024-7. Microsoft. https://www.microsoft.com/en-us/research/publication/appropriate- reliance-on-generative-ai-research-synthesis/

work page 2024
[38]

2022.Overreliance on AI: Literature Review

Samir Passi and Mihaela Vorvoreanu. 2022.Overreliance on AI: Literature Review. Technical Report MSR-TR-2022-12. Microsoft. https://www.microsoft.com/en- us/research/publication/overreliance-on-ai-literature-review/

work page 2022
[39]

2022.Overreliance on AI: Literature Review

Samir Passi and Mihaela Vorvoreanu. 2022.Overreliance on AI: Literature Review. Technical Report. Microsoft Research. Accessed: 2025-08-27

work page 2022
[40]

Rui Qiu, Yamei Tu, Po-Yin Yen, and Han-Wei Shen. 2025. VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information- Seeking.IEEE Transactions on Visualization and Computer Graphics31, 1 (2025), 1312–1321. doi:10.1109/TVCG.2024.3456339

work page doi:10.1109/tvcg.2024.3456339 2025
[41]

Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, and Anirudha Majumdar. 2023. Robots That Ask For Help: Uncer- tainty Alignment for Large Language Model Planners. arXiv:2307.01928 [cs.RO] https://arxiv.org/abs/2307.01928

work page arXiv 2023
[42]

Tara Safavi, Adam Fourney, Robert Sim, Marcin Juraszek, Shane Williams, Ned Friend, Danai Koutra, and Paul N. Bennett. 2020. Toward Activity Discovery in the Personal Web. InProceedings of the 13th International Conference on Web Search and Data Mining. 492–500. doi:10.1145/3336191.3371828

work page doi:10.1145/3336191.3371828 2020
[43]

Alireza Salemi and Hamed Zamani. 2024. Evaluating Retrieval Quality in Retrieval-Augmented Generation. doi:arXiv:2404.13781 arXiv:2404.13781 [cs.CL]

work page arXiv 2024
[44]

Roumeliotis, and Manoj Karkee

Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. 2026. AI Agents vs. Agentic AI: A Conceptual taxonomy, applications and challenges.Information Fusion126 (Feb. 2026), 103599. doi:10.1016/j.inffus.2025.103599

work page doi:10.1016/j.inffus.2025.103599 2026
[45]

Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1997. Kernel principal component analysis. InArtificial Neural Networks — ICANN’97, Wulfram Gerstner, Alain Germond, Martin Hasler, and Jean-Daniel Nicoud (Eds.). Springer Berlin Heidelberg, Berlin, 583–588. doi:10.1007/BFb0020217

work page doi:10.1007/bfb0020217 1997
[46]

Ben Shneiderman. 2020. Human-Centered Artificial Intelligence: Reli- able, Safe & Trustworthy.International Journal of Human–Computer Interaction36, 6 (2020), 495–504. doi:10.1080/10447318.2020.1741118 arXiv:https://doi.org/10.1080/10447318.2020.1741118

work page doi:10.1080/10447318.2020.1741118 2020
[47]

Ben Shneiderman and Pattie Maes. 1997. Direct manipulation vs. interface agents. Interactions4, 6 (Nov. 1997), 42–61. doi:10.1145/267505.267514

work page doi:10.1145/267505.267514 1997
[48]

Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, and Jordan Boyd-Graber. 2024. Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong. arXiv:2310.12558 [cs.CL] https://arxiv.org/abs/2310.12558

work page arXiv 2024
[49]

Chase Stokes, Chelsea Sanker, Bridget Cogley, and Vidya Setlur. 2024. From Delays to Densities: Exploring Data Uncertainty through Speech, Text, and Visualization.Computer Graphics Forum43, 3 (2024), e15100. doi:10.1111/cgf. 15100 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.15100

work page doi:10.1111/cgf 2024
[50]

Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, and Sean Rintel. 2024. The Metacognitive Demands and Opportunities of Generative AI. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. Article 680, 24 pages. doi:10.1145/3613904. 3642902

work page doi:10.1145/3613904 2024
[51]

LangChain Team. 2024. LangGraph Studio: A Specialized Agent IDE for LLM Applications. https://langchain-ai.github.io/langgraphjs/concepts/langgraph_ studio/ Accessed: 2025-09-25

work page 2024
[52]

2025.Fostering appropriate reliance on GenAI: Lessons learned from early research

Mihaela Vorvoreanu, Samir Passi, Shipi Dhanorkar, Amy Heger, and Kath- leen Walker. 2025.Fostering appropriate reliance on GenAI: Lessons learned from early research. Technical Report MSR-TR-2025-4. Mi- crosoft. https://www.microsoft.com/en-us/research/publication/fostering- appropriate-reliance-on-genai-lessons-learned-from-early-research/

work page 2025
[53]

Smith, Kalyan Veeramachaneni, and Huamin Qu

Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–12. doi:10. 1145/3290605.3300911

work page arXiv 2019
[54]

Yi Yang, Quanming Yao, and Huamin Qu. 2017. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling.Visual Informatics1, 1 (2017), 40–47. doi:10.1016/j.visinf.2017.01.005

work page doi:10.1016/j.visinf.2017.01.005 2017
[55]

2025.Eval- uation of Retrieval-Augmented Generation: A Survey

Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, and Zhaofeng Liu. 2025.Eval- uation of Retrieval-Augmented Generation: A Survey. Springer Nature Singapore, 102–120. doi:10.1007/978-981-96-1024-2_8

work page doi:10.1007/978-981-96-1024-2_8 2025
[56]

V ., Zhang, J

Bhada Yun, Dana Feng, Ace S. Chen, Afshin Nikzad, and Niloufar Salehi. 2025. Generative AI in Knowledge Work: Design Implications for Data Navigation and Decision-Making. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Article 634, 19 pages. doi:10.1145/3706598.3713337

work page doi:10.1145/3706598.3713337 2025
[57]

Zamfirescu-Pereira, Richmond Y

J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang

work page
[58]

Association for Computing Machinery, New York, NY, USA, Article 437, 21 pages

Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Article 437, 21 pages. doi:10.1145/3544548.3581388

work page doi:10.1145/3544548.3581388 2023
[59]

Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A Survey on the Memory Mechanism of Large Language Model-based Agents.ACM Trans. Inf. Syst.43, 6, Article 155 (Sept. 2025), 47 pages. doi:10.1145/3748302

work page doi:10.1145/3748302 2025

[1] Eric Alexander, Joe Kohlmann, Robin Valenza, Michael Witmore, and Michael Gleicher. 2014. Serendip: Topic model-driven visual exploration of text corpora. In 2014 IEEE Conference on Visual Analytics Science and Technology (VAST). 173–182. doi:10.1109/VAST.2014.7042493

[2] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13. doi:10.1145/3290605.3300233

[3] Anthropic. 2025. Introduction to the Model Context Protocol. https://modelcontextprotocol.io/docs/getting-started/intro. Accessed: 2025-08-27

[4] Muneera Bano, Didar Zowghi, Jon Whittle, Liming Zhu, Andrew Reeson, Rob Martin, and Jen Parsons. 2025. A Qualitative Study of User Perception of M365 AI Copilot. arXiv:2503.17661 [cs.CY] https://arxiv.org/abs/2503.17661

[5] Mohammad Beigi, Sijia Wang, Ying Shen, Zihao Lin, Adithya Kulkarni, Jianfeng He, Feng Chen, Ming Jin, Jin-Hee Cho, Dawei Zhou, Chang-Tien Lu, and Lifu Huang. 2024. Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models. arXiv:2410.20199 [cs.AI] https://arxiv.org/abs/2410.20199

[6] Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis. Qualitative research in sport, exercise and health 11, 4 (2019), 589–597

[7] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 188 (2021), 21 pages. doi:10.1145/3449287

[8] Nan Cao, David Gotz, Jimeng Sun, Yu-Ru Lin, and Huamin Qu. 2011. SolarMap: Multifaceted Visual Analytics for Topic Exploration. In 2011 IEEE 11th International Conference on Data Mining. 101–110. doi:10.1109/ICDM.2011.135

[9] Nan Cao, Jimeng Sun, Yu-Ru Lin, David Gotz, Shixia Liu, and Huamin Qu. 2010. FacetAtlas: Multifaceted Visualization for Rich Text Corpora. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1172–1181. doi:10.1109/TVCG.2010.154

[10] Angelos Chatzimparmpas. 2025. Visual Analytics for Explainable and Trustworthy Artificial Intelligence. IEEE Computer Graphics and Applications 45, 2 (2025), 100–111. doi:10.1109/MCG.2025.3533806

[11] Jaegul Choo, Changhyun Lee, Chandan K. Reddy, and Haesun Park. 2013. UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 1992–2001. doi:10.1109/TVCG.2013.212

[12] Nazli Cila. 2022. Designing Human-Agent Collaborations: Commitment, responsiveness, and support. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. Article 420, 18 pages. doi:10.1145/3491102.3517500

[13] T. Cover and P. Hart. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 1 (1967), 21–27. doi:10.1109/TIT.1967.1053964

[14] Wenwen Dou, Li Yu, Xiaoyu Wang, Zhiqiang Ma, and William Ribarsky. 2013. HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2002–2011. doi:10.1109/TVCG.2013.162

[15] Ian Drosos, Advait Sarkar, Xiaotong Xu, and Neil Toronto. 2025. "It makes you think": Provocations Help Restore Critical Thinking to AI-Assisted Knowledge Work. arXiv:2501.17247 [cs.HC]

[16] Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022). doi:10.48550/arXiv.2203.05794

[17] Jeffrey Heer. 2019. Agency plus automation: Designing artificial intelligence into interactive systems. Proceedings of the National Academy of Sciences 116, 6 (2019), 1844–1850. doi:10.1073/pnas.1807184115

[18] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2025. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Trans. Inf. Syst. 43, 2, Article 42 (2025), 55 pages. doi:10.1145/3703155

[19] Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, and Lucas Dixon. 2025. LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models. IEEE Transactions on Visualization and Computer Graphics 31, 1 (2025), 503–513. doi:10.1109/TVCG.2024.3456354

[20] Hyeonsu Kang, Joseph Chee Chang, Yongsung Kim, and Aniket Kittur. 2022. Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. Article 94, 15 pages. doi:10.1145/3526113.3545660

[21] Andrej Karpathy. 2025. In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. https://x.com/karpathy/status/1937902205765607626 Tweet. Accessed: 2025-09-15

[22] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. arXiv:2004.04906 [cs.CL]

[23] Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. 2025. Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv:2506.08872 [cs.AI] https://arxiv.org/abs/2506.08872

[24] Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Ar...

[25] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NI...

[26] Haihan Lin, Derya Akbaba, Miriah Meyer, and Alexander Lex. 2023. Data Hunches: Incorporating Personal Knowledge into Visualizations. IEEE Transactions on Visualization and Computer Graphics 29, 1 (2023), 504–514. doi:10.1109/TVCG.2022.3209451

[27] Shixia Liu, Xiting Wang, Christopher Collins, Wenwen Dou, Fangxin Ouyang, Mennatallah El-Assady, Liu Jiang, and Daniel A. Keim. 2019. Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Transactions on Visualization and Computer Graphics 25, 7 (2019), 2482–2504. doi:10.1109/TVCG.2018.2834341

[28] Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering. Journal of Open Source Software 2, 11 (2017), 205. doi:10.21105/joss.00205

[29] Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, and Shenghua Liu. 2025. A Survey of Context Engineering for Large Language Models. arXiv:2507.13334 [cs.CL]

[30] Microsoft. 2023. Microsoft 365 Copilot. AI-powered productivity tool. https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-overview Launched as part of Microsoft’s generative AI productivity suite

[31] Microsoft. 2025. Overreliance on AI: Risk Identification and Mitigation Framework. https://learn.microsoft.com/en-us/ai/playbook/technology-guidance/overreliance-on-ai/overreliance-on-ai Microsoft Learn. Accessed: 2025-09-15

[32] Rajat Mukherjee and Jianchang Mao. 2004. Enterprise Search: Tough Stuff: Why is it that searching an intranet is so much harder than searching the Web? Queue 2, 2 (2004), 36–46

[33] A. Narechania, A. Karduni, R. Wesslen, and E. Wall. 2022. VITALITY: Promoting Serendipitous Discovery of Academic Literature with Transformers & Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 28, 1 (2022), 486–496. doi:10.1109/TVCG.2021.3114820

[34] Lace Padilla, Matthew Kay, and Jessica Hullman. 2021. Uncertainty Visualization. John Wiley & Sons, Ltd, 1–18. doi:10.1002/9781118445112.stat08296

[35] Deokgun Park, Steven M. Drucker, Roland Fernandez, and Niklas Elmqvist. 2018. Atom: A Grammar for Unit Visualizations. IEEE Transactions on Visualization and Computer Graphics 24, 12 (2018), 3032–3043. doi:10.1109/TVCG.2017.2785807

[36] Deokgun Park, Seungyeon Kim, Jurim Lee, Jaegul Choo, Nicholas Diakopoulos, and Niklas Elmqvist. 2018. ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2018), 361–370. doi:10.1109/TVCG.2017.2744478

[37] Samir Passi, Shipi Dhanorkar, and Mihaela Vorvoreanu. 2024. Appropriate reliance on Generative AI: Research synthesis. Technical Report MSR-TR-2024-7. Microsoft. https://www.microsoft.com/en-us/research/publication/appropriate-reliance-on-generative-ai-research-synthesis/

[38] Samir Passi and Mihaela Vorvoreanu. 2022. Overreliance on AI: Literature Review. Technical Report MSR-TR-2022-12. Microsoft. https://www.microsoft.com/en-us/research/publication/overreliance-on-ai-literature-review/

[39] Samir Passi and Mihaela Vorvoreanu. 2022. Overreliance on AI: Literature Review. Technical Report. Microsoft Research. Accessed: 2025-08-27

[40] Rui Qiu, Yamei Tu, Po-Yin Yen, and Han-Wei Shen. 2025. VADIS: A Visual Analytics Pipeline for Dynamic Document Representation and Information-Seeking. IEEE Transactions on Visualization and Computer Graphics 31, 1 (2025), 1312–1321. doi:10.1109/TVCG.2024.3456339

[41] Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, and Anirudha Majumdar. 2023. Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners. arXiv:2307.01928 [cs.RO] https://arxiv.org/abs/2307.01928

[42] Tara Safavi, Adam Fourney, Robert Sim, Marcin Juraszek, Shane Williams, Ned Friend, Danai Koutra, and Paul N. Bennett. 2020. Toward Activity Discovery in the Personal Web. In Proceedings of the 13th International Conference on Web Search and Data Mining. 492–500. doi:10.1145/3336191.3371828

[43] Alireza Salemi and Hamed Zamani. 2024. Evaluating Retrieval Quality in Retrieval-Augmented Generation. arXiv:2404.13781 [cs.CL]

[44] Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. 2026. AI Agents vs. Agentic AI: A Conceptual taxonomy, applications and challenges. Information Fusion 126 (Feb. 2026), 103599. doi:10.1016/j.inffus.2025.103599

[45] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1997. Kernel principal component analysis. In Artificial Neural Networks – ICANN’97, Wulfram Gerstner, Alain Germond, Martin Hasler, and Jean-Daniel Nicoud (Eds.). Springer Berlin Heidelberg, Berlin, 583–588. doi:10.1007/BFb0020217

[46] Ben Shneiderman. 2020. Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human–Computer Interaction 36, 6 (2020), 495–504. doi:10.1080/10447318.2020.1741118

[47] Ben Shneiderman and Pattie Maes. 1997. Direct manipulation vs. interface agents. Interactions 4, 6 (Nov. 1997), 42–61. doi:10.1145/267505.267514

[48] Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, and Jordan Boyd-Graber. 2024. Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong. arXiv:2310.12558 [cs.CL] https://arxiv.org/abs/2310.12558

[49] Chase Stokes, Chelsea Sanker, Bridget Cogley, and Vidya Setlur. 2024. From Delays to Densities: Exploring Data Uncertainty through Speech, Text, and Visualization. Computer Graphics Forum 43, 3 (2024), e15100. doi:10.1111/cgf.15100

[50] Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, and Sean Rintel. 2024. The Metacognitive Demands and Opportunities of Generative AI. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. Article 680, 24 pages. doi:10.1145/3613904.3642902

[51] LangChain Team. 2024. LangGraph Studio: A Specialized Agent IDE for LLM Applications. https://langchain-ai.github.io/langgraphjs/concepts/langgraph_studio/ Accessed: 2025-09-25

[52] Mihaela Vorvoreanu, Samir Passi, Shipi Dhanorkar, Amy Heger, and Kathleen Walker. 2025. Fostering appropriate reliance on GenAI: Lessons learned from early research. Technical Report MSR-TR-2025-4. Microsoft. https://www.microsoft.com/en-us/research/publication/fostering-appropriate-reliance-on-genai-lessons-learned-from-early-research/

[53] Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–12. doi:10.1145/3290605.3300911

[54] Yi Yang, Quanming Yao, and Huamin Qu. 2017. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling. Visual Informatics 1, 1 (2017), 40–47. doi:10.1016/j.visinf.2017.01.005

[55] Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, and Zhaofeng Liu. 2025. Evaluation of Retrieval-Augmented Generation: A Survey. Springer Nature Singapore, 102–120. doi:10.1007/978-981-96-1024-2_8

[56] Bhada Yun, Dana Feng, Ace S. Chen, Afshin Nikzad, and Niloufar Salehi. 2025. Generative AI in Knowledge Work: Design Implications for Data Navigation and Decision-Making. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Article 634, 19 pages. doi:10.1145/3706598.3713337

[57] J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 437, 21 pages. doi:10.1145/3544548.3581388

[59] Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A Survey on the Memory Mechanism of Large Language Model-based Agents. ACM Trans. Inf. Syst. 43, 6, Article 155 (Sept. 2025), 47 pages. doi:10.1145/3748302