arxiv: 2510.11954 · v2 · submitted 2025-10-13 · 💻 cs.HC

VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

Pith reviewed 2026-05-18 07:05 UTC · model grok-4.3

classification 💻 cs.HC
keywords enterprise chatbots · context visualization · topic modeling · context alignment · prompting strategies · human oversight · appropriate reliance · research through design

The pith

Visualization enables users to detect and correct misaligned context in enterprise chatbots while adapting their prompting strategies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enterprise chatbots often retrieve context from large databases that does not match user intentions, leading to responses that are technically correct but irrelevant. VizCopilot adds a visualization layer that combines topic modeling with document views, so users can inspect, adjust, and align the context before generation. A Research-through-Design study using the prototype found that these visuals help users spot and fix misaligned context and also lead them to change how they prompt the system, so that it retrieves better context from the start. The approach aims to keep the added mental effort low enough for everyday work. The work also identifies remaining gaps in support for close reading and for trust in AI summaries.

Core claim

By combining topic modeling with document visualization, VizCopilot enables human oversight and modification of retrieved context. The study shows that visualization not only helps users detect and correct misaligned context but also encourages them to adapt their prompting strategies, enabling the system to retrieve more relevant context from the outset.

What carries the argument

VizCopilot prototype that combines topic modeling with document visualization to support human oversight and modification of retrieved context in enterprise chatbots.
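
The oversight loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the topic model used by the paper is not described on this page, so a simple keyword-overlap labeler stands in for it, and the names `Doc`, `TOPIC_KEYWORDS`, `assign_topic`, `group_by_topic`, and `align_context` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


# Hypothetical stand-in for a topic model: each topic is a set of keywords.
TOPIC_KEYWORDS = {
    "budget": {"budget", "cost", "forecast"},
    "hiring": {"hiring", "candidate", "interview"},
}


def assign_topic(doc: Doc) -> str:
    """Label a document with the topic whose keywords it overlaps most."""
    words = set(doc.text.lower().split())
    best = max(TOPIC_KEYWORDS, key=lambda t: len(words & TOPIC_KEYWORDS[t]))
    # Fall back to "other" when no topic keyword appears at all.
    return best if words & TOPIC_KEYWORDS[best] else "other"


def group_by_topic(docs):
    """Group retrieved documents by topic, as a visualization panel might."""
    groups = defaultdict(list)
    for doc in docs:
        groups[assign_topic(doc)].append(doc)
    return dict(groups)


def align_context(groups, excluded_topics):
    """Keep only documents whose topic the user has not excluded."""
    return [
        doc
        for topic, docs in groups.items()
        if topic not in excluded_topics
        for doc in docs
    ]


# Hypothetical retrieved context for a prompt about budgets.
retrieved = [
    Doc("d1", "Q3 budget forecast and cost review"),
    Doc("d2", "Interview schedule for each candidate"),
    Doc("d3", "Office party photos"),
]
groups = group_by_topic(retrieved)
# The user inspects the groups and drops the topics that miss their intent.
context = align_context(groups, excluded_topics={"hiring", "other"})
```

In this sketch the user's correction is expressed as a set of excluded topics; only the remaining documents would be passed on to generation.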

If this is right

  • Users can detect when retrieved context diverges from their intentions and correct it directly.
  • Users adapt prompting strategies after seeing visual feedback, improving initial context retrieval.
  • Cognitive overhead stays low enough for practical oversight and changes in real tasks.
  • Gaps appear in verification support for close reading and building trust in AI summaries.
  • Future chatbot designs should explore personalization, proactivity, and sustainable human-AI collaboration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar visualization layers could be added to other retrieval-augmented AI tools to increase user control over data sources.
  • Making context visible may reduce cases where the chatbot produces correct but off-target answers across different domains.
  • Longer-term deployments could test whether improved prompting habits transfer beyond the study setting.
  • The work connects to broader transparency needs in retrieval-based systems by showing visualization as one concrete method.

Load-bearing premise

The chosen visualization techniques keep cognitive overhead manageable while enabling effective human oversight and modification of retrieved context in realistic enterprise use.

What would settle it

A user study finding that participants with VizCopilot show no higher rates of context correction or prompting adaptation than participants using a standard text-only chatbot interface.
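
One way such a comparison could be analyzed is a one-sided two-proportion z-test on the share of participants who corrected context in each condition. The sketch below is an assumption about a possible analysis, not something the paper reports; the function name `two_proportion_z` and any counts passed to it are hypothetical, and the normal approximation is only reasonable for adequately large samples.

```python
import math


def two_proportion_z(x_a: int, n_a: int, x_b: int, n_b: int):
    """One-sided two-proportion z-test for H1: rate_a > rate_b.

    x_a, x_b: participants who corrected context in each condition.
    n_a, n_b: total participants in each condition.
    Returns (z, p_value).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

A p-value that stays large under this test would be consistent with the "no higher rates" outcome described above.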

Figures

Figures reproduced from arXiv: 2510.11954 by Albert Calzaretto, Alice Ferng, Jingya Chen, Mihaela Vorvoreanu, Richard Lee, Samir Passi, Sam Yu-Te Lee.

Figure 1: Overview of VizCopilot before entering a prompt. (a) The visualization panel shows topic structures of the context in ...
Figure 2: Interactions supported by VizCopilot. (a) Users can enter their prompt in the chat panel to initiate a conversation. ...
Figure 3: The highlight feature allows users to quickly check ...
Original abstract

Enterprise chatbots show promise in supporting knowledge workers in information synthesis tasks by retrieving context from large, heterogeneous databases before generating answers. However, when the retrieved context misaligns with user intentions, the chatbot often produces "irrelevantly right" responses that provide little value. In this work, we introduce VizCopilot, a prototype that incorporates visualization techniques to actively involve end-users in context alignment. By combining topic modeling with document visualization, VizCopilot enables human oversight and modification of retrieved context while keeping cognitive overhead manageable. We used VizCopilot as a design probe in a Research-through-Design study to evaluate the role of visualization in context alignment and to surface future design opportunities. Our findings show that visualization not only helps users detect and correct misaligned context but also encourages them to adapt their prompting strategies, enabling the system to retrieve more relevant context from the outset. At the same time, the study reveals limitations in verification support regarding close-reading and trust in AI summaries. We outline future directions for visualization-enhanced chatbots, focusing on personalization, proactivity, and sustainable human-AI collaboration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces VizCopilot, a prototype that combines topic modeling with document visualization to support end-user oversight and modification of retrieved context in enterprise chatbots. Through a Research-through-Design study using the system as a design probe, the authors report that visualization aids detection and correction of misaligned context, encourages users to adapt prompting strategies for improved initial retrieval, and surfaces limitations in close-reading support and trust in AI summaries, while outlining future directions for personalization and proactivity.

Significance. If the empirical observations hold, the work contributes to HCI by illustrating how visualization can foster appropriate reliance on retrieval-augmented chatbots in knowledge work settings. The dual finding that visualization supports both reactive correction and proactive strategy change is a useful distinction for designing sustainable human-AI collaboration, and the explicit discussion of verification limitations provides concrete guidance for follow-on systems.

major comments (2)
  1. [Study / Evaluation] The evaluation section provides no participant count, study protocol details, analysis method (e.g., thematic analysis procedure), or comparison baseline. Without these, the central claim that visualization encourages prompting adaptation yielding more relevant context from the outset cannot be rigorously assessed and may rest on post-hoc interpretation of selected observations.
  2. [Findings on prompting adaptation] The finding that users adapt prompting strategies is supported only by qualitative think-aloud and interview data; no logged retrieval metrics, relevance judgments against ground truth, or controlled pre/post comparisons are reported. This leaves open the possibility that observed adaptations reflect novelty or social-desirability effects rather than causally improved context quality.
minor comments (2)
  1. [Abstract] The abstract states that topic modeling combined with document visualization keeps cognitive overhead manageable, yet no specific user feedback or workload measures are referenced to support this assertion.
  2. [VizCopilot prototype description] Figure captions and axis labels in the visualization examples could be expanded to clarify how topic clusters and document highlights map to user-modifiable context elements.
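
The logged retrieval metrics that major comment 2 asks for could, for example, include precision@k and recall@k computed against relevance judgments. The following is a minimal sketch of those two standard metrics; the document identifiers and judgments are hypothetical, and the paper reports no such data.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranked retrieval result and ground-truth judgments.
ranked = ["d1", "d4", "d2", "d7"]
relevant = {"d1", "d2", "d3"}
p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_4 = recall_at_k(ranked, relevant, 4)
```

Logging these values before and after a user adjusts their prompt would give a quantitative counterpart to the think-aloud observations.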

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback on our manuscript. We appreciate the emphasis on methodological transparency and the strength of evidence for our claims. We address each major comment in turn below, outlining how we plan to revise the paper.

point-by-point responses
  1. Referee: [Study / Evaluation] The evaluation section provides no participant count, study protocol details, analysis method (e.g., thematic analysis procedure), or comparison baseline. Without these, the central claim that visualization encourages prompting adaptation yielding more relevant context from the outset cannot be rigorously assessed and may rest on post-hoc interpretation of selected observations.

    Authors: We acknowledge the validity of this observation. The current manuscript provides a concise description of the Research-through-Design study but omits specific details such as the exact number of participants, the full study protocol, the thematic analysis method, and any baseline comparison. To address this, we will substantially revise the Evaluation section to include these elements: we will report the participant count and selection criteria, provide a detailed protocol including task descriptions and session structure, describe the analysis method (thematic analysis following established guidelines), and clarify that the study was not designed with a comparison baseline as it focused on probing design opportunities with the prototype. These additions will enable a more rigorous assessment of the findings on prompting adaptation.
    revision: yes

  2. Referee: [Findings on prompting adaptation] The finding that users adapt prompting strategies is supported only by qualitative think-aloud and interview data; no logged retrieval metrics, relevance judgments against ground truth, or controlled pre/post comparisons are reported. This leaves open the possibility that observed adaptations reflect novelty or social-desirability effects rather than causally improved context quality.

    Authors: We agree that the support for the prompting adaptation finding is qualitative in nature. Given the Research-through-Design methodology, our primary aim was to generate insights into how visualization can support context oversight rather than to demonstrate causal effects through quantitative measures. The think-aloud and interview data revealed users' reasoning processes and strategy changes. However, we recognize the limitations pointed out, including potential novelty and social-desirability biases. In the revised manuscript, we will add an explicit discussion of these alternative explanations in the Findings and Limitations sections, and we will moderate the language around the claim to reflect the exploratory nature of the evidence. While we cannot retroactively add logged metrics or controlled comparisons without conducting additional studies, we believe the qualitative data offers substantive value for the HCI community in this emerging area.
    revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claims rest on independent user study observations

full rationale

The paper reports findings from a Research-through-Design probe study using thematic analysis of think-aloud sessions and interviews. No equations, parameter fittings, derivations, or predictions appear anywhere in the manuscript. Central claims about visualization aiding context alignment and prompting strategy adaptation are presented as direct outcomes of participant data rather than reductions to self-defined inputs or self-citations. Any self-citations are incidental and non-load-bearing for the empirical results, which remain externally falsifiable via replication of the described study protocol.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on domain assumptions about user interaction with visualizations rather than on fitted parameters or newly invented entities; the full paper would be needed to audit any unstated design choices.

axioms (1)
  • Domain assumption: Visualization techniques can keep cognitive overhead manageable while enabling effective oversight and modification of retrieved context.
    This premise underpins the reported benefits of detection, correction, and prompting adaptation.


