Recognition: 1 theorem link · Lean Theorem
Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems
Pith reviewed 2026-05-15 15:55 UTC · model grok-4.3
The pith
LLM conversational systems differ substantially in the quantity, quality, and presentation of the references they provide to users.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central finding is that the nine systems vary notably in how many references they provide, how good those references are, and how they are presented. ChatGPT provides the most references per response on average (9.5) with the highest CRAAP quality scores (15.48/20), while systems such as Hunyuan-TurboS supply fewer references (4.0) with lower scores (11.65/20). Users rarely interact with the references shown to them, and interaction patterns differ by system. The authors conclude that better interface designs are needed to help users engage with and trust references more effectively.
What carries the argument
Evaluation of reference presentation in user interfaces combined with quality scoring via the CRAAP criteria (Currency, Relevance, Authority, Accuracy, Purpose) applied to 1,517 references, plus a preliminary user study tracking actual interaction behavior.
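To make the scoring concrete, here is a minimal sketch of how a composite CRAAP score could be computed. It assumes each of the five criteria is rated 0 to 4 so that the sub-scores sum to the /20 composite quoted in the abstract; the field names and rating scale are illustrative guesses, not the authors' actual instrument.

```python
from dataclasses import dataclass

# The five CRAAP criteria. The 0-4 per-criterion scale is an assumption
# chosen so the five sub-scores sum to the /20 composite reported in the
# abstract; the authors' actual rubric may differ.
CRITERIA = ("currency", "relevance", "authority", "accuracy", "purpose")

@dataclass
class Reference:
    url: str
    ratings: dict  # criterion name -> integer rating in 0..4

def craap_score(ref: Reference) -> int:
    """Composite CRAAP score out of 20 (sum of five 0-4 sub-scores)."""
    for criterion in CRITERIA:
        rating = ref.ratings[criterion]
        if not 0 <= rating <= 4:
            raise ValueError(f"{criterion} rating {rating} outside 0..4")
    return sum(ref.ratings[c] for c in CRITERIA)

def system_mean(refs: list) -> float:
    """Per-system mean composite, e.g. the reported 15.48/20 for ChatGPT."""
    return sum(craap_score(r) for r in refs) / len(refs)

refs = [Reference("https://example.org/a",
                  {"currency": 4, "relevance": 3, "authority": 3,
                   "accuracy": 3, "purpose": 2})]
print(craap_score(refs[0]), system_mean(refs))  # 15 15.0
```

On this assumed scale, the per-system figures in the abstract would be means of such composites over each system's sampled references.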
If this is right
- Users of systems that supply fewer or lower-scoring references may need to perform extra verification steps to reach reliable conclusions.
- Interface changes that make references more visible or easier to explore could increase engagement rates.
- Quality differences imply that the underlying retrieval or generation methods vary in how well they surface trustworthy sources.
- Standard practices for citation display and quality control could reduce the observed performance gaps between systems.
Where Pith is reading between the lines
- If reference quality tracks with a system's retrieval architecture, comparing the underlying document stores or fine-tuning data across providers could explain the observed gaps.
- Making references clickable and contextual might raise overall user trust in LLM answers even when the raw quality scores stay the same.
- Repeating the analysis on live user queries rather than curated samples would test whether the current variations hold outside the laboratory setting.
Load-bearing premise
That the CRAAP criteria give an appropriate and sufficient measure of reference quality for LLM-generated answers and that the 30 sampled question-answer pairs represent real-world usage patterns.
What would settle it
A larger study using hundreds of real user questions and a different quality assessment method that finds no significant differences in reference quantity, quality scores, or user interaction rates across systems.
Original abstract
As conversational AI systems become popular for information retrieval and question-answering, the references they cite are key to ensuring their answers are reliable and trustworthy. Yet, no prior work systematically analyzes how these references are presented or their quality. We examine 1,517 references from 30 question-answer pairs across nine systems, focusing on their (1) presentation in the user interface and (2) quality using the CRAAP criteria. We find notable variations in the presentation, quality, and quantity of references across systems. For instance, ChatGPT provides more references (9.5 per response on average) with higher quality (15.48/20 CRAAP score), while Hunyuan-TurboS provides fewer references (4.0) and lower quality (11.65/20). Additionally, a preliminary user study shows that people rarely interact with these references and that their behavior differs across systems. These findings highlight the need for better interface designs that help users engage with and trust references more effectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines 1,517 references drawn from 30 question-answer pairs across nine LLM-powered conversational AI systems. It analyzes reference presentation in the user interface and quality via the CRAAP criteria, reports quantitative variations (e.g., ChatGPT averaging 9.5 references with CRAAP score 15.48/20 versus Hunyuan-TurboS with 4.0 references and 11.65/20), and includes a preliminary user study indicating low user interaction with references that differs across systems. The work concludes that better interface designs are needed to improve engagement and trust.
Significance. If the methodological gaps are closed, the study would be significant for HCI research on conversational AI by providing the first systematic comparison of reference handling across multiple systems. The scale of 1,517 references examined offers a concrete empirical basis for claims about quantity and quality differences, and the user-interaction observations point to actionable design implications. The application of the established CRAAP framework lends some structure, though its fit to LLM-generated content requires justification.
major comments (3)
- [Abstract] The manuscript reports specific quantitative findings (9.5 vs. 4.0 references per response; CRAAP scores 15.48/20 vs. 11.65/20) but supplies no information on how the 30 question-answer pairs were selected, whether queries were stratified or randomized, inter-rater reliability for CRAAP scoring, or any statistical tests supporting the variation claims. These omissions leave the central claims of notable differences unsupported by visible evidence.
- [Methods] CRAAP scoring assesses source attributes (currency, relevance, authority, accuracy, purpose) but does not verify whether each cited reference is actually entailed by or supports the LLM-generated answer text. Without an explicit linkage or entailment check between answer content and references (a sketch of one such check follows these comments), the quality scores cannot be taken as direct measures of trustworthiness in the conversational use case.
- [User Study] The preliminary user study is described as showing rare interaction and system-dependent behavior, yet no participant count, task details, or statistical analysis is provided in the abstract or summary. Given the small scale implied, this component cannot compensate for the sampling and measurement limitations in the main reference analysis.
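One way to operationalize the missing linkage check: score each (source passage, answer sentence) pair with an off-the-shelf natural language inference model and treat the entailment probability as a support signal. The sketch below is illustrative only; it uses the public facebook/bart-large-mnli checkpoint as an assumed stand-in, and the paper itself performs no such check.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical support check: does the cited source passage entail the
# answer sentence it is attached to? The pairing logic is illustrative,
# not the authors' method.
MODEL = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def support_probability(source_text: str, answer_sentence: str) -> float:
    """P(entailment) that the source passage supports the answer claim."""
    inputs = tokenizer(source_text, answer_sentence,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    probs = torch.softmax(logits, dim=-1)
    # Look up the 'entailment' index from the model config rather than
    # hardcoding it, since label order varies across NLI checkpoints.
    label2id = {v.lower(): k for k, v in model.config.id2label.items()}
    return probs[label2id["entailment"]].item()
```

A reference whose best per-sentence entailment probability stays low would then flag a citation that decorates rather than supports the answer.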
minor comments (2)
- [Methods] The nine systems examined should be listed explicitly with version numbers or access dates in the Methods section for reproducibility.
- [Results] Figure captions and tables reporting per-system reference counts and CRAAP sub-scores would benefit from error bars or confidence intervals to convey variability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where we will revise the manuscript to improve clarity, transparency, and rigor.
Point-by-point responses
- Referee: [Abstract] The manuscript reports specific quantitative findings (9.5 vs. 4.0 references per response; CRAAP scores 15.48/20 vs. 11.65/20) but supplies no information on how the 30 question-answer pairs were selected, whether queries were stratified or randomized, inter-rater reliability for CRAAP scoring, or any statistical tests supporting the variation claims. These omissions leave the central claims of notable differences unsupported by visible evidence.
Authors: We agree that the abstract should provide more methodological transparency. The full manuscript's Methods section describes the 30 question-answer pairs as having been selected for topical diversity across factual, opinion, and current-events domains, but we will revise both the abstract and Methods to explicitly state the selection process, note that queries were not formally stratified or randomized, report inter-rater reliability (Cohen's kappa on the CRAAP ratings), and include statistical tests (e.g., one-way ANOVA) supporting the reported differences in quantity and quality (see the sketch after these responses). revision: yes
- Referee: [Methods] CRAAP scoring assesses source attributes (currency, relevance, authority, accuracy, purpose) but does not verify whether each cited reference is actually entailed by or supports the LLM-generated answer text. Without an explicit linkage or entailment check between answer content and references, the quality scores cannot be taken as direct measures of trustworthiness in the conversational use case.
Authors: We acknowledge that CRAAP evaluates reference attributes independently and does not include an entailment or support check against the LLM-generated answer text. Our study intentionally focused on reference quality as presented to users, which is a necessary first step for understanding trustworthiness signals. We will revise the Methods and Discussion sections to explicitly state this scope and limitation, clarify that CRAAP scores do not measure direct content support, and note that future work could incorporate entailment analysis. revision: partial
- Referee: [User Study] The preliminary user study is described as showing rare interaction and system-dependent behavior, yet no participant count, task details, or statistical analysis is provided in the abstract or summary. Given the small scale implied, this component cannot compensate for the sampling and measurement limitations in the main reference analysis.
Authors: The user study is indeed preliminary and exploratory. We will revise the abstract and User Study section to report the participant count (15), provide task details, and include basic statistical comparisons of interaction rates across systems. We agree it cannot fully compensate for limitations in the main analysis and will reposition the study accordingly as supplementary evidence rather than a comprehensive validation. revision: yes
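For concreteness, a minimal sketch of the three analyses the rebuttal promises (inter-rater reliability, cross-system score differences, and interaction-rate comparisons), using standard scipy and scikit-learn routines. All numbers below are invented for illustration; none come from the paper.

```python
from scipy.stats import chi2_contingency, f_oneway
from sklearn.metrics import cohen_kappa_score

# Inter-rater reliability: two annotators' hypothetical 0-4 ratings of the
# same references on one CRAAP criterion (values invented for illustration).
rater_a = [4, 3, 3, 2, 4, 1, 3, 2]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3]
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")

# Cross-system differences: hypothetical per-response composite CRAAP
# scores (out of 20); one-way ANOVA tests whether the means differ.
chatgpt = [16, 15, 17, 15, 14]
hunyuan = [12, 11, 13, 10, 12]
others  = [14, 13, 15, 14, 13]
f_stat, p_val = f_oneway(chatgpt, hunyuan, others)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Interaction rates: hypothetical counts of [interacted, did not interact]
# per system; a chi-square test checks whether rates depend on the system.
table = [[3, 12], [1, 14], [5, 10]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square: {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```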
Circularity Check
No significant circularity: purely observational analysis
Full rationale
The paper performs a descriptive empirical study by sampling 30 question-answer pairs across nine systems, extracting 1,517 references, and scoring them with the external CRAAP criteria plus a small user interaction log. No equations, fitted parameters, predictions, or derivations appear anywhere in the text. All reported variations (e.g., average reference counts and CRAAP scores) are direct tallies from the collected data rather than quantities derived from or defined in terms of themselves. The CRAAP framework is imported from outside the authors' prior work, and no self-citation is used to justify any load-bearing claim. The analysis is therefore self-contained against external benchmarks and contains none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: CRAAP criteria provide a valid and objective measure of reference quality in LLM conversational outputs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage: "We examine 1,517 references from 30 question-answer pairs across nine systems, focusing on their (1) presentation in the user interface and (2) quality using the CRAAP criteria."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Katie Adamson and Susan Prion. 2013. Reliability: Measuring Internal Consistency Using Cronbach's α. Clinical Simulation in Nursing 9 (05 2013), e179–e180. doi:10.1016/j.ecns.2012.12.001
- [2] Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, and Vincent Ginis. 2024. Large language models reflect human citation patterns with a heightened citation bias. arXiv preprint arXiv:2405.15739 (2024). doi:10.48550/arXiv.2405.15739
- [3]
- [4] Gal Bakal, Ali Dasdan, Yaniv Katz, Michael Kaufman, and Guy Levin. 2025. Experience with GitHub Copilot for Developer Productivity at Zoominfo. arXiv preprint arXiv:2501.13282 (2025). doi:10.48550/arXiv.2501.13282
- [5] Sarah Blakeslee. 2004. The CRAAP test. Loex Quarterly 31, 3 (2004), 4.
- [6] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
- [7] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 188 (April 2021), 21 pages. doi:10.1145/3449287
- [8] Wanling Cai, Yucheng Jin, and Li Chen. 2022. Impacts of Personal Characteristics on User Trust in Conversational Recommender Systems. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI '22). Association for Computing Machinery, New York, NY, USA, Article 489, 14 pages. doi:10.1145/3491102.3517471
- [9] Shawn Carolan, Amy Wu Martin, C.C. Gong, and Sam Borja. 2025. 2025: The State of Consumer AI. https://menlovc.com/perspective/2025-the-state-of-consumer-ai/
- [10] Cheng Chen and S. Shyam Sundar. 2023. Is this AI trained on Credible Data? The Effects of Labeling Quality and Performance Bias on User Trust. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 816, 11 pages. doi:10.1145/3544548.3580805
- [11] Lingjiao Chen, Matei Zaharia, and James Zou. 2023. How is ChatGPT's behavior changing over time? doi:10.48550/arXiv.2307.09009 arXiv:2307.09009 [cs.CL]
- [12] Lee J Cronbach. 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16, 3 (1951), 297–334.
- [13] Smit Desai, Christina Ziying Wei, Jaisie Sin, Mateusz Dubiel, Nima Zargham, Shashank Ahire, Martin Porcheron, Anastasia Kuzminykh, Minha Lee, Heloisa Candello, Joel E Fischer, Cosmin Munteanu, and Benjamin R. Cowan. 2024. CUI@CHI 2024: Building Trust in CUIs—From Design to Deployment. In Extended Abstracts of the CHI Conference on Human Factors in Comput...
- [14] Yifan Ding, Matthew Facciani, Ellen Joyce, Amrit Poudel, Sanmitra Bhattacharya, Balaji Veeramani, Sal Aguinaga, and Tim Weninger. 2025. Citations and trust in LLM generated responses. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteent...
- [15] Shutong Fan, Lan Zhang, and Xiaoyong Yuan. 2026. When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making. doi:10.48550/arXiv.2602.04003 arXiv:2602.04003 [cs.AI]
- [16] Masyura Ahmad Faudzi, Zaihisma Che Cob, Sharul Azim Sharudin, Ridha Omar, and Masitah Ghazali. 2023. The Effects of User Interface Design for Mobile Learning Application on Learner's Extraneous Cognitive Load: A Conceptual Framework. In Proceedings of the Asian HCI Symposium 2023 (Online, Indonesia) (Asian CHI '23). Association for Computing Machinery, New ...
- [17] Massimo Franceschet. 2011. PageRank: standing on the shoulders of giants. Commun. ACM 54, 6 (June 2011), 92–101. doi:10.1145/1953122.1953146
- [19] Jiangen He and Jiqun Liu. 2025. Not All Transparency Is Equal: Source Presentation Effects on Attention, Interaction, and Persuasion in Conversational Search. doi:10.48550/arXiv.2512.12207 arXiv:2512.12207 [cs.HC]
- [20] Jie Huang and Kevin Chang. 2024. Citation: A Key to Building Responsible and Accountable Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City, Mexico, 464–473. doi:10.18653/v1/2024.findings-naacl.31
- [21] Jeff Huang, Ryen W. White, and Susan Dumais. 2011. No clicks, no problem: using cursor movements to understand and improve search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI '11). Association for Computing Machinery, New York, NY, USA, 1225–1234. doi:10.1145/1978942.1979125
- [22] Yanwei Huang and Arpit Narechania. 2026. WebSeek: Facilitating Proactive and Reactive Guidance for Decision Making on the Web. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3772318.3791945
- [23] Farnaz Jahanbakhsh and David R Karger. 2024. A Browser Extension for in-place Signaling and Assessment of Misinformation. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 946, 21 pages. doi:10.1145/3613904.3642473
- [24] Anjali Khurana, Hariharan Subramonyam, and Parmit K Chilana. 2024. Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking. In Proceedings of the 29th International Conference on Intelligent User Interfaces (Greenville, SC, USA) (IUI '24). Association for Computing Machinery, New...
- [25]
- [26] Jane Li, Scott Huffman, and Akihito Tokuda. 2009. Good abandonment in mobile and PC internet search. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (Boston, MA, USA) (SIGIR '09). Association for Computing Machinery, New York, NY, USA, 43–50. doi:10.1145/1571941.1571951
- [27] Jiachen Li, Elizabeth D Mynatt, Varun Mishra, and Jonathan Bell. 2025. 'Always Nice and Confident, Sometimes Wrong': Developer's Experiences Engaging Generative AI Chatbots Versus Human-Powered Q&A Platforms. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–22. doi:10.48550/arXiv.2309.13684
- [28] Q Vera Liao and Jennifer Wortman Vaughan. 2023. AI transparency in the age of LLMs: A human-centered research roadmap. arXiv preprint arXiv:2306.01941 (2023). doi:10.48550/arXiv.2306.01941
- [29] Nelson Liu, Tianyi Zhang, and Percy Liang. 2023. Evaluating Verifiability in Generative Search Engines. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 7001–7025. doi:10.18653/v1/2023.findings-emnlp.467
- [30] Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Alessandro Santirocchi, Roberto Atzeni, Matteo Cinelli, Vincenzo Cestari, Clelia Rossi-Arnaud, and Walter Quattrociocchi. 2025. The simulation of judgment in LLMs. Proceedings of the National Academy of Sciences 122, 42 (2025), e2518443122. doi:10.1073/pnas.2518443122
- [31] Luise Metzger, Linda Miller, Martin Baumann, and Johannes Kraus. 2024. Empowering Calibrated (Dis-)Trust in Conversational Agents: A User Study on the Persuasive Power of Limitation Disclaimers vs. Authoritative Style. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '24). Association for Computing Ma...
- [32] Alaa Mohasseb, Mohamed Bader-El-Den, and Mihaela Cocea. 2018. Question categorization and classification using grammar based approach. Information Processing & Management 54, 6 (2018), 1228–1243. doi:10.1016/j.ipm.2018.05.001
- [33] Arpit Narechania, Adam Coscia, Emily Wall, and Alex Endert. 2022. Lumos: Increasing Awareness of Analytic Behavior during Visual Data Analysis. IEEE Transactions on Visualization and Computer Graphics 28, 1 (Jan. 2022), 1009–1018. doi:10.1109/TVCG.2021.3114827
- [34] Arpit Narechania, Alex Endert, and Atanu Sinha. 2025. Guidance Source Matters: How Guidance from AI, Expert, or a Group of Analysts Impacts Visual Data Preparation and Analysis. ACM IUI (2025). doi:10.1145/3708359.3712166
- [35] Arpit Narechania, Alex Endert, and Atanu R Sinha. 2025. Agentic Enterprise: AI-Centric User to User-Centric AI. arXiv preprint arXiv:2506.22893 (2025). doi:10.48550/arXiv.2506.22893
- [36] Arpit Ajay Narechania. 2024. Designing, Developing, and Democratizing Guidance for Visual Analytics. Ph.D. Dissertation. Georgia Institute of Technology.
- [37] Samir Passi and Mihaela Vorvoreanu. 2022. Overreliance on AI: Literature Review. Technical Report MSR-TR-2022-12. Microsoft. https://www.microsoft.com/en-us/research/publication/overreliance-on-ai-literature-review/
- [38] Ar Poorva Priyadarshini. 2024. The impact of user interface design on user engagement. International Journal of Engineering Research & Technology (IJERT) 13, 3 (2024). doi:10.48550/arXiv.2508.02740
- [39] Abdul Razaque, Salim Hariri, and Joon Yoo. 2025. AI-Driven User Interface Design: Enhancing Digital Learning and Skill Development. (01 2025). doi:10.2139/ssrn.5114814
- [40] Daniel M. Russell, Chinmay Kulkarni, Elena L. Glassman, Hariharan Subramonyam, and Nikolas Martelaro. 2024. Human-Computer Interaction and AI: What Practitioners Need to Know to Design and Build Effective AI systems from a Human Perspective. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA '24)...
- [41] Vera Schmitt, Isabel Bezzaoui, Charlott Jakob, Premtim Sahitaj, Qianli Wang, Arthur Hilbert, Max Upravitelev, Jonas Fegert, Sebastian Möller, and Veronika Solopova. 2025. Beyond Transparency: Evaluating Explainability in AI-Supported Fact-Checking. In Proceedings of the 4th ACM International Workshop on Multimedia AI against Disinformation (MAD '25). Ass...
- [42]
- [43] Nikhil Sharma, Q. Vera Liao, and Ziang Xiao. 2024. Generative Echo Chamber? Effect of LLM-Powered Search Systems on Diverse Information Seeking. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 1033, 17 pages. doi:10.1145/3613904.3642459
- [44] Sofia Eleni Spatharioti, David Rothschild, Daniel G Goldstein, and Jake M Hofman. 2025. Effects of LLM-based Search on Decision Making: Speed, Accuracy, and Overreliance. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25). Association for Computing Machinery, New York, NY, USA, Article 1025, 15 pages. doi:10.1145/3706598.3714082
- [46] Ivan Srba, Olesya Razuvayevskaya, João A. Leite, Robert Moro, Ipek Baris Schlicht, Sara Tonelli, Francisco Moreno García, Santiago Barrio Lottmann, Denis Teyssou, Valentin Porcellini, Carolina Scarton, Kalina Bontcheva, and Maria Bielikova. 2026. A Survey on Automatic Credibility Assessment Using Textual Credibility Signals in the Era of Large Language Models. ACM Trans. Intell. Syst. Technol. 17, 2, Article 26 (Jan. 2026), 80 pages. doi:10.1145/3770077
- [48] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS '17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.
- [49] William H Walters and Esther Isabelle Wilder. 2023. Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports 13, 1 (2023), 14045. doi:10.1038/s41598-023-41032-5
- [50] Thomas Wolf. 2025. open-llm-leaderboard (Open LLM Leaderboard). https://huggingface.co/open-llm-leaderboard
- [51] Vinzenz Wolf and Christian Maier. 2024. ChatGPT usage in everyday life: A motivation-theoretic mixed-methods study. International Journal of Information Management 79 (2024), 102821. doi:10.1016/j.ijinfomgt.2024.102821
- [52] Waheeb Yaqub, Otari Kakhidze, Morgan L. Brockman, Nasir Memon, and Sameer Patil. 2020. Effects of Credibility Indicators on Social Media News Sharing Intent. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3313831.3376213
- [53] Enaam Youssef, Mervat Medhat, Soumaya Abdellatif, and Mahra Al Malek. 2024. Examining the effect of ChatGPT usage on students' academic learning and achievement: A survey-based study in Ajman, UAE. Computers and Education: Artificial Intelligence 7 (2024), 100316. doi:10.1016/j.caeai.2024.100316