Overreliance in Writing Tasks: Exploring Similarity-Based Measures of AI Influence on Writing and Proposing a Reflective Writing Interface Intervention
Pith reviewed 2026-05-19 15:52 UTC · model grok-4.3
The pith
AI assistance is linked to greater reuse of its suggestions in users' final writing, and a reflective interface may increase awareness of that influence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a mixed-methods study, 47 participants completed writing tasks with or without generative AI assistance. Quantification of textual overlap showed that AI assistance was associated with patterns of suggestion reuse in the final writing. Analysis of participant reflections supported this pattern. A follow-up think-aloud study with a reflective writing interface (n=4) suggested that the interface can increase awareness of how AI outputs are incorporated and support more conscious engagement with the assistance.
What carries the argument
Similarity-based measures of textual overlap between AI suggestions and participants' final writing, serving as a proxy for measuring AI influence on the output.
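As a rough illustration of what a similarity-based overlap measure can look like, the sketch below computes the Jaccard overlap between the word n-gram sets of an AI suggestion and a final text. The page does not state the paper's exact metric, so the function names, the tokenizer, and the default n = 3 are assumptions made for this example only.

```python
from __future__ import annotations

import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (empty if too short)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_overlap(suggestion: str, final_text: str, n: int = 3) -> float:
    """Jaccard index of the two n-gram sets: |A & B| / |A | B|, in [0, 1]."""
    a = word_ngrams(suggestion, n)
    b = word_ngrams(final_text, n)
    if not a and not b:
        # Both texts are shorter than n words; treat as no measurable overlap.
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1 means the two texts share most of their n-grams; a score near 0 means they share almost none. On its own the score cannot say why the texts overlap, which is the concern raised under the load-bearing premise.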
If this is right
- AI assistance during writing tasks correlates with higher rates of reusing specific suggested phrases or passages.
- A reflective interface that highlights AI contributions can raise users' awareness of how those contributions appear in their work.
- Interface features prompting reflection may lead to more deliberate decisions about incorporating AI-generated material.
Where Pith is reading between the lines
- The overlap measurement approach could be extended into automated tools that flag potential AI influence for users in real time.
- Similar methods might apply to studying AI effects in other text-based creative tasks such as report drafting or content planning.
- The reflective interface design points toward broader interface strategies for supporting user agency when working with generative tools.
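The first extension above, flagging possible AI influence as the user writes, could be prototyped along the following lines. This is a hypothetical sketch, not the interface described in the paper: `flag_sentences`, the sentence splitter, and the 0.5 threshold are all assumptions. It scores each draft sentence by n-gram containment, that is, the share of the sentence's n-grams that also occur in any AI suggestion.

```python
from __future__ import annotations

import re


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def _ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_sentences(
    draft: str,
    suggestions: list[str],
    n: int = 3,
    threshold: float = 0.5,  # arbitrary cut-off chosen for illustration
) -> list[tuple[str, float]]:
    """Return (sentence, containment) pairs for sentences at or above `threshold`."""
    # Pool the n-grams of every suggestion the writer has seen.
    pool: set[tuple[str, ...]] = set()
    for suggestion in suggestions:
        pool |= _ngrams(_tokens(suggestion), n)

    flagged = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        grams = _ngrams(_tokens(sentence), n)
        if not grams:
            continue  # sentence shorter than n words
        containment = len(grams & pool) / len(grams)
        if containment >= threshold:
            flagged.append((sentence, containment))
    return flagged
```

Containment is used here instead of a symmetric measure because a short sentence copied from a long suggestion should still score high.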
Load-bearing premise
That the amount of shared text between AI suggestions and a participant's final writing reliably indicates overreliance or influence rather than other reasons such as adopting good ideas or natural stylistic similarity.
What would settle it
A larger controlled study that finds no measurable difference in textual overlap between groups that did and did not receive AI suggestions during the same writing tasks.
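One way to check for such a between-group difference is a permutation test on per-participant overlap scores. The sketch below is an assumption about how that comparison could be run, not the analysis reported in the paper; the function name and parameters are hypothetical, and no study data are used.

```python
from __future__ import annotations

import random
from statistics import mean


def permutation_test(
    assisted: list[float],
    unassisted: list[float],
    n_perm: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (observed mean difference, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mean(assisted) - mean(unassisted)
    pooled = list(assisted) + list(unassisted)
    k = len(assisted)

    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A large p-value in a well-powered study would be the "no measurable difference" outcome described above; a small one would be consistent with, but would not by itself prove, AI-specific reuse.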
Original abstract
As generative AI (GenAI) systems become increasingly proficient at simulating human-like and well-reasoned text, users may attribute authority to AI outputs, shaping how they engage with writing and reasoning tasks. While prior work has raised concerns about AI overreliance, empirical approaches for observing this phenomenon during open-ended writing remain limited. In this paper, we examine how GenAI assistance influences users' interactions with AI suggestions during writing. We report results from a mixed-methods study in which 47 participants completed analysis and synthesis writing tasks with or without AI assistance. We quantify the textual overlap between AI suggestions and participants' writing and analyze participants' reflections. Our results show that AI assistance is associated with patterns of suggestion reuse. Building on these findings, we design and evaluate an interactive writing interface that may support reflection on the usage of the AI suggestions during writing. Evidence from a small follow-up think-aloud study (n = 4) suggests that the interface can increase users' awareness of how AI outputs are incorporated into their writing and may support more conscious engagement with AI assistance. Together, our findings contribute empirical methods for studying AI adoption in writing contexts and demonstrate how interface design can shape user-AI interaction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from a mixed-methods study with 47 participants completing analysis and synthesis writing tasks with or without GenAI assistance. It quantifies textual overlap between AI suggestions and final participant writing to show an association with patterns of suggestion reuse, analyzes participant reflections, and proposes a reflective writing interface evaluated via a small think-aloud study (n=4) suggesting increased awareness of AI incorporation.
Significance. If the textual overlap measure can be shown to capture AI-driven influence beyond baseline task-induced convergence or stylistic alignment, the work offers useful empirical methods for studying AI adoption in open-ended writing and illustrates how interface design might promote more conscious engagement. The mixed-methods approach and the concrete interface proposal are strengths that could inform tool development if measurement validity is strengthened.
major comments (3)
- [Methods] Methods: The paper provides no detail on the exact similarity metrics for quantifying textual overlap, statistical controls used, or exclusion criteria applied in the n=47 study; without these, the reported association between AI assistance and suggestion reuse cannot be fully evaluated for robustness.
- [Results] Results: The central claim that observed overlap indicates AI influence or overreliance lacks controls such as independent-generation baselines or human-coded distinctions between literal reuse, paraphrase, and conceptual adoption; the overlap could instead reflect task demands or idea convergence in the analysis/synthesis tasks.
- [Follow-up Study] Follow-up evaluation: The n=4 think-aloud study is presented as suggestive evidence that the reflective interface increases awareness, but the small sample and lack of quantitative outcome measures limit any generalizable claims about supporting conscious engagement with AI assistance.
minor comments (1)
- [Abstract] Clarify in the abstract and methods whether the unassisted condition involved any form of external reference material that could produce comparable overlap by chance.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback, which has helped us identify areas for clarification and improvement. We address each major comment below, indicating revisions where we can strengthen the manuscript without misrepresenting our work.
- Referee: [Methods] Methods: The paper provides no detail on the exact similarity metrics for quantifying textual overlap, statistical controls used, or exclusion criteria applied in the n=47 study; without these, the reported association between AI assistance and suggestion reuse cannot be fully evaluated for robustness.
  Authors: We acknowledge the need for greater methodological transparency. The original manuscript described the overall approach to measuring textual overlap but omitted precise implementation details. In the revision, we will add a dedicated Methods subsection specifying the similarity metrics (cosine similarity over sentence embeddings combined with n-gram overlap), the statistical models (including regression controls for task type and participant variables), and exclusion criteria (e.g., incomplete tasks or technical failures). This will enable full evaluation of robustness.
  revision: yes
- Referee: [Results] Results: The central claim that observed overlap indicates AI influence or overreliance lacks controls such as independent-generation baselines or human-coded distinctions between literal reuse, paraphrase, and conceptual adoption; the overlap could instead reflect task demands or idea convergence in the analysis/synthesis tasks.
  Authors: We agree this is a substantive limitation in causal interpretation. Our design included a no-AI control condition showing significantly lower overlap, which we will highlight more explicitly as evidence against purely task-driven convergence. However, we did not collect independent-generation baselines or perform human coding of reuse types. We will revise the Results and Discussion to explicitly discuss these gaps as limitations, temper claims about 'overreliance,' and frame the overlap measure as one indicator supported by qualitative reflections rather than conclusive proof. We maintain the between-condition difference provides useful evidence of AI-specific patterns but will avoid overstatement.
  revision: partial
- Referee: [Follow-up Study] Follow-up evaluation: The n=4 think-aloud study is presented as suggestive evidence that the reflective interface increases awareness, but the small sample and lack of quantitative outcome measures limit any generalizable claims about supporting conscious engagement with AI assistance.
  Authors: We fully agree that the small sample and qualitative focus limit generalizability. The follow-up was explicitly positioned as an exploratory think-aloud evaluation to gather design insights, not a confirmatory test. In the revision, we will strengthen language to emphasize its preliminary, suggestive nature, explicitly note the absence of quantitative measures, and outline directions for future larger-scale studies with controlled quantitative outcomes. No overgeneralized claims will remain.
  revision: yes
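The distinction the referee draws between literal reuse and paraphrase can be approximated automatically, though only crudely. The sketch below is an illustrative heuristic, not the authors' method: it uses the longest shared word run as a signal of literal reuse and a bag-of-words cosine as a cheap stand-in for the sentence-embedding cosine named in the rebuttal. All function names and both thresholds are assumptions, and conceptual adoption with different wording is out of reach for this kind of surface measure.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def _tokens(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(a: str, b: str) -> int:
    """Length, in words, of the longest contiguous word sequence shared by a and b."""
    ta, tb = _tokens(a), _tokens(b)
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    return matcher.find_longest_match(0, len(ta), 0, len(tb)).size


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (stand-in for embedding cosine)."""
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def reuse_type(suggestion: str, sentence: str,
               run_words: int = 5, cos_threshold: float = 0.5) -> str:
    """Heuristic label; both thresholds are arbitrary illustration values."""
    if longest_shared_run(suggestion, sentence) >= run_words:
        return "literal"
    if bow_cosine(suggestion, sentence) >= cos_threshold:
        return "paraphrase-like"
    return "no clear reuse"
```

Labels from a heuristic like this would still need to be checked against human coding before they could support claims about reuse types.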
Circularity Check
No significant circularity in empirical mixed-methods study
full rationale
This is an empirical mixed-methods study reporting results from 47 participants in assisted vs. unassisted writing tasks plus a small follow-up think-aloud study. The central observations concern measured textual overlap between AI suggestions and final writing, plus participant reflections on an interface intervention. No mathematical derivations, equations, fitted parameters, or self-citation chains appear in the provided text that would reduce any claim to its inputs by construction. The overlap measure is presented as an empirical proxy for suggestion reuse rather than a self-definitional or tautological restatement. The study is self-contained against external benchmarks (participant data and reflections) and does not invoke uniqueness theorems or ansatzes from prior self-work. This is the expected honest non-finding for an observational HCI paper without derivation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Textual overlap metrics reliably capture the degree of AI suggestion reuse and influence during writing.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "We quantify the textual overlap between AI suggestions and participants' writing and analyze participants' reflections. Our results show that AI assistance is associated with patterns of suggestion reuse."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Odeyinka Abiola, Adebayo Abayomi-Alli, Oluwasefunmi Arogundade Tale, Sanjay Misra, and Olusola Abayomi-Alli. 2023. Sentiment analysis of COVID-19 tweets from selected hashtags in Nigeria using VADER and Text Blob analyser. Journal of Electrical Systems and Information Technology 10, 1 (2023), 5
-
[2]
Matheel Al-Rawas, Omar Qader, Nurul Othman, Noor Ismail, Rosnani Mamat, Mohamad Syahrizal Halim, Johari Abdullah, and Tahir Noorani. 2025. Identification of dental related ChatGPT generated abstracts by senior and young academicians versus artificial intelligence detectors and a similarity detector. Scientific Reports 15 (04 2025). doi:10.1038/s41598-025-95387-y
-
[3]
Hussam Alkaissi and Samy Mcfarlane. 2023. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 15 (02 2023). doi:10.7759/cureus.35179
-
[4]
Garrett Allen, Mike Beijen, David Maxwell, and Ujwal Gadiraju. 2023. In a Hurry: How Time Constraints and the Presentation of Web Search Results Affect User Behaviour and Experience. In International Conference on Web Engineering. Springer, 221–235
-
[5]
Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Mac...
-
[6]
Benjamin S Bloom, Max D Engelhart, Edward J Furst, Walker H Hill, David R Krathwohl, et al. 1956. Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. Longman New York
-
[7]
Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (April 2021), 1–21. doi:10.1145/3449287
- [8]
-
[9]
Xinyue Chen, Kunlin Ruan, Kexin Phyllis Ju, Nathan Yap, and Xu Wang. 2025. More AI Assistance Reduces Cognitive Engagement: Examining the AI Assistance Dilemma in AI-Supported Note-Taking. Proceedings of the ACM on Human-Computer Interaction 9, 7 (Oct. 2025), 1–29. doi:10.1145/3757632
-
[10]
Valdemar Danry, Pat Pataranutaporn, Matthew Groh, and Ziv Epstein. 2025. Deceptive Explanations by Large Language Models Lead People to Change their Beliefs About Misinformation More Often than Honest Explanations. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, U...
-
[11]
Sander de Jong, Ville Paananen, Benjamin Tag, and Niels van Berkel. 2025. Cognitive Forcing for Better Decision-Making: Reducing Overreliance on AI Systems Through Partial Explanations. Proc. ACM Hum.-Comput. Interact. 9, 2, Article CSCW048 (May 2025), 30 pages. doi:10.1145/3710946
FAccT ’26, June 25–28, 2026, Montreal, QC, Canada. Welzel and Vincent
-
[12]
Upol Ehsan and Mark O Riedl. 2020. Human-centered explainable ai: Towards a reflective sociotechnical approach. In International conference on human-computer interaction. Springer, 449–466
-
[13]
Liye Fu, Benjamin Newman, Maurice Jakesch, and Sarah Kreps. 2023. Comparing Sentence-Level Suggestions to Message-Level Suggestions in AI-Mediated Communication. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 103, 13 pages. doi:10.11...
-
[14]
Darren Gergle and Desney S Tan. 2014. Experimental research in HCI. In Ways of Knowing in HCI. Springer, 191–227
-
[15]
Ella Glikson and Omri Asscher. 2022. AI-mediated apology in a multilingual work context: Implications for perceived authenticity and willingness to forgive. Computers in Human Behavior 140 (11 2022), 107592. doi:10.1016/j.chb.2022.107592
-
[16]
S Goldwasser, S Micali, and C Rackoff. 1985. The knowledge complexity of interactive proof-systems. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (Providence, Rhode Island, USA) (STOC ’85). Association for Computing Machinery, New York, NY, USA, 291–304. doi:10.1145/22145.22178
-
[17]
Ziyang Guo, Yifan Wu, Jason D. Hartline, and Jessica Hullman. 2024. A Decision Theoretic Framework for Measuring AI Reliance. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Rio de Janeiro, Brazil) (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 221–236. doi:10.1145/3630106.3658901
-
[18]
Hadassah Harland, Richard Dazeley, Hashini Senaratne, Peter Vamplew, Francisco Cruz, and Bahareh Nakisa. 2025. AI apology: a critical review of apology in AI systems. Artificial Intelligence Review 58, 12 (2025), 369
-
[19]
Sandra G Hart and Lowell E Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, 139–183
-
[20]
O. Henry. 1906. After Twenty Years. In The Four Million. McClure, Phillips & Co., New York. Originally published in 1906; short story
-
[21]
Emily Sein Yue Elim Hui. 2025. Incorporating Bloom’s taxonomy into promoting cognitive thinking mechanism in artificial intelligence-supported learning environments. Interactive Learning Environments 33, 2 (2025), 1087–1100. doi:10.1080/10494820.2024.2364237
-
[22]
Paul Jaccard. 1901. Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoise des Sciences Naturelles 37 (1901), 547–579
-
[23]
Daniel Kahneman. 2011. Thinking, fast and slow. Farrar, Straus and Giroux, New York. https://www.amazon.de/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374275637/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&colid=151193SNGKJT9&coliid=I3OCESLZCVDFL7
-
[24]
Ece Kamar. 2016. Directions in hybrid intelligence: complementing AI systems with human intelligence. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (New York, New York, USA) (IJCAI’16). AAAI Press, 4070–4073
-
[25]
Ece Kamar, Severin Hacker, and Eric Horvitz. 2012. Combining human and machine intelligence in large-scale crowdsourcing. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1 (Valencia, Spain) (AAMAS ’12). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 467–474
-
[26]
Sunnie S. Y. Kim, Q. Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, and Jennifer Wortman Vaughan. 2024. "I’m Not Sure, But...": Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Rio de Janeiro, Brazil) (FAccT ’24). Asso...
-
[27]
Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, and Olga Russakovsky. 2025. Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). ACM, 1–19. doi:10.1145/3706598.3714020
-
[28]
Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. 2025. Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv:2506.08872 [cs.AI] https://arxiv.org/abs/2506.08872
-
[29]
Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CH...
-
[30]
Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. In CHI Conference on Human Factors in Computing Systems (CHI ’22). ACM, 1–19. doi:10.1145/3491102.3502030
-
[31]
Steven Loria and contributors. 2026. TextBlob Documentation (Release 0.19.0). Read the Docs. Accessed 2026-01-06
-
[32]
Hannah Mieczkowski, Jeffrey T. Hancock, Mor Naaman, Malte Jung, and Jess Hohenstein. 2021. AI-Mediated Communication: Language Use and Interpersonal Effects in a Referential Communication Task. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 17 (April 2021), 14 pages. doi:10.1145/3449091
-
[33]
Mohsin Murtaza, Chi-Tsun Cheng, Bader Albahlal, Muhana Muslam, and Mansoor Raza. 2025. The impact of LLM chatbots on learning outcomes in advanced driver assistance systems education. Scientific Reports 15 (03 2025). doi:10.1038/s41598-025-91330-3
-
[34]
David Navon and Daniel Gopher. 1979. On the economy of the human-processing system. Psychological Review 86, 3 (1979), 214.
-
[35]
Abdul Wahab Qurashi, Violeta Holmes, and Anju P Johnson. 2020. Document processing: Methods for semantic text similarity analysis. In 2020 international conference on INnovations in Intelligent SysTems and Applications (INISTA). IEEE, 1–6
-
[36]
Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). 3982–3992
-
[37]
Jenna Russell, Marzena Karpinska, and Mohit Iyyer. 2025. People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 5342–5373
- [38]
-
[39]
Kristen Sussman and Daniel Carter. 2025. Detecting Effects of AI-Mediated Communication on Language Complexity and Sentiment. In Companion Proceedings of the ACM on Web Conference 2025 (Sydney NSW, Australia) (WWW ’25). Association for Computing Machinery, New York, NY, USA, 2689–2693. doi:10.1145/3701716.3717543
-
[40]
Ningzhi Tang, Meng Chen, Zheng Ning, Aakash Bansal, Yu Huang, Collin McMillan, and Toby Jia-Jun Li. 2023. An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code. (3 2023). doi:10.1184/R1/22223533.v1
- [41]
-
[42]
Michael Tomasello, Malinda Carpenter, Josep Call, Tanya Behne, and Henrike Moll. 2005. Understanding and Sharing Intentions: The Origins of Cultural Cognition. Behavioral and Brain Sciences 28 (11 2005), 675–735. doi:10.1017/S0140525X05000129
-
[43]
K Vani and Deepa Gupta. 2015. Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. In 2015 international conference on advances in computing, communications and informatics (ICACCI). IEEE, 1578–1584
-
[44]
Helena Vasconcelos, Matthew Jörke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael S Bernstein, and Ranjay Krishna. 2023. Explanations can reduce overreliance on ai systems during decision-making. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–38
- [45]
-
[46]
Zachary Wojtowicz and Simon DeDeo. 2025. Undermining mental proof: how AI can make cooperation harder by making thinking easier. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intel...
-
[47]
Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: Story Writing With Large Language Models. In Proceedings of the 27th International Conference on Intelligent User Interfaces (Helsinki, Finland) (IUI ’22). Association for Computing Machinery, New York, NY, USA, 841–852. doi:10.1145/3490099.3511105
-
[48]
Yunfeng Zhang, Q. Vera Liao, and Rachel K. E. Bellamy. 2020. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20). ACM, 295–305. doi:10.1145/3351095.3372852
-
[49]
came a thousand miles to stand here tonight
Qingjuan Zhao, Jianwei Niu, and Xuefeng Liu. 2022. ALS-MRS: Incorporating aspect-level sentiment for abstractive multi-review summarization. Knowledge-Based Systems 258 (2022), 109942. doi:10.1016/j.knosys.2022.109942
A Task Prompts and AI Suggestions
Task A - Analysis Prompt. Evaluate Bob’s decision to wait at the old restaurant site for twenty years. Judge...