pith.

arxiv: 2604.19429 · v1 · submitted 2026-04-21 · 💻 cs.HC · cs.CY

Discerning Authorship in Online Health Communities: Experience, Trust, and Transparency Implications for Moderating AI

Pith reviewed 2026-05-10 01:32 UTC · model grok-4.3

classification 💻 cs.HC cs.CY
keywords: online health communities, AI authorship detection, LLM generated advice, transparency and trust, community moderation, health advice evaluation, user experiment, AI in health forums

The pith

People show little ability to distinguish AI-generated health advice from human-written advice, with the health condition shaping judgments more than experience or training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines whether users in online health communities can reliably tell if advice comes from a human or from a large language model. It reports an experiment that tested this ability across two health conditions while varying participants' lived experience, AI-recognition training, and attitudes toward transparency. The central result is minimal evidence of successful authorship detection overall, though accuracy and confidence shifted consistently with the health topic itself. Qualitative data revealed that participants leaned on unreliable signals such as perceived detail or tone, producing flawed heuristic judgments. A sympathetic reader would care because health communities depend on trust, and undetected AI content could quietly change how advice is evaluated and acted upon.

Core claim

In an online experiment, participants displayed little capacity to correctly determine whether health advice was authored by a human or generated by an AI. This held true regardless of their personal experience with the health condition, any training they received on recognizing AI text, or their general attitudes toward AI transparency and trust. A reliable difference emerged based on the health condition under discussion. Analysis of open responses showed that people applied flawed heuristics relying on signals that did not reliably indicate the true source.

What carries the argument

An online experiment that asks participants to classify health advice as AI-generated or human-written while measuring effects of health condition and user attributes.

If this is right

  • Transparency about AI use is required to sustain trust in online health communities.
  • Self-moderation of LLM advice must accommodate topic-specific differences in how users evaluate content.
  • Unreliable signals currently lead to mistaken judgments about advice origins.
  • Better design of detection aids could strengthen community-based moderation of AI content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Platforms may need mandatory disclosure rules because users cannot be expected to detect AI on their own.
  • Similar detection problems are likely in other advice domains where source matters, such as legal or financial discussions.
  • Testing whether explicit source labels change how much users trust or follow the advice would extend these results.

Load-bearing premise

The advice examples used in the study and the group of participants reflect typical real-world online health discussions and the quality of current AI-generated text.

What would settle it

A replication using more recent language models or a broader participant sample that finds people identifying AI authorship with reliably above-chance accuracy would undermine the claim of little evidence for discernment ability.
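One way to make "above chance" concrete is an exact one-sided binomial test of correct authorship judgments against a 50% guessing rate. The sketch below is an illustration under stated assumptions, not the paper's analysis: it assumes a binary AI/human choice with equal base rates and treats every judgment as independent. The function name `binomial_p_value` and the example counts are hypothetical.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns P(X >= correct) for X ~ Binomial(trials, chance), i.e. the
    probability of scoring at least this well by guessing alone.
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    correct, trials = 58, 100
    p = binomial_p_value(correct, trials)
    print(f"accuracy = {correct / trials:.2f}, one-sided p = {p:.3f}")
```

Because each participant judges several items, the independence assumption is optimistic; for real data, a mixed-effects logistic regression with participant and item random effects would be a more defensible choice.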

Figures

Figures reproduced from arXiv: 2604.19429 by Agnieszka Kitkowska, Mark Warner, Yefim Shulman.

Figure 1. Example of an embedded text-based safeguard (“I am not a doctor...”) advising users that the model is not a medical professional.
Figure 2. Research model. Aligned with these theories, prior work [20] suggests people develop trust in online health communities through the evaluation of the credibility of information within individual posts and the longer-term evaluation of both the content of posts and the sources of those posts (i.e., the posters). When evaluating information credibility, Fan et al. [20] identified heuristic cues (e.g., gramma…
Figure 3. Example of annotated Training advice shown to participants in the Training condition.
Figure 4. The study flow. …screening of Reddit’s r/AskDocs found these two topics to be discussed frequently on the platform, enabling a larger pool for the selection of questions and advice. Lastly, the two conditions might be perceived as opposing: diabetes is commonly perceived as a long-term condition that requires permanent treatment, while back pain might be perceived as a temporary condition that does not…
Figure 5. The task interface shown to participants to elicit their assessment of the advice and their qualitative reasoning.
Figure 6. Means of the Correctness-AI and Correctness-Human scores per Lived experience per Health condition. Error bars show 95% CI.
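Reporting Correctness-AI and Correctness-Human separately can blur two things: how well participants discriminate the sources, and how strongly they lean toward one answer. Signal detection theory is a common way to separate them. The sketch below assumes, hypothetically, that Correctness-AI is the share of AI-written items labelled AI (hits) and Correctness-Human the share of human-written items labelled human (correct rejections); the paper may define the scores differently, and `sdt_measures` is an illustrative name, not the authors' method.

```python
from statistics import NormalDist


def sdt_measures(ai_correct: int, ai_total: int,
                 human_correct: int, human_total: int) -> tuple[float, float]:
    """Return (d_prime, criterion), treating 'AI-generated' as the signal.

    ai_correct: AI-written items labelled AI (hits).
    human_correct: human-written items labelled human (correct rejections).
    The log-linear correction (+0.5 / +1) keeps both rates strictly
    between 0 and 1 so the inverse normal CDF stays finite.
    """
    z = NormalDist().inv_cdf
    hit_rate = (ai_correct + 0.5) / (ai_total + 1)
    false_alarm_rate = (human_total - human_correct + 0.5) / (human_total + 1)
    d_prime = z(hit_rate) - z(false_alarm_rate)
    # A negative criterion indicates a liberal bias toward answering "AI".
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical participant who labels all 10 AI and 10 human items "AI".
    d, c = sdt_measures(10, 10, 0, 10)
    print(f"d' = {d:.2f}, criterion = {c:.2f}")
```

In the example, the hypothetical participant scores perfectly on Correctness-AI and zero on Correctness-Human, yet d' is 0.00 and the criterion is negative: a strong response bias with no ability to tell the sources apart.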
Original abstract

For online health communities, community trust is paramount. Yet, advances in Large Language Models (LLMs) generating advice may erode this trust, especially if users cannot identify whether LLMs have been used. We investigate the feasibility of community-based detection of health advice authorship and how self-moderation of LLMs could help enhance advice utilization. In an online experiment, we evaluate people's ability to distinguish AI-generated from human-written advice across two health conditions, considering lived experience with a condition, AI-recognition training, and user attitudes towards transparency and trust around AI use. Our results indicate the need for transparency coupled with trust. We find little evidence of people's ability to discern advice authorship. However, we find a consistent effect of the health condition. Our qualitative findings identify unreliable signals, resulting in flawed heuristic evaluations of the advice. Our findings point to opportunities to improve the self-moderation of LLM-based AI and aid community-based AI moderation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript reports results from an online experiment testing whether participants can distinguish AI-generated from human-written health advice in online health communities (OHCs) across two conditions. It examines moderators including lived experience with the condition, AI-recognition training, and attitudes toward transparency/trust. The central claims are that there is little evidence of authorship discernment ability, a consistent effect of health condition, and that qualitative analysis reveals unreliable signals and flawed heuristics; the authors conclude that transparency mechanisms are needed to support self-moderation of LLM use.

Significance. If the results hold under representative conditions, the work would indicate that community self-moderation of AI-generated health advice is likely to fail because users cannot reliably detect authorship. This has direct implications for trust erosion in OHCs and for HCI design of transparency features. The health-condition effect and qualitative heuristics provide concrete starting points for interventions, though the overall contribution is tempered by the need for stronger methodological documentation.

major comments (2)
  1. Abstract: The abstract reports directional findings but provides no sample size, statistical tests, effect sizes, or exclusion criteria, making it impossible to verify whether the data support the central claim of little discernment ability.
  2. Methods/Experimental Design: The generation process for the AI advice samples (prompts, model, quality controls) and the participant recruitment pool (e.g., MTurk vs. active OHC members) are not described in sufficient detail to evaluate whether they match real-world OHC interactions and current LLM output quality; this directly affects the generalizability of the null discernment result and the health-condition effect.
minor comments (1)
  1. Abstract: The phrasing 'we find little evidence' could be replaced with a more precise statement once the statistical results (e.g., accuracy rates, p-values) are added.
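Stating an accuracy rate is more informative when it carries an interval. A minimal sketch, assuming accuracy is a simple count of correct judgments out of total judgments, computes a Wilson score interval; the function name and example counts are hypothetical, and this is one common choice rather than the interval the paper reports.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct: int, trials: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (correct / trials)."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = correct / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials
                          + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical: 58 correct out of 100 judgments.
    low, high = wilson_interval(58, 100)
    print(f"accuracy 0.58, 95% CI [{low:.3f}, {high:.3f}]")
```

An interval whose lower bound sits at or below 0.5 is consistent with guessing, which is the kind of precise statement the minor comment asks for. The sketch still treats judgments as independent, an assumption that repeated ratings by the same participant violate.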

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for improving clarity and transparency. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: Abstract: The abstract reports directional findings but provides no sample size, statistical tests, effect sizes, or exclusion criteria, making it impossible to verify whether the data support the central claim of little discernment ability.

    Authors: We agree that the abstract would be strengthened by including these quantitative details to allow readers to directly evaluate the evidence. In the revised manuscript, we will update the abstract to report the sample size, the statistical tests performed (along with key results and effect sizes), and the exclusion criteria applied during data analysis. This change will make the support for the limited discernment finding more verifiable without altering the core claims. revision: yes

  2. Referee: Methods/Experimental Design: The generation process for the AI advice samples (prompts, model, quality controls) and the participant recruitment pool (e.g., MTurk vs. active OHC members) are not described in sufficient detail to evaluate whether they match real-world OHC interactions and current LLM output quality; this directly affects the generalizability of the null discernment result and the health-condition effect.

    Authors: We acknowledge that greater methodological detail is needed to support assessment of generalizability. The submitted methods section outlines the LLM used and the online recruitment approach, but we will expand it in revision to include explicit descriptions of prompt construction (drawing from real OHC examples), the specific model version and parameters, quality control steps such as manual verification of outputs, and participant screening criteria (including any checks for OHC familiarity). These additions will better demonstrate alignment with real-world conditions while preserving the original experimental design. revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical experiment

full rationale

The paper reports results from an online experiment in which participants judged the authorship of health advice samples across two conditions. No equations, derivations, fitted parameters, or predictive models appear in the manuscript. Central claims rest on direct statistical comparisons of participant accuracy and qualitative coding of open responses. Any self-citations are peripheral and not invoked to justify uniqueness theorems or to close a derivation loop. The study therefore contains no load-bearing steps that reduce by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Empirical study relying on standard assumptions about participant behavior and advice representativeness rather than new postulates.

axioms (2)
  • domain assumption The AI-generated advice used in the experiment is representative of typical LLM output in health domains.
    Invoked implicitly in the experiment design described in the abstract.
  • domain assumption Participant responses in the online experiment reflect real-world judgment processes in health communities.
    Required for generalizing the 'little evidence of discernment' finding.

pith-pipeline@v0.9.0 · 5468 in / 1141 out tokens · 26917 ms · 2026-05-10T01:32:38.279257+00:00 · methodology


Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages

  1. [1]

    Hazrat Ali, Junaid Qadir, Zubair Shah, Tanvir Alam, and Mowafa Househ. 2023. ChatGPT and Large Language Models (LLMs) in Healthcare: Opportunities and Risks. Authorea Preprints (2023).

  2. [2]

    Versus Arthritis. [n. d.]. The State of Musculoskeletal Health 2024.

  3. [3]

    Alexei A Birkun and Adhish Gautam. 2023. Large language model-based chatbot as a source of advice on first aid in heart attack. Current Problems in Cardiology (2023), 102048.

  4. [4]

    Dawn Branley-Bell, Richard Brown, Lynne Coventry, and Elizabeth Sillence. 2023. Chatbots for embarrassing and stigmatizing conditions: could chatbots encourage users to seek medical advice? Frontiers in Communication 8 (2023), 1275127.

  5. [5]

    Sergi D Bray, Shane D Johnson, and Bennett Kleinberg. 2023. Testing human ability to detect ‘deepfake’ images of human faces. Journal of Cybersecurity 9, 1 (2023), tyad011.

  6. [6]

    Eleanor R Burgess, Madhu C Reddy, Andrew Davenport, Paul Laboi, and Ann Blandford. 2019. "Tricky to get your head around": Information Work of People Managing Chronic Kidney Disease in the UK. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–17.

  7. [7]

    Adrian Bussone, Simone Stumpf, and Stephanie Wilson. 2017. The use of online forums by people living with HIV for help in understanding personal health information. International Journal of Medical Informatics 108 (2017), 64–70.

  8. [8]

    Cambridge English Dictionary. 2025. “transparency”, n. Cambridge University Press & Assessment. https://dictionary.cambridge.org/dictionary/english/transparency

  9. [9]

    Shaan Chopra, Rachael Zehrung, Tamil Arasu Shanmugam, and Eun Kyoung Choe. 2021. Living with uncertainty and stigma: self-experimentation and support-seeking around polycystic ovary syndrome. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–18.

  10. [10]

    Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, and Noah A Smith. 2021. All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text. (2021), 7282–7296.

  11. [11]

    David C DeAndrea. 2014. Advancing warranting theory. Communication Theory 24, 2 (2014), 186–204.

  12. [12]

    Amy L Delaney and Erin D Basinger. 2021. Uncertainty and support-seeking in US-based online diabetes forums. Journal of Applied Communication Research 49, 3 (2021), 305–324.

  13. [13]

    Judith Donath. 2007. Signals, cues and meaning. Signals, Truth and Design (2007).

  14. [14]

    Judith Donath. 2007. Signals in social supernets. Journal of Computer-Mediated Communication 13, 1 (2007), 231–251.

  15. [15]

    Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A Smith, and Yejin Choi. 2021. Is GPT-3 text indistinguishable from human text? SCARECROW: A framework for scrutinizing machine text. arXiv preprint arXiv:2107.01294 (2021).

  16. [16]

    Mohan Dutta-Bergman. 2003. Trusted Online Sources of Health Information: Differences in Demographics, Health Beliefs, and Health-Information Orientation. J Med Internet Res 5, 3 (25 Sep 2003), e21. doi:10.2196/jmir.5.3.e21

  17. [17]

    Florin Eggmann, Roland Weiger, Nicola U Zitzmann, and Markus B Blatz. 2023. Implications of large language models such as ChatGPT for dental medicine. Journal of Esthetic and Restorative Dentistry (2023).

  18. [18]

    Nicole Ellison, Rebecca Heino, and Jennifer Gibbs. 2006. Managing impressions online: Self-presentation processes in the online dating environment. Journal of Computer-Mediated Communication 11, 2 (2006), 415–441.

  19. [19]

    Jonathan St B.T. Evans and Keith E Stanovich. 2013. Dual-Process Theories of Higher Cognition: Advancing the Debate. Perspectives on Psychological Science 8, 3 (2013), 223–241. doi:10.1177/1745691612460685

  20. [20]

    Hanmei Fan, Reeva Lederman, Stephen P Smith, and Shanton Chang. 2014. How trust is formed in online health communities: a process perspective. Communications of the Association for Information Systems 34, 1 (2014), 28.

  21. [21]

    Franz Faul, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods 39, 2 (01 May 2007), 175–191. doi:10.3758/BF03193146

  22. [22]

    Tormod Dag Fikse. 2018. Imagining deceptive Deepfakes: an ethnographic exploration of fake videos. Master’s thesis.

  23. [23]

    Dilrukshi Gamage, Piyush Ghasiya, Vamshi Bonagiri, Mark E Whiting, and Kazutoshi Sasahara. 2022. Are deepfakes concerning? analyzing conversations of deepfakes on reddit and exploring societal implications. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–19.

  24. [24]

    Vahid Ghafouri, Vibhor Agarwal, Yong Zhang, Nishanth Sastry, Jose Such, and Guillermo Suarez-Tangil. 2023. AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 556–565.

  25. [25]

    Gerd Gigerenzer and Wolfgang Gaissmaier. 2011. Heuristic Decision Making. Annual Review of Psychology 62, 1 (2011), 451–482. doi:10.1146/annurev-psych-120709-145346

  26. [26]

    Gerd Gigerenzer and Wolfgang Gaissmaier. 2015. Decision making: Nonrational theories. In International Encyclopedia of the Social & Behavioral Sciences. Elsevier, 911–916.

  27. [27]

    Omri Gillath, Ting Ai, Michael S Branicky, Shawn Keshmiri, Robert B Davison, and Ryan Spaulding. 2021. Attachment and trust in artificial intelligence. Computers in Human Behavior 115 (2021), 106607.

  28. [28]

    Nicole Gillespie, Steven Lockey, Caitlin Curtis, Javad Pool, and Ali Akbari. 2023. Trust in Artificial Intelligence: A global study. doi:10.14264/00d3c94

  29. [29]

    Julia Gomula, Mark Warner, and Ann Blandford. 2024. Women’s use of online health and social media resources to make sense of their polycystic ovary syndrome (PCOS) diagnosis: a qualitative study. BMC Women’s Health 24, 1 (2024), 157.

  30. [30]

    Matthew Groh, Ziv Epstein, Chaz Firestone, and Rosalind Picard. 2022. Deepfake detection by human crowds, machines, and machine-informed crowds. Proceedings of the National Academy of Sciences 119, 1 (2022), e2110013119.

  31. [31]

    Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, and Ben Y Zhao. 2024. Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? arXiv preprint arXiv:2402.03214 (2024).

  32. [32]

    Jeffrey T Hancock, Mor Naaman, and Karen Levy. 2020. AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated Communication 25, 1 (2020), 89–100.

  33. [33]

    Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, and Hsin-Min Wang. 2024. Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes. arXiv preprint arXiv:2405.04097 (2024).

  34. [34]

    Claudia E Haupt and Mason Marks. 2023. AI-generated medical advice: GPT and beyond. JAMA 329, 16 (2023), 1349–1350.

  35. [35]

    High-Level Expert Group on Artificial Intelligence set up by the European Union. 2019. Ethics Guidelines for Trustworthy AI. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai

  36. [36]

    Alex Howard, William Hope, and Alessandro Gerada. 2023. ChatGPT and antimicrobial advice: the end of the consulting infection doctor? The Lancet Infectious Diseases 23, 4 (2023), 405–406.

  37. [37]

    Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, and Douglas Eck. 2020. Automatic Detection of Generated Text is Easiest when Humans are Fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 1808–1822. doi:10.18653/v1/2020.acl-main.164

  38. [38]

    Maurice Jakesch, Megan French, Xiao Ma, Jeffrey T Hancock, and Mor Naaman. 2019. AI-mediated communication: How the perception that profile text was written by AI affects trustworthiness. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13.

  39. [39]

    Maurice Jakesch, Jeffrey T Hancock, and Mor Naaman. 2023. Human heuristics for AI-generated language are flawed. Proceedings of the National Academy of Sciences 120, 11 (2023), e2208839120.

  40. [40]

    Allen C. Johnston, James L. Worrell, Paul M. Di Gangi, and Molly Wasko. 2013. Online health communities: An assessment of the influence of participation on patient empowerment outcomes. Information Technology and People 26, 2 (May 2013), 213–235. doi:10.1108/ITP-02-2013-0040

  41. [41]

    Daniel Kahneman. 2003. A Perspective on Judgment and Choice. American Psychologist 58, 9 (2003), 697–720. doi:10.1037/0003-066X.58.9.697

  42. [42]

    René F. Kizilcec. 2016. How Much Information? Effects of Transparency on Trust in an Algorithmic Interface. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 2390–2395. doi:10.1145/2858036.2858402

  43. [43]

    D Knebel, S Priglinger, N Scherer, J Siedlecki, and B Schworm. 2023. Assessment of ChatGPT in the preclinical management of ophthalmological emergencies-an analysis of ten fictional case vignettes. (2023)

  44. [44]

    Nils Köbis and Luca D Mossink. 2021. Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior 114 (2021), 106553.

  45. [45]

    Nils C Köbis, Barbora Doležalová, and Ivan Soraperra. 2021. Fooled twice: People cannot detect deepfakes but think they can. iScience 24, 11 (2021).

  46. [46]

    Sarah Kreps, R Miles McCain, and Miles Brundage. 2022. All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation. Journal of Experimental Political Science 9, 1 (2022), 104–117.

  47. [47]

    Briege M Lagan, Marlene Sinclair, and W George Kernohan. 2011. What is the impact of the internet on decision-making in pregnancy? A global study. Birth 38, 4 (2011), 336–345.

  48. [48]

    Brian Y. Lim, Anind K. Dey, and Daniel Avrahami. 2009. Why and Why Not Explanations Improve the Intelligibility of Context-Aware Intelligent Systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI ’09). Association for Computing Machinery, New York, NY, USA, 2119–2128. doi:10.1145/1518701.1519023

  49. [49]

    Sue Lim and Ralf Schmälzle. 2024. The effect of source disclosure on evaluation of AI-generated messages. Computers in Human Behavior: Artificial Humans 2, 1 (2024), 100058.

  50. [50]

    Tao Jennifer Ma and David Atkin. 2017. User generated content and credibility evaluation of online health information: a meta analytic study. Telematics and Informatics 34, 5 (2017), 472–486.

  51. [51]

    Kimberly T Mai, Sergi Bray, Toby Davies, and Lewis D Griffin. 2023. Warning: humans cannot reliably detect speech deepfakes. PLOS ONE 18, 8 (2023), e0285333.

  52. [52]

    Lisa Mekioussa Malki, Dilisha Patel, and Aneesha Singh. 2024. "The Headline Was So Wild That I Had To Check": An Exploration of Women’s Encounters With Health Misinformation on Social Media. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–26.

  53. [53]

    Lena Mamykina, Drashko Nakikj, and Noemie Elhadad. 2015. Collective sensemaking in online health forums. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 3217–3226.

  54. [54]

    David M Markowitz, Jeffrey T Hancock, and Jeremy N Bailenson. 2023. Linguistic Markers of Inherently False AI Communication and Intentionally False Human Communication: Evidence From Hotel Reviews. Journal of Language and Social Psychology (2023), 0261927X231200201.

  55. [55]

    Roger C Mayer, James H Davis, and F David Schoorman. 1995. An Integrative Model of Organizational Trust. Issue 3, 709–734. https://www.jstor.org/stable/258792?seq=1&cid=pdf-

  56. [56]

    Bonnie M. Muir and Neville Moray. 1996. Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics 39, 3 (1996), 429–460. doi:10.1080/00140139608964474

  57. [57]

    Nicolas M Müller, Karla Pizzi, and Jennifer Williams. 2022. Human perception of audio deepfakes. In Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia. 85–91.

  58. [58]

    Oded Nov, Nina Singh, and Devin Mann. 2023. Putting ChatGPT’s medical advice to the (Turing) test: survey study. JMIR Medical Education 9 (2023), e46939.

  59. [59]

    Office for National Statistics. 2022. Trust in government, UK: 2022. https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/bulletins/trustingovernmentuk/2022

  60. [60]

    Aisling Ann O’Kane, Sun Young Park, Helena Mentis, Ann Blandford, and Yunan Chen. 2016. Turning to peers: integrating understanding of the self, the condition, and others’ experiences in making sense of complex chronic conditions. Computer Supported Cooperative Work (CSCW) 25 (2016), 477–501.

  61. [61]

    Reza Arkan Partadiredja, Carlos Entrena Serrano, and Davor Ljubenkov. 2020. AI or Human: The Socio-ethical Implications of AI-Generated Media Content. 1–6. doi:10.1109/CMI51275.2020.9322673

  62. [62]

    Dilisha Patel, Ann Blandford, Mark Warner, Jill Shawe, and Judith Stephenson. 2019. "I feel like only half a man": Online Forums as a Resource for Finding a "New Normal" for Men Experiencing Fertility Issues. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–20.

  63. [63]

    Dilisha Patel, Sachin Pendse, Munmun De Choudhury, Sarah Dsane, Kaylee Payne Kruzan, Neha Kumar, Aneesha Singh, and Mark Warner. 2022. Information-Seeking, Finding Identity: Exploring the Role of Online Health Information in Illness Experience. In Companion Publication of the 2022 Conference on Computer Supported Cooperative Work and Social Computing. 263–266.

  64. [64]

    Sharoda A Paul and Madhu C Reddy. 2010. Understanding together: sensemaking in collaborative information seeking. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work. 321–330.

  65. [65]

    Margie Ruffin, Gang Wang, and Kirill Levchenko. 2023. Explaining Why Fake Photos are Fake: Does It Work? Proceedings of the ACM on Human-Computer Interaction 7, GROUP (2023), 1–22.

  66. [66]

    Douglas J. Rupert, Rebecca R. Moultrie, Jennifer Gard Read, Jacqueline B. Amoozegar, Alexandra S. Bornkessel, Amie C. O’Donoghue, and Helen W. Sullivan. 2014. Perceived healthcare provider reactions to patient and caregiver use of online health communities. Patient Education and Counseling 96, 3 (2014), 320–326. doi:10.1016/j.pec.2014.05.015

  67. [67]

    Tobias Schimanski, Jingwei Ni, Mathias Kraus, Elliott Ash, and Markus Leippold. 2024. Towards faithful and robust llm specialists for evidence-based question-answering. arXiv preprint arXiv:2402.08277 (2024).

  68. [68]

    Anuschka Schmitt, Thiemo Wambsganss, Matthias Söllner, and Andreas Janson. 2021. Towards a trust reliance paradox? exploring the gap between perceived trust in and reliance on algorithmic advice. In International Conference on Information Systems (ICIS), Vol. 1. 1–17.

  69. [69]

    Xinyue Shen, Zeyuan Chen, Michael Backes, and Yang Zhang. 2023. In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT. (April 2023). http://arxiv.org/abs/2304.08979

  70. [70]

    Donghee Shin, Joon Soo Lim, Norita Ahmad, and Mohammed Ibahrine. 2022. Understanding user sensemaking in fairness and transparency in algorithms: algorithmic sensemaking in over-the-top platform. AI & Society (2022), 1–14.

  71. [71]

    Elizabeth Sillence, Pam Briggs, Peter Richard Harris, and Lesley Fishwick. 2007. How do patients evaluate and make use of online health information? Social Science & Medicine 64, 9 (2007), 1853–1862.

  72. [72]

    Lu Sun, Stone Tao, Junjie Hu, and Steven P Dow. 2024. MetaWriter: Exploring the Potential and Perils of AI Writing Support in Scientific Peer Review. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–32.

  73. [73]

    Rashid Tahir, Brishna Batool, Hira Jamshed, Mahnoor Jameel, Mubashir Anwar, Faizan Ahmed, Muhammad Adeel Zaffar, and Muhammad Fareed Zaffar. 2021. Seeing is believing: Exploring perceptual differences in deepfake videos. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–16.

  74. [74]

    Liyan Tang, Zhaoyi Sun, Betina Idnay, Jordan G Nestor, Ali Soroush, Pierre A Elias, Ziyang Xu, Ying Ding, Greg Durrett, Justin F Rousseau, et al. 2023. Evaluating large language models on medical evidence summarization. npj Digital Medicine 6, 1 (2023), 158.

  76. [76]

    Catalina L Toma and Cassandra L Carlson. 2015. How do Facebook users believe they come across in their profiles?: A meta-perception approach to investigating Facebook self-presentation. Communication Research Reports 32, 1 (2015), 93–101.

  77. [77]

    Adaku Uchendu, Jooyoung Lee, Hua Shen, and Thai Le. 2022. Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts? (2022)

  78. [78]

    Liesbet Van Bulck and Philip Moons. 2023. What if your patient switches from Dr. Google to Dr. ChatGPT? A vignette-based survey of the trustworthiness, value, and danger of ChatGPT-generated responses to health questions. European Journal of Cardiovascular Nursing (2023), zvad038.

  79. [79]

    Oleksandra Vereschak, Gilles Bailly, and Baptiste Caramiaux. 2021. How to Evaluate Trust in AI-Assisted Decision Making? A Survey of Empirical Methodologies. Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 327 (Oct. 2021), 39 pages. doi:10.1145/3476068

  80. [80]

    Joseph B Walther. 1996. Computer-mediated communication: Impersonal, interpersonal, and hyperpersonal interaction. Communication Research 23, 1 (1996), 3–43.

Showing first 80 references.