Is it Cake or is it AI? A Systematic Review of Human Uncertainty in Distinguishing Generative Artificial Intelligence Content

Mark Louie F. Ramos

arxiv: 2604.03437 · v1 · submitted 2026-04-03 · 📊 stat.AP · cs.CY


Pith reviewed 2026-05-13 17:42 UTC · model grok-4.3


keywords: human detection, generative AI, AI-generated content, systematic review, text, images, voice, chance performance


The pith

Humans detect generative AI content at roughly chance levels across text, images, and voice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This systematic review pulled together results from 30 studies that tested whether people can tell AI-generated material apart from human-created material. Detection accuracy varied widely across the studies but generally clustered around chance, which is 50 percent in a balanced two-choice task, meaning typical performance was little better than random guessing. A reader would care because this pattern questions the common assumption that we can reliably verify the origin of the digital content we see or hear every day. The finding suggests that strategies for judging trustworthiness may have to move beyond trying to spot fakes by eye or ear.

Core claim

The review of 30 empirical studies shows that human detection accuracy for generative AI content varies but generally clusters around chance performance, indicating that people are generally unreliable detectors of such content across text, image, and voice modalities.

What carries the argument

Aggregation of measured detection accuracy rates from controlled studies covering text, image, and voice modalities.

If this is right

Trust in digital content would need to rest on signals other than human-detectable authenticity.
Media evaluation practices may shift away from individual verification of origin.
Misinformation countermeasures that rely on users spotting fakes become less effective.
Platform and policy approaches to synthetic media must account for widespread inability to detect it.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training or education aimed at improving detection may face inherent limits if performance stays near chance.
Legal standards that assume people can identify synthetic media may require revision.
Hybrid detection systems combining human judgment with automated tools gain practical importance.

Load-bearing premise

The 30 included studies form an unbiased sample of human detection performance without major publication bias or inconsistent measurement methods across modalities.

What would settle it

A large new study that finds humans achieve consistent accuracy well above 50 percent when distinguishing AI-generated content from human content in the same modalities would challenge the central claim.
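One way to make "well above 50 percent" concrete is an exact binomial test against the chance rate, paired with a confidence interval. The sketch below is illustrative only: the counts are hypothetical and not taken from any of the 30 studies, the helper names `binomial_two_sided_p` and `wilson_interval` are invented for this example, and it assumes a balanced two-choice task in which chance is 0.5.

```python
from math import comb, sqrt


def binomial_two_sided_p(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a chance rate.

    Sums the probability of every outcome that is no more likely
    than the observed count under the chance hypothesis.
    """
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[correct]
    # Small tolerance so ties with the observed probability are included.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def wilson_interval(correct: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an accuracy (z = 1.96 gives about 95%)."""
    p_hat = correct / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


# Hypothetical study: 260 correct judgments out of 500 (52% accuracy).
p_value = binomial_two_sided_p(260, 500)
low, high = wilson_interval(260, 500)
print(f"p = {p_value:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

A result would count against the central claim only if the interval sits clearly above 0.5 and the pattern replicates across modalities; a single significant p-value from one task would not be enough.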

Original abstract

This systematic review synthesized empirical evidence on human ability to distinguish generative artificial intelligence content from human produced content across text, image, and voice modalities. A structured search of Scopus identified 22,541 records from 2025 to 2026, of which 1200 were screened and 30 studies were included. Across these studies, human detection accuracy varied widely but generally clustered around chance performance. Overall, the literature shows that humans are generally unreliable detectors of gen AI content, raising broader questions about whether the ability to tell should matter for how we evaluate or trust content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The review pulls together evidence that humans detect generative AI content at roughly chance levels, but the synthesis is only as strong as the unexamined selection of those 30 studies.


The central point is straightforward: this systematic review of 30 studies from 2025-2026 concludes that people are generally unreliable at spotting AI-generated text, images, or voice, with accuracies clustering around chance. That matches scattered findings I've seen elsewhere, so the aggregation itself is a reasonable service to the field. It gives a single place to point when discussing media literacy or platform rules that assume detection is feasible. The search strategy via Scopus is described at a high level and the time window is recent, which keeps the scope manageable. Credit for that.

The soft spots sit in the review mechanics. The abstract gives no explicit inclusion criteria, no quality scoring, and no mention of how the author handled differences in task type or modality. If some studies used forced-choice while others used ratings, or if base rates varied, simply saying results cluster around chance risks smoothing over real variation. Publication bias is another live issue here: studies showing poor human performance are probably easier to publish than null or strong-detection ones. Without a risk-of-bias tool or heterogeneity numbers, the uniformity claim rests on an assumption that the included papers are representative.

I would bring this to a reading group to talk through the detection literature, but only after checking whether the full methods section adds those safeguards. It is the sort of paper that could help policy readers or educators who need a quick map of the evidence, provided the methods are tightened. It deserves peer review so the author can address the transparency gaps; the topic is relevant enough that a clearer version would be worth referee time.
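The worry about base rates and task format can be illustrated with signal detection theory, which separates sensitivity (d′) from response bias (criterion c). The sketch below is an assumption-laden illustration, not the review's method: the counts are hypothetical, the function name `sdt_measures` is invented here, "AI-generated" is treated as the signal class, and a log-linear correction (adding 0.5 to each cell) is applied to avoid infinite z-scores.

```python
from statistics import NormalDist


def sdt_measures(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d_prime, criterion, accuracy) for a yes/no detection task.

    hits / misses count the AI items labeled "AI" / "human";
    false_alarms / correct_rejections count the human items
    labeled "AI" / "human".
    """
    # Log-linear correction keeps the rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    total = hits + misses + false_alarms + correct_rejections
    accuracy = (hits + correct_rejections) / total
    return d_prime, criterion, accuracy


# Hypothetical unbalanced test set: 20 AI items, 80 human items.
# The rater says "AI" about 10% of the time regardless of the item.
d_prime, criterion, accuracy = sdt_measures(
    hits=2, misses=18, false_alarms=8, correct_rejections=72
)
print(f"accuracy = {accuracy:.2f}, d' = {d_prime:.2f}, c = {criterion:.2f}")
```

In this made-up case raw accuracy looks respectable while sensitivity is close to zero, because the rater mostly answers "human" and most items are human. That is why raw accuracies from studies with different base rates are hard to place on a single "distance from chance" scale.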

Referee Report

3 major / 2 minor

Summary. This systematic review synthesizes empirical evidence on human ability to distinguish generative AI content from human-produced content across text, image, and voice modalities. A Scopus search from 2025-2026 identified 22,541 records, with 1,200 screened and 30 studies included. The synthesis finds that human detection accuracy varies widely but generally clusters around chance levels, leading to the conclusion that humans are generally unreliable detectors of gen AI content and raising questions about the role of detection ability in content trust and evaluation.

Significance. If the synthesis is robust, the review consolidates cross-modal evidence on a timely issue in AI ethics and media studies, highlighting limitations of human oversight for generative content. It provides a broad overview that could inform policy, education, and development of automated detection tools, while explicitly noting the need to question reliance on human judgment for authenticity.

major comments (3)

[Methods] Methods section: Inclusion criteria, screening process, and quality assessment are insufficiently detailed. No risk-of-bias tool is applied to the 30 studies, and no explicit list of excluded studies or reasons is provided, undermining confidence in the representativeness of the sample for the 'around chance' synthesis.
[Results] Results section: The claim that accuracies 'generally clustered around chance performance' relies on qualitative description without meta-analytic pooling, heterogeneity statistics (e.g., I²), or subgroup analysis by modality. This leaves the central finding vulnerable to influence from heterogeneous methods (forced-choice vs. ratings; text vs. image vs. voice) and potential publication bias.
[Discussion] Discussion section: Broader implications for content trust are drawn from the synthesis, but without addressing Scopus-only search limitations, gray literature exclusion, or cross-study measurement inconsistencies, the generalizability of the 'unreliable detectors' conclusion requires stronger justification.
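For readers unfamiliar with the heterogeneity statistics requested in the Results comment, the following is a minimal sketch of Cochran's Q and I² for a set of study accuracies. Everything in it is an assumption for illustration: the study counts are hypothetical, the function name `heterogeneity` is invented, and raw proportions with inverse-variance weights are used for simplicity, whereas a real meta-analysis would more likely use a transformed scale (for example logit) and a random-effects model.

```python
def heterogeneity(studies: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Return (pooled_accuracy, cochran_q, i_squared) for (correct, trials) pairs.

    Uses fixed-effect inverse-variance weights on raw proportions.
    Proportions of exactly 0 or 1 would give zero variance and are
    not handled in this sketch.
    """
    props = [correct / trials for correct, trials in studies]
    variances = [p * (1 - p) / trials for p, (_, trials) in zip(props, studies)]
    weights = [1 / v for v in variances]
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    df = len(studies) - 1
    # I^2: share of total variation attributable to between-study differences.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical accuracies: 52%, 60%, 45%, and 55% from four studies.
hypothetical_studies = [(52, 100), (240, 400), (90, 200), (330, 600)]
pooled, q, i_squared = heterogeneity(hypothetical_studies)
print(f"pooled = {pooled:.3f}, Q = {q:.2f}, I^2 = {i_squared:.0%}")
```

A high I² in data like these would mean that a pooled accuracy near 50 percent hides real between-study differences, which is the concern behind asking for subgroup summaries by modality and task type.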

minor comments (2)

[Abstract] Abstract: The date range '2025 to 2026' appears anomalous given current publication timelines and should be clarified or corrected.
Consider including a PRISMA flow diagram to document the record screening and inclusion process for greater transparency.
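The counts a PRISMA flow diagram would need can be tallied from the three figures the abstract reports. This is a rough sketch: the function name `prisma_stages` and the stage labels are invented for illustration, and the abstract does not say why records dropped out between stages, so no reasons are encoded.

```python
def prisma_stages(identified: int, screened: int, included: int):
    """Return (label, remaining, dropped_since_previous_stage) for each stage."""
    counts = [
        ("records identified", identified),
        ("records screened", screened),
        ("studies included", included),
    ]
    stages = []
    previous = None
    for label, remaining in counts:
        dropped = 0 if previous is None else previous - remaining
        stages.append((label, remaining, dropped))
        previous = remaining
    return stages


# Figures as reported in the abstract.
for label, remaining, dropped in prisma_stages(22_541, 1_200, 30):
    print(f"{label:<20} {remaining:>7,}  (dropped since previous stage: {dropped:,})")
```

On these figures, 21,341 records left the pipeline before screening and 1,170 of the 1,200 screened records were excluded, so the included studies are 2.5 percent of those screened. A flow diagram would need to state the reasons at both steps.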

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our systematic review. We have carefully considered each major comment and agree that several clarifications and expansions will strengthen the manuscript. Below we respond point by point, indicating the revisions we plan to implement.


Referee: [Methods] Methods section: Inclusion criteria, screening process, and quality assessment are insufficiently detailed. No risk-of-bias tool is applied to the 30 studies, and no explicit list of excluded studies or reasons is provided, undermining confidence in the representativeness of the sample for the 'around chance' synthesis.

Authors: We agree that the methods section requires greater transparency. In the revised manuscript we will expand the inclusion/exclusion criteria with explicit operational definitions, provide a detailed account of the screening process (including number of independent reviewers and disagreement resolution), and describe our quality assessment approach. We will add a PRISMA flow diagram that reports reasons for exclusion at each stage. Although standard risk-of-bias tools are not ideally suited to the heterogeneous experimental designs in this literature, we will include a narrative quality appraisal of the 30 studies and note their limitations. These additions will directly address concerns about representativeness.

revision: yes

Referee: [Results] Results section: The claim that accuracies 'generally clustered around chance performance' relies on qualitative description without meta-analytic pooling, heterogeneity statistics (e.g., I²), or subgroup analysis by modality. This leaves the central finding vulnerable to influence from heterogeneous methods (forced-choice vs. ratings; text vs. image vs. voice) and potential publication bias.

Authors: We acknowledge that a quantitative synthesis would be desirable but maintain that the extreme heterogeneity in outcome measures, experimental paradigms, and modalities makes formal meta-analysis inappropriate and potentially misleading. In revision we will add (1) a table of individual study accuracies with modality and task-type annotations, (2) a narrative assessment of heterogeneity, (3) modality-specific subgroup summaries, and (4) explicit discussion of possible publication bias. These changes will support the qualitative claim with greater rigor while avoiding over-interpretation of pooled statistics.

revision: partial

Referee: [Discussion] Discussion section: Broader implications for content trust are drawn from the synthesis, but without addressing Scopus-only search limitations, gray literature exclusion, or cross-study measurement inconsistencies, the generalizability of the 'unreliable detectors' conclusion requires stronger justification.

Authors: We will revise the discussion to explicitly acknowledge these limitations. We will note the restriction to Scopus, discuss the potential impact of excluding gray literature, and elaborate on measurement inconsistencies across studies (e.g., forced-choice vs. continuous ratings). At the same time we will argue that the consistent pattern of near-chance performance across the included studies still supports the core conclusion, while qualifying the generalizability claims accordingly.

revision: yes

Circularity Check

0 steps flagged

No circularity: systematic review synthesizes external studies


The paper is a systematic review that identifies 30 external studies via Scopus search and summarizes their reported human detection accuracies. The central claim (humans cluster around chance performance) is an aggregation of independent empirical results from those studies, not a derivation from the review's own fitted parameters, self-defined quantities, or self-citation chain. No equations, ansatzes, or uniqueness theorems are invoked that reduce to the paper's inputs by construction. This is the expected non-finding for a literature synthesis whose evidence base lies outside the review itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a single-database search captured a representative set of studies and that the included papers used comparable detection tasks; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: The Scopus search from 2025-2026 plus the screening process identified all relevant empirical studies on human detection of generative AI content.
Standard systematic-review assumption invoked to justify the final sample of 30 studies.

pith-pipeline@v0.9.0 · 5389 in / 1085 out tokens · 43311 ms · 2026-05-13T17:42:13.918085+00:00 · methodology



[1] [1]

Artiﬁcial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry

Köbis N, Mossink LD. Artiﬁcial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior. 2021 Jan 1;114:106553. doi:10.1016/j.chb.2020.106553

work page doi:10.1016/j.chb.2020.106553 2021

[2] [2]

Can AI tell good stories? Narrative transportation and persuasion with ChatGPT

Chu H, Liu S. Can AI tell good stories? Narrative transportation and persuasion with ChatGPT. Journal of Communication. 2024 Oct 1;74(5):347–58. doi:10.1093/joc/jqae029

work page doi:10.1093/joc/jqae029 2024

[3] [3]

Artiﬁcial intelligence, deepfakes, and the uncertain future of truth

Villasenor J. Artiﬁcial intelligence, deepfakes, and the uncertain future of truth. Brookings [Internet]. 2019 [cited 2026 Apr 1]. Available from: https://www.brookings.edu/articles/artiﬁcial-intelligence-deepfakes-and-the- uncertain-future-of-truth/

work page 2019

[4] [4]

Opinion | How Do You Know a Human Wrote This? The New York Times [Internet]

Manjoo F . Opinion | How Do You Know a Human Wrote This? The New York Times [Internet]. 2020 [cited 2026 Apr 1]. Available from: https://www.nytimes.com/2020/07/29/opinion/gpt-3-ai-automation.html

work page 2020

[5] [5]

Perceiving emotion in human and AI voices: sensitivity to acoustic cues in Korean speech

Yoon D, Oh G, Kent R. Perceiving emotion in human and AI voices: sensitivity to acoustic cues in Korean speech. Lingua. 2026 Jan 1;330:104083. doi:10.1016/j.lingua.2025.104083

work page doi:10.1016/j.lingua.2025.104083 2026

[6] [6]

AI or human? Exploring the effects of user awareness in conversational dynamics with virtual avatars

Kober SE, Streit S, Wood G. AI or human? Exploring the effects of user awareness in conversational dynamics with virtual avatars. Computers in Human Behavior. 2026 Aug;181:108984. doi:10.1016/j.chb.2026.108984

work page doi:10.1016/j.chb.2026.108984 2026

[7] [7]

Content camouﬂage: How diversiﬁed posting patterns inﬂuence human detection of AI-enabled social bots

Saucier CJ, Wack M, Linvill D, Okoronkwo A, Tatineni G, Sezgin A. Content camouﬂage: How diversiﬁed posting patterns inﬂuence human detection of AI-enabled social bots. Computers in Human Behavior. 2026 Apr 1;177:108881. doi:10.5167/uzh-282286

work page doi:10.5167/uzh-282286 2026

[8] [8]

The invisible author: Citizen sociolinguistic perspectives on identifying human and AI-generated narrative texts

Szabó G, Krizsai F , Deme A. The invisible author: Citizen sociolinguistic perspectives on identifying human and AI-generated narrative texts. Social Sciences & Humanities Open. 2026 Jun;13:102646. doi:10.1016/j.ssaho.2026.102646

work page doi:10.1016/j.ssaho.2026.102646 2026

[9] [9]

Human versus artiﬁcial creativity: A case study in poetry

Holyoak KJ. Human versus artiﬁcial creativity: A case study in poetry. Journal of Creativity. 2026 Apr;36(1):100118. doi:10.1016/j.yjoc.2025.100118

[10] [10]

Can AI write reports like a radiologist? A blinded evaluation of large language model-generated lumbar spine MRI reports

Zanardo M, Albano D, Molinari V, Fabrizio R, Conca M, Asmundo L, et al. Can AI write reports like a radiologist? A blinded evaluation of large language model-generated lumbar spine MRI reports. Eur Radiol Exp. 2026 Feb 23;10(1):16. doi:10.1186/s41747-026-00682-6

[11] [11]

Psychometric properties and detectability of GPT-4o–generated multiple-choice questions compared with human-authored items across imaging specialties

Linde P, Fichter F, Dietlein M, Sudbrock F, Afshar K, Dapper H, et al. Psychometric properties and detectability of GPT-4o–generated multiple-choice questions compared with human-authored items across imaging specialties. npj Digit Med. 2026 Jan 8;9(1):132. doi:10.1038/s41746-025-02313-7

[12] [12]

Artificial Intelligence vs Human Authorship in Spine Surgery Fellowship Personal Statements: Can ChatGPT Outperform Applicants?

Karakash WJ, Avetisian H, Ragheb JM, Wang JC, Hah RJ, Alluri RK. Artificial Intelligence vs Human Authorship in Spine Surgery Fellowship Personal Statements: Can ChatGPT Outperform Applicants? Global Spine J. 2026 Jan;16(1):313–8. doi:10.1177/21925682251344248 PubMed PMID: 40392947; PubMed Central PMCID: PMC12092409

[13] [14]

Death of the Personal Statement: Qualitative Comparison Between Human-Authored and Artificial Intelligence-Generated Medical School Admissions Essays

Vaccaro MJ, Sharma I, Espina-Rey AP, Lyman N, Palacios C, Zhang Y, et al. Death of the Personal Statement: Qualitative Comparison Between Human-Authored and Artificial Intelligence-Generated Medical School Admissions Essays. J Am Coll Surg. 2026 Jan 1;242(1):47–52. doi:10.1097/XCS.0000000000001602 PubMed PMID: 41051105

[14] [15]

Can OMFS experts distinguish AI from human manuscripts? A double-blind evaluation using ChatGPT-4

Jain A. Can OMFS experts distinguish AI from human manuscripts? A double-blind evaluation using ChatGPT-4. J Craniomaxillofac Surg. 2026 Mar;54(3):104468. doi:10.1016/j.jcms.2026.104468 PubMed PMID: 41534249

[15] [16]

Phishing 2.0: Human Ability to Detect AI-Generated Content

Madleňák M, Hubočan S. Phishing 2.0: Human Ability to Detect AI-Generated Content. Transportation Research Procedia. 2026;93:1125–32. doi:10.1016/j.trpro.2025.12.051

[16] [17]

Children’s Susceptibility to Content Generated by Artificial Intelligence

Langer A, Martinez S, Marshall P, Chein J. Children’s Susceptibility to Content Generated by Artificial Intelligence. Technology in Society. 2026 Mar 1;86:103303. doi:10.1016/j.techsoc.2026.103303

[17] [18]

Framing digital inauthenticity: Comparing user detection of AI-generated faces to messaged-based scam methods

Sarno DM, Solorio J, Ballar S, Chadwick S, Harris K, Moss D, et al. Framing digital inauthenticity: Comparing user detection of AI-generated faces to messaged-based scam methods. Acta Psychol (Amst). 2026 Feb;262:105995. doi:10.1016/j.actpsy.2025.105995 PubMed PMID: 41349270

[18] [19]

Domain-general object recognition predicts human ability to tell real from AI-generated faces

Chow JK, McGugin RW, Gauthier I. Domain-general object recognition predicts human ability to tell real from AI-generated faces. Journal of Experimental Psychology: General. 2026;155(3):629–48. doi:10.1037/xge0001881

[19] [20]

Genuine or Fake? Explaining Consumers’ Perception and Detection of AI-Generated Fake Reviews

Fröhnel K, Santelmann B, Zarnekow R. Genuine or Fake? Explaining Consumers’ Perception and Detection of AI-Generated Fake Reviews. In: Proceedings of the 58th Hawaii International Conference on System Sciences [Internet]. 2025 [cited 2026 Mar 31]. Available from: https://hdl.handle.net/10125/109350 doi:10.24251/HICSS.2025.505

[20] [21]

People are poorly equipped to detect AI-powered voice clones

Barrington S, Cooper EA, Farid H. People are poorly equipped to detect AI-powered voice clones. Sci Rep. 2025 Mar 31;15(1):11004. doi:10.1038/s41598-025-94170-3

[21] [22]

Voice clones sound realistic but not (yet) hyperrealistic

Lavan N, Irvine M, Rosi V, McGettigan C. Voice clones sound realistic but not (yet) hyperrealistic. PLOS ONE. 2025 Sep 24;20(9):e0332692. doi:10.1371/journal.pone.0332692

[22] [23]

Convincingness of AI-Generated Restaurant Reviews

Tuomi A, Abidin HZ, Tuominen P, Ascenção MP. Convincingness of AI-Generated Restaurant Reviews. Springer Proceedings in Business and Economics. 2025;437–48

[23] [24]

Acceptance and trust in AI-generated exercise plans among recreational athletes and quality evaluation by experienced coaches: a pilot study

Wachholz F, Manno S, Schlachter D, Gamper N, Schnitzer M. Acceptance and trust in AI-generated exercise plans among recreational athletes and quality evaluation by experienced coaches: a pilot study. BMC Res Notes. 2025 Mar 13;18(1):112. doi:10.1186/s13104-025-07172-9

[24] [25]

Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated

Zhu T, Weissburg I, Zhang K, Wang WY. Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated. In: Che W, Nabende J, Shutova E, Pilehvar MT, editors. Findings of the Association for Computational Linguistics: ACL 2025 [Internet]. Vienna, Austria: Association for Computational Linguistics; 2025 [cited 2026 Mar 31]. p. 2...

[25] [26]

Artificial intelligence vs. human expert: Licensed mental health clinicians’ blinded evaluation of AI-generated and expert psychological advice on quality, empathy, and perceived authorship

Franke Föyen L, Zapel E, Lekander M, Hedman-Lagerlöf E, Lindsäter E. Artificial intelligence vs. human expert: Licensed mental health clinicians’ blinded evaluation of AI-generated and expert psychological advice on quality, empathy, and perceived authorship. Internet Interv. 2025 Sep;41:100841. doi:10.1016/j.invent.2025.100841 PubMed PMID: 40525210; PubMe...

[26] [27]

Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses

Fiedler A, Döpke J. Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses. International Review of Economics Education [Internet]. 2025 [cited 2026 Mar 31];49(C). Available from: https://ideas.repec.org//a/eee/ireced/v49y2025ics1477388025000131.html

[27] [28]

Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers

Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, et al. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy. 2025 Apr;41(4):916-924.e2. doi:10.1016/j.arthro.2024.06.045 PubMed PMID: 38992513

[28] [29]

Toward Realistic AI-Generated Student Questions to Support Instructor Training

Cardia F, Pentangelo V, Lambiase S, Gravino C, Palomba F, Marras M. Toward Realistic AI-Generated Student Questions to Support Instructor Training. In: Two Decades of TEL. From Lessons Learnt to Challenges Ahead: 20th European Conference on Technology Enhanced Learning, EC-TEL 2025, Newcastle upon Tyne and Durham, UK, September 15–19, 2025, Proceedings...

[29] [30]

Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements

Goodman MA, Lee AM, Schreck Z, Hollman JH. Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements. J Phys Ther Educ. 2025 Dec 1;39(4):329–38. doi:10.1097/JTE.0000000000000396 PubMed PMID: 39808529

[30] [31]

Man vs. machine: can AI outperform ESL student translations?

Alkhofi A. Man vs. machine: can AI outperform ESL student translations? Front Artif Intell. 2025 Jul 9;8:1624754. doi:10.3389/frai.2025.1624754 PubMed PMID: 40703308; PubMed Central PMCID: PMC12283786

[31] [32]

Interpretation of AI-Generated vs. Human-Made Images

Velásquez-Salamanca D, Martín-Pascual MÁ, Andreu-Sánchez C. Interpretation of AI-Generated vs. Human-Made Images. Journal of Imaging. 2025 Jul;11(7):227. doi:10.3390/jimaging11070227

[32] [33]

What you see is not what you get anymore: a mixed-methods approach on human perception of AI-generated images

Högemann M, Betke J, Thomas O. What you see is not what you get anymore: a mixed-methods approach on human perception of AI-generated images. Front Artif Intell. 2025;8:1707336. doi:10.3389/frai.2025.1707336 PubMed PMID: 41346853; PubMed Central PMCID: PMC12672458

[33] [34]

Generative Art in Your Pocket: User Perception and Acceptance of AI-Generated Abstract Art for Mobile Wallpapers

Wang Z, Jin Y. Generative Art in Your Pocket: User Perception and Acceptance of AI-Generated Abstract Art for Mobile Wallpapers. In: Proceedings of the Twelfth International Symposium of Chinese CHI [Internet]. New York, NY, USA: Association for Computing Machinery; 2025 [cited 2026 Mar 31]. p. 716–21. (CHCHI ’24). Available from: https://dl.acm.org/do...

work page doi:10.1145/3758871.3758940 2025

[34] [35]

People can’t tell the difference between human and AI-generated poetry – new study

Stone J. The Conversation [Internet]. 2024 [cited 2026 Mar 31]. People can’t tell the difference between human and AI-generated poetry – new study. Available from: https://theconversation.com/people-cant-tell-the-difference-between-human-and-ai-generated-poetry-new-study-243750 doi:10.64628/AB.99e9sddjt

[35] [36]

Human heuristics for AI-generated language are flawed

Jakesch M, Hancock JT, Naaman M. Human heuristics for AI-generated language are flawed. Proceedings of the National Academy of Sciences. 2023 Mar 14;120(11):e2208839120. doi:10.1073/pnas.2208839120

[36] [37]

New results in AI research: Humans barely able to recognize AI-generated media [Internet]

Koltermann F. New results in AI research: Humans barely able to recognize AI-generated media [Internet]. 2024 [cited 2026 Mar 31]. Available from: http://cispa.de/en/holz-ai-generated-media

[37] [38]

End User: AI is becoming too realistic

Ellenberg L, Radcliffe S. End User: AI is becoming too realistic. The Ithacan [Internet]. 2025 Nov 19 [cited 2026 Mar 31]. Available from: https://theithacan.org/64577/opinion/columns/ai-is-becoming-too-realistic/

[38] [39]

Warning: Humans cannot reliably detect speech deepfakes

Mai KT, Bray S, Davies T, Griffin LD. Warning: Humans cannot reliably detect speech deepfakes. PLOS ONE. 2023 Aug 2;18(8):e0285333. doi:10.1371/journal.pone.0285333

[39] [40]

Photo forensics from lighting shadows and reflections [Internet]

Farid H. Photo forensics from lighting shadows and reflections [Internet]. 2023 [cited 2026 Mar 31]. Available from: https://contentauthenticity.org/blog/photo-forensics-from-lighting-shadows-and-reflections

[40] [41]

Fooled twice: People cannot detect deepfakes but think they can

Köbis NC, Doležalová B, Soraperra I. Fooled twice: People cannot detect deepfakes but think they can. iScience. 2021 Oct 29;24(11):103364. doi:10.1016/j.isci.2021.103364 PubMed PMID: 34820608; PubMed Central PMCID: PMC8602050

[41] [42]

Using AI responsibly in scientific publishing

Using AI responsibly in scientific publishing. Nat Methods. 2026 Feb;23(2):271–271. doi:10.1038/s41592-026-03020-1

[42] [43]

Troops, Trolls and Troublemakers: A Global Inventory of Organized Social Media Manipulation

Bradshaw S, Howard PN. Troops, Trolls and Troublemakers: A Global Inventory of Organized Social Media Manipulation. 2017

[43] [44]

Architects of Networked Disinformation: Behind the Scenes of Troll Accounts and Fake News Production in the Philippines [Monograph] [Internet]

Ong JC, Cabanes JVA. Architects of Networked Disinformation: Behind the Scenes of Troll Accounts and Fake News Production in the Philippines [Monograph] [Internet]. Leeds; 2018 [cited 2026 Apr 1]. Available from: http://newtontechfordev.com/wp-content/uploads/2018/02/ARCHITECTS-OF-NETWORKED-DISINFORMATION-FULL-REPORT.pdf
