pith. machine review for the scientific record.

arxiv: 2605.14021 · v1 · submitted 2026-05-13 · 💻 cs.CY · cs.AI

Recognition: no theorem link

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 02:09 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords Google AI Overviews · search measurement · generative AI · claim fidelity · source quality · publisher revenue · query activation · information ecosystem

The pith

Google AI Overviews activate on 13.7% of searches, 11% of their claims are unsupported by the pages they cite, and most of those pages carry display ads, so publishers lose ad revenue whenever the overview replaces the click-through.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures Google AI Overviews across 55,393 trending queries over 40 days, tracking how often they appear and how faithfully they summarize their sources. Activation reaches 13.7% overall but jumps to 64.7% for question-form queries while staying low on politically sensitive topics. Cited sources prove more credible than the co-displayed search results, yet nearly 30% are absent from those results, pointing to a separate selection process. Decomposing answers into 98,020 atomic claims reveals that 11% lack support from the cited pages, mostly through omission, and source quality does not predict claim fidelity. Over half of the cited pages carry display advertising, so publishers lose clicks and ad revenue when the AI summary replaces the original link.

Core claim

Issuing 55,393 trending queries across 19 categories shows AIO activation at 13.7% overall and 64.7% for question-form queries, with lower rates on politically sensitive topics. AIO-cited domains are more credible than co-displayed results but nearly 30% do not appear in those results. Of 98,020 decomposed atomic claims, 11.0% are unsupported by the cited pages, with omission as the main failure mode, and fidelity is independent of source quality. Well over half of AIO-cited pages carry display advertising, so publishers lose revenue when AIOs suppress clicks while Google's sponsored ads remain.
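The headline rates in this claim are plain proportions over query records. As a hedged illustration (the record fields and sample queries below are hypothetical, not the paper's actual schema), activation overall and for question-form queries reduces to:

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    query: str
    is_question: bool  # question-form query ("how", "why", "what", ...)
    aio_shown: bool    # whether an AI Overview activated for this query

def activation_rate(records):
    """Fraction of queries for which an AI Overview appeared."""
    if not records:
        return 0.0
    return sum(r.aio_shown for r in records) / len(records)

# Toy sample mirroring the paper's pattern: question-form queries
# activate far more often than keyword queries.
records = [
    QueryRecord("how tall is the eiffel tower", True, True),
    QueryRecord("what causes tides", True, True),
    QueryRecord("eiffel tower", False, False),
    QueryRecord("weather paris", False, False),
    QueryRecord("why is the sky blue", True, False),
]

overall = activation_rate(records)                                    # 0.4
question_only = activation_rate([r for r in records if r.is_question])
```

The per-category and question-form breakdowns in the paper are the same tally restricted to subsets of records.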

What carries the argument

Large-scale longitudinal query measurement combined with atomic claim decomposition to quantify activation rates, source credibility differences, unsupported claims, and advertising presence on cited pages.
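The fidelity side of this machinery is a label tally over decomposed claims. A minimal sketch, assuming an illustrative three-label taxonomy (the review only states that omission dominates; the paper's exact label set is not reproduced here):

```python
from collections import Counter

# Illustrative verification labels for 100 atomic claims:
# "supported" (cited pages back the claim), "omission" (cited pages
# are silent), "contradiction" (cited pages disagree).
labels = ["supported"] * 89 + ["omission"] * 8 + ["contradiction"] * 3

counts = Counter(labels)
unsupported = counts["omission"] + counts["contradiction"]
unsupported_rate = unsupported / len(labels)        # 11 / 100 = 0.11

# "Omission as the main failure mode" means omissions make up most
# of the unsupported share.
omission_share = counts["omission"] / unsupported   # 8 / 11
```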

Load-bearing premise

That the 55,393 trending queries represent typical user searches, and that responses can be decomposed into atomic claims reliably, without systematic bias in the measurement of unsupported content.

What would settle it

Re-running the measurement with an independent query sample, or having human raters verify the support status of a random subset of the 98,020 claims, would show whether the reported activation and unsupported-claim rates hold.
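A back-of-the-envelope way to size that human check: a normal-approximation (Wald) interval around the observed unsupported rate shows how tightly a rated subset pins down the population value. The subset size and rate below are illustrative, not from the paper:

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a proportion
    estimated from n independently rated claims."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# If raters confirmed ~11% unsupported in a 1,000-claim random subset
# of the 98,020 claims, the interval is roughly +/- 2 points wide,
# comfortably distinguishing 11% from, say, 5% or 20%.
low, high = wald_ci(0.11, 1000)
```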

Figures

Figures reproduced from arXiv: 2605.14021 by Haofei Xu, Jacob M. Montgomery, Umar Iqbal.

Figure 1. Our approach to characterizing the Google AIO ecosystem: (1) we begin by extracting top search queries …
Figure 2. Daily AIO activation rate (red line, 7-day moving average) …
Figure 3. Distribution of reference counts across 7,583 …
Figure 4. Median reference count by topical category.
Figure 5. Verification label distribution by topic category …
Original abstract

Google AI Overviews (AIOs) are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated. Where search engines have traditionally surfaced ranked sources and left users to evaluate them, AIOs synthesize and deliver a single answer - giving Google unprecedented editorial control over what users read and know. We present a large-scale longitudinal measurement study, issuing 55,393 trending queries across 19 topical categories over a 40-day window (March 13 - April 21, 2026). We report four main findings. First, overall AIO activation is 13.7%, rising to 64.7% for question-form queries, while politically sensitive topics see markedly lower rates. Second, AIO-cited domains are more credible than co-displayed first-page results, yet nearly 30% do not appear in those results at all, indicating a source selection mechanism distinct from Google's ranking algorithm. Third, decomposing responses into 98,020 atomic claims, 11.0% are unsupported by the cited pages - with omission the dominant failure mode - and source quality and claim fidelity are largely independent. Fourth, well over half of AIO-cited pages carry display advertising, meaning publishers lose revenue when AIOs suppress the click-through, even as Google's own sponsored ads continue to appear on the same page. Together, these findings document a rapid transformation of the online information ecosystem whose consequences for epistemic security remain poorly understood.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript reports results from a 40-day longitudinal measurement study issuing 55,393 trending queries across 19 topical categories to Google. It finds AIO activation at 13.7% overall (64.7% for question-form queries, lower for politically sensitive topics), AIO-cited domains more credible than co-displayed first-page results yet ~30% absent from those results, 11.0% of 98,020 atomic claims unsupported by cited pages (omission dominant) with source quality and fidelity largely independent, and >50% of AIO-cited pages carrying display advertising.

Significance. If the measurements are reliable, the study supplies large-scale empirical evidence on activation rates, source selection distinct from ranking, claim fidelity, and publisher revenue displacement in generative search summaries. These observations bear directly on epistemic security, information ecosystem shifts, and advertising economics, providing a useful baseline for future work.

major comments (1)
  1. [Abstract / claim-fidelity section] Abstract (third finding) and associated methods: the decomposition of responses into 98,020 atomic claims and the 11.0% unsupported rate are load-bearing for the fidelity and independence claims, yet the manuscript provides no description of atomic-claim extraction criteria, support-judgment rules, inter-rater reliability, or human-validation subsample. Without these details the reported percentage cannot be verified and may embed systematic bias.
minor comments (1)
  1. [Abstract] The date window March 13–April 21, 2026 appears to lie in the future relative to the arXiv posting; confirm the correct interval or clarify the study timeline.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and constructive feedback on our manuscript. We address the single major comment below and will revise the manuscript to incorporate the requested methodological details.

Point-by-point responses
  1. Referee: [Abstract / claim-fidelity section] Abstract (third finding) and associated methods: the decomposition of responses into 98,020 atomic claims and the 11.0% unsupported rate are load-bearing for the fidelity and independence claims, yet the manuscript provides no description of atomic-claim extraction criteria, support-judgment rules, inter-rater reliability, or human-validation subsample. Without these details the reported percentage cannot be verified and may embed systematic bias.

    Authors: We agree that the current version of the manuscript does not include a sufficiently detailed description of the atomic-claim extraction process, the rules used to judge support or omission, inter-rater reliability statistics, or the human-validation subsample. This information is necessary for reproducibility and to allow readers to evaluate potential bias. In the revised manuscript we will add a dedicated subsection to the Methods section that specifies: (1) the annotation guidelines and decision rules for decomposing AIO responses into atomic claims, (2) the criteria for classifying a claim as supported (direct quotation, close paraphrase, or logical entailment from the cited page), (3) inter-rater agreement results (including Cohen's kappa) obtained from a pilot study on a random subsample of claims, and (4) the size, selection procedure, and validation protocol for the human-reviewed subsample. These additions will be placed immediately before the presentation of the 11.0% unsupported rate so that the fidelity and independence findings can be properly assessed. No changes to the reported percentages themselves are required. revision: yes
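The inter-rater statistic the rebuttal commits to (Cohen's kappa) has a standard closed form: observed agreement corrected for the chance agreement implied by each rater's marginal label frequencies. A self-contained sketch on toy labels (the actual pilot annotations are not available here):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa between two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each
    # rater's marginal frequency for that label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[lbl] * freq_b[lbl] for lbl in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["supported", "supported", "omission", "supported", "omission"]
b = ["supported", "omission",  "omission", "supported", "omission"]
kappa = cohens_kappa(a, b)  # observed 0.8, expected 0.48, kappa ~= 0.615
```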

Circularity Check

0 steps flagged

No circularity: purely observational measurement study

Full rationale

The paper conducts a longitudinal measurement by issuing 55,393 queries, recording AIO activation rates, identifying cited domains, decomposing responses into 98,020 atomic claims, and counting unsupported claims plus advertising presence. No equations, fitted parameters, self-referential derivations, or load-bearing self-citations appear in the reported findings. All percentages (13.7% activation, 11.0% unsupported, etc.) are direct empirical tallies from the collected data, not reduced to prior quantities by construction. The study is self-contained against external benchmarks and contains no derivation chain that collapses to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that trending queries capture representative search behavior and that manual or automated claim decomposition is unbiased; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Trending queries obtained from Google Trends are representative of typical user search behavior across the 19 categories.
    Used to select the 55,393 queries issued in the study.
  • domain assumption: Atomic claims can be reliably extracted from AIO text and checked against cited pages without systematic omission bias.
    Required for the 11.0% unsupported-claim statistic.

pith-pipeline@v0.9.0 · 5580 in / 1381 out tokens · 38775 ms · 2026-05-15T02:09:49.978034+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 3 internal anchors

  1. [1]

AdExchanger. 2026. The AI Search Reckoning Is Dismantling Open Web Traffic. https://www.adexchanger.com/publishers/the-ai-search-reckoning-is-dismantling-open-web-traffic-and-publishers-may-never-recover/

  2. [2]

Saharsh Agarwal and Ananya Sen. 2026. Google AI Overviews and Publisher Traffic: Evidence from a Field Experiment. SSRN Working Paper No. 6513059. https://doi.org/10.2139/ssrn.6513059

  3. [3]

Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, and Ameet Deshpande. 2024. GEO: Generative Engine Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Barcelona, Spain) (KDD '24). Association for Computing Machinery, New York, NY, USA, 5–16. https://doi.org/10.1145...

  4. [4]

Alphabet Inc. 2026. Form 10-K: Annual Report for the Fiscal Year Ended December 31, 2025. U.S. Securities and Exchange Commission Filing. https://www.sec.gov/Archives/edgar/data/0001652044/000165204426000018/goog-20251231.htm Accessed: 2026-05-06

  5. [5]

Benjamin Andow, Samin Yaseer Mahmud, Justin Whitaker, William Enck, Bradley Reaves, Kapil Singh, and Serge Egelman. 2020. Actions Speak Louder than Words: Entity-Sensitive Privacy Policy and Data Flow Analysis with PoliCheck. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 985–1002. https://www.usenix.org/conference/usenixsecur...

  6. [6]

Sinan Aral, Haiwen Li, and Rui Zuo. 2026. The Rise of AI Search: Implications for Information Markets and Human Judgement at Scale. arXiv preprint arXiv:2602.13415 (2026)

  7. [7]

Yuri Baburov. 2025. readability-lxml: Fast HTML to Text Parser (Article Readability Tool). https://github.com/buriy/python-readability. Accessed: 2026-04-25

  8. [8]

    Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, ...

  9. [9]

Competition and Markets Authority. 2026. CMA Proposes Package of Measures to Improve Google Search Services in the UK. https://www.gov.uk/government/news/cma-proposes-package-of-measures-to-improve-google-search-services-in-uk

  10. [10]

    Hao Cui, Rahmadi Trimananda, Athina Markopoulou, and Scott Jordan. 2023. PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 1037–1054. https://www.usenix.org/conference/usenixsecurity23/presentation/cui

  12. [12]

Digital Content Next. 2025. Facts: Google's Push to AI Hurts Publisher Traffic. https://digitalcontentnext.org/blog/2025/08/14/facts-googles-push-to-ai-hurts-publisher-traffic/

  13. [13]

    EasyList Authors. [n. d.]. EasyList. https://easylist.to/. Accessed: 2026-03-24

  14. [14]

Robert Epstein and Ronald E Robertson. 2015. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences 112, 33 (2015), E4512–E4521

  15. [15]

European Commission. 2025. Commission Opens Investigation into Possible Anticompetitive Conduct by Google in the Use of Online Content for AI Purposes. https://ec.europa.eu/commission/presscorner/detail/da/ip_25_2964

  16. [16]

Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 3816–3830

  17. [17]

Chang Ge, Justine Zhang, Haofei Xu, Yanna Krupnikov, Jenna Bednar, and Sabina Tomkins. 2025. What does the public want their local government to hear? A data-driven case study of public comments across the state of Michigan. Journal of Quantitative Description: Digital Media 5 (2025)

  18. [18]

    Google. 2025. AI Overviews and AI Mode in Search. Technical Report. Google. https://search.google/pdf/google-about-AI-overviews-AI-Mode.pdf

  19. [19]

Google. 2025. Puppeteer: Headless Chrome Node.js API. https://github.com/puppeteer/puppeteer. Accessed: 2026-04-25

  20. [20]

Google. 2026. Puppeteer API: page.goto waitUntil Options. https://pptr.dev/api/puppeteer.puppeteerlifecycleevent. Accessed: 2026-04-25

  21. [21]

    Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, and Karl Aberer. 2018. Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning. arXiv:1802.02561 [cs.CL]

  22. [22]

Desheng Hu, Joachim Baumann, Aleksandra Urman, Elsa Lichtenegger, Robin Forsberg, Aniko Hannak, and Christo Wilson. 2025. Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy. arXiv preprint arXiv:2511.12920 (2025)

  23. [23]

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2025. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Trans. Inf. Syst. 43, 2, Article 42 (Jan. 2025), 55 pages. https://doi.org/10.1145/3703155

  24. [24]

Umar Iqbal, Peter Snyder, Shitong Zhu, Benjamin Livshits, Zhiyun Qian, and Zubair Shafiq. 2020. AdGraph: A Graph-Based Approach to Ad and Tracker Blocking. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (S&P). IEEE, 763–776. https://doi.org/10.1109/SP40000.2020.00005

  25. [25]

Klaudia Jaźwińska and Aisvarya Chandrasekar. 2024. How ChatGPT Search (Mis)represents Publisher Content. https://www.cjr.org/tow_center/how-chatgpt-misrepresents-publisher-content.php

  26. [26]

    Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. 2025. Why Language Models Hallucinate. arXiv:2509.04664 [cs.CL] https://arxiv.org/abs/2509.04664

  27. [27]

Mehrzad Khosravi and Hema Yoganarasimhan. 2026. Impact of AI Search Summaries on Website Traffic: Evidence from Google AI Overviews and Wikipedia. arXiv:2602.18455 [cs.CY] https://arxiv.org/abs/2602.18455

  28. [28]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles. 611–626

  30. [30]

Hause Lin, Jana Lasser, Stephan Lewandowsky, Rocky Cole, Andrew Gully, David G Rand, and Gordon Pennycook. 2023. High level of correspondence across different news domain quality rating sets. PNAS Nexus 2, 9 (2023), pgad286

  31. [31]

Nelson Liu, Tianyi Zhang, and Percy Liang. 2023. Evaluating Verifiability in Generative Search Engines. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 7001–7025. https://doi.org/10.18653/v1/2023.findings-emnlp.467

  32. [32]

Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D Manning, and Daniel E Ho. 2025. Hallucination-free? Assessing the reliability of leading AI legal research tools. Journal of Empirical Legal Studies 22, 2 (2025), 216–242

  33. [33]

Dasha Metropolitansky and Jonathan Larson. 2025. Towards effective extraction and evaluation of factual claims. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 6996–7045

  34. [34]

    Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2023. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 12076–12100. https://doi.org/10.18653/v1/2023.emnlp-main.741

  35. [35]

    Harnessing the power of large language models for empathetic response generation: Empirical investigations and improvements

  36. [36]

NPR. 2025. Online News Publishers Face 'Extinction-Level Event' from Google's AI-Powered Search. https://www.npr.org/2025/07/31/nx-s1-5484118/google-ai-overview-online-publishers

  37. [37]

Department of Justice. 2025. Department of Justice Prevails in Landmark Antitrust Case Against Google. https://www.justice.gov/opa/pr/department-justice-prevails-landmark-antitrust-case-against-google

  38. [38]

Pew Research Center. 2025. Google Users Are Less Likely to Click Links When AI Summaries Appear. https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/

  39. [39]

Sundar Pichai. 2025. Q2 earnings call: CEO's remarks. https://blog.google/company-news/inside-google/message-ceo/alphabet-earnings-q2-2025/

  40. [40]

Leonard Richardson. 2025. Beautiful Soup. https://www.crummy.com/software/BeautifulSoup/. Accessed: 2026-04-25

  41. [41]

    Ronald E. Robertson, Shan Jiang, Kenneth Joseph, Lisa Friedland, David Lazer, and Christo Wilson. 2018. Auditing Partisan Audience Bias within Google Search. Proceedings of the ACM on Human-Computer Interaction 2, CSCW, Article 148 (Nov. 2018), 22 pages. https://doi.org/10.1145/3274417

  42. [42]

    The Poppler Developers. 2026. Poppler: A PDF Rendering Library. https://poppler.freedesktop.org/. Accessed: 2026-04-25

  43. [43]

    Yongqi Tong, Dawei Li, Sizhe Wang, Yujia Wang, Fei Teng, and Jingbo Shang. 2024. Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.acl-long.169

  45. [45]

    Rahmadi Trimananda, Hieu Le, Hao Cui, Janice Tran Ho, Anastasia Shuba, and Athina Markopoulou. 2022. OVRseen: Auditing Network Traffic and Privacy Policies in Oculus VR. In 31st USENIX Security Symposium (USENIX Security 22). 3789–3806

  46. [46]

    Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Yixin Mao, and Chien-Sheng Wu. 2024. Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses. arXiv:2410.22349 [cs.IR] https://arxiv.org/abs/2410.22349

  47. [47]

Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151

  48. [48]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837

  49. [49]

    Kevin Wu, Eric Wu, Kevin Wei, Angela Zhang, Allison Casasola, Teresa Nguyen, Sith Riantawan, Patricia Shi, Daniel Ho, and James Zou. 2025. An automated framework for assessing how well LLMs cite relevant medical references. Nature Communications 16, 1 (Apr 2025). https://doi.org/10.1038/s41467-025-58551-6

  51. [51]

Yuhao Wu, Evin Jaff, Ke Yang, Ning Zhang, and Umar Iqbal. 2025. An In-Depth Investigation of Data Collection in LLM App Ecosystems. In Proceedings of the 2025 ACM Internet Measurement Conference (USA) (IMC '25). Association for Computing Machinery, New York, NY, USA, 150–170. https://doi.org/10.1145/3730567.3732912

  52. [52]

xAI. 2025. Grok 4.1 Fast and Agent Tools API. https://x.ai/news/grok-4-1-fast. Accessed: 2026-04-25

  53. [53]

    Yiwei Xu, Saloni Dash, Sungha Kang, Wang Liao, and Emma S. Spiro. 2025. AI summaries in online search influence users' attitudes. arXiv:2511.22809 [cs.HC] https://arxiv.org/abs/2511.22809

  55. [55]

Yumo Xu, Peng Qi, Jifan Chen, Kunlun Liu, Rujun Han, Lan Liu, Bonan Min, Vittorio Castelli, Arshit Gupta, and Zhiguo Wang. 2025. CiteEval: Principle-Driven Citation Evaluation for Source Attribution. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce Nabende, Ekaterina...

  56. [56]

Christina Yeung, Umar Iqbal, Yekaterina Tsipenyuk O'Neil, Tadayoshi Kohno, and Franziska Roesner. 2023. Online Advertising in Ukraine and Russia During the 2022 Russian Invasion. In Proceedings of the ACM Web Conference 2023 (WWW). ACM, Austin, TX, USA. https://doi.org/10.1145/3543507.3583484

  57. [57]

Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad news: Clickbait and deceptive ads on news and misinformation websites. In Workshop on Technology and Consumer Protection. IEEE Computer Society, 1–11

  Entries 58–77 (fragments of the paper's atomic-claim extraction and verification guidelines, not citations):

    Each extracted claim must be:
    • Verifiable: it can in principle be checked true or false against evidence
    • Specific: it states a concrete fact, event, attribute, relationship, quantity, date, ranking, or action
    • Decontextualized: it is fully understandable on its own, and its meaning in isolation matches its meaning in the AI Overview
    • Entailed: if the AI Overview is true, the claim must also be true

    Extraction rules:
    • Extract only claims that are explicitly supported by the AI Overview text. Do not use outside knowledge.
    • Do not invent or normalize missing details. If the text is vague, keep the claim equally vague or omit it.
    • If a statement is generic, normative, speculative, promotional, advisory, subjective, or otherwise not specifically verifiable, do not extract it.
    • If a sentence contains both generic language and one buried specific fact, extract only the specific fact.
    • If the text says that a person, organization, government body, report, court, source, or expert said, reported, announced, recommended, warned, found, highlighted, or did something, preserve that attribution when it is part of the meaning.
    • Resolve references when the text clearly supports it: replace pronouns or shorthand with the fully specified referent when recoverable from nearby context; expand partial names only when the full name is present in the AI Overview; otherwise leave them unresolved only if the claim is still understandable and faithful.
    • If a statement has multiple plausible interpretations and the AI Overview does not clearly resolve the ambiguity, do not extract a claim from that ambiguous part.
    • Split multi-fact sentences into the simplest discrete factual claims that remain natural and useful for fact-checking.
    • Do not extract duplicate claims or near-duplicates.
    • Do not include citations, source names, bullet labels, headings, or formatting artifacts unless they are themselves part of a factual claim. What to omit: opinions, praise, hype, or value judgments; advice, instructions, or recommendations to the reader; vague trend language without a checkable proposition; rhetorical summaries; section head...

    Verification rules:
    • You MUST output exactly one entry per claim above.
    • matched_references should list ALL reference IDs (R1, R2, etc.) that are relevant to the claim. Use an empty list [] for OMITTED.
    • evidence should quote or closely paraphrase the specific text from references. Use "No relevant content found" for OMITTED.
    • confidence reflects how clearly the references support your judgment (1.0 = unambiguous match/contradiction, 0.5 = borderline).
    • CRITICAL: when a claim contains numbers, dates, or attributes paired with specific entities, verify that each value is assigned to the CORRECT entity. If the claim assigns value A to entity X and value B to entity Y, but the reference assigns value A to entity Y and value B to entity X, that is INCORRECT: the values are swapped. Do not label a claim CLEAR just because the ...
    • Only answer with the specified JSON array, no other text.
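Reassembled from the recovered prompt fragments, one verification entry per claim would plausibly carry the fields named there (matched_references, evidence, confidence) plus a verdict such as CLEAR, OMITTED, or INCORRECT. The field name `label` and all values below are illustrative, not quoted from the paper:

```python
# Hypothetical verification entries in the shape the recovered prompt
# describes; the final output is one JSON array with one object per claim.
supported_entry = {
    "claim": "AIO activation is 13.7% across the sampled queries.",
    "label": "CLEAR",                    # vs. OMITTED / INCORRECT
    "matched_references": ["R1", "R3"],  # all relevant reference IDs
    "evidence": "Overall AIO activation is 13.7%.",
    "confidence": 1.0,                   # 1.0 unambiguous, 0.5 borderline
}

omitted_entry = {
    "claim": "A claim with no support anywhere in the cited pages.",
    "label": "OMITTED",
    "matched_references": [],                 # empty list for OMITTED
    "evidence": "No relevant content found",  # fixed string for OMITTED
    "confidence": 1.0,
}

entries = [supported_entry, omitted_entry]
```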