Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources

Mowafak Allaham; Nicholas Diakopoulos

arxiv: 2605.23684 · v1 · pith:C6HRNCYHnew · submitted 2026-05-22 · 💻 cs.IR · cs.CY


Pith reviewed 2026-05-25 03:07 UTC · model grok-4.3


keywords generative search engines · AI-generated sources · citation audit · synthetic content · information quality · source domains · ChatGPT Copilot Gemini Perplexity


The pith

Generative search engines cite AI-generated sources in about 16 percent of cases across politics, health, and environment queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper audits four generative search engines by feeding them 712 real user queries and examining the sources they cite in responses. It reports that roughly 16 percent of those cited sources show signs of being AI-generated, with the pattern appearing in every engine tested. A sympathetic reader would care because the engines present these citations without clear distinction, which could lead users to treat synthetic text as equivalent to material from official or authoritative sites. The work also notes that the engines draw repeatedly from a small number of domains while surfacing many other domains only once. These patterns point to a practical limit in how well current systems can screen out machine-made web content before citing it.

Core claim

An audit of ChatGPT, Copilot, Gemini, and Perplexity on 712 queries spanning politics, health, and the environment found evidence that AI-generated sources appear among the citations in responses from all four engines, accounting for approximately 16 percent of cited sources overall. Certain web domains recur frequently across engines and topics as origins of these sources, while the engines otherwise draw from a long tail of minimally cited domains.

What carries the argument

A citation audit that classifies the web sources returned in engine responses as AI-generated or not, applied to real-world queries in three high-stakes domains (politics, health, and the environment).
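The tallying step of such an audit can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `(url, is_ai_generated)` input format, and the choice of a Gini coefficient as the concentration measure are all assumptions made here, and the labels are taken as given from whatever classifier the audit uses.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def audit_summary(citations):
    """Summarise (url, is_ai_generated) pairs collected from engine responses.

    The input format is hypothetical; labels come from an external classifier.
    """
    if not citations:
        raise ValueError("no citations to summarise")
    ai_share = sum(1 for _, is_ai in citations if is_ai) / len(citations)
    counts = Counter(domain_of(url) for url, _ in citations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "ai_share": ai_share,                     # fraction of cited sources flagged
        "domain_counts": counts,                  # citations per web domain
        "singleton_domain_share": singletons / len(counts),  # long-tail indicator
    }


def gini(values):
    """Gini coefficient of non-negative counts (0 = even, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

Calling `audit_summary` on the collected citations and then `gini` on the values of `domain_counts` gives the flagged share, the share of domains cited only once, and a single concentration number, which together correspond to the quantities the summary describes.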

If this is right

Users may receive information drawn from synthetic sources and treat it as equivalent to material from authoritative sources.
Generative search engines surface a narrow set of repeatedly cited domains alongside a large number of minimally cited ones.
Public awareness of these citation patterns can support better-informed use of the engines.
The findings point toward the need for improved source filtering and governance measures in these systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the 16 percent figure holds under stricter detection, it would indicate that AI content is already integrated into the citation layer of these tools at a scale that affects everyday queries.
The concentration on a few recurring domains raises the possibility that certain sites function as high-volume producers of synthetic material that engines continue to surface.
A follow-up audit could test whether the rate changes when queries are rephrased or when engines receive explicit instructions to avoid AI sources.
The pattern suggests a feedback loop in which engines cite AI content that then becomes training or reference material for future generations of the same systems.

Load-bearing premise

The process used to label a cited source as AI-generated produces accurate and consistent results.

What would settle it

Independent re-examination of the same set of cited sources with a different detection method or human review that yields a percentage of AI-generated sources differing by more than five points from the reported 16 percent.
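One way to operationalise this check is sketched below, assuming both labelings are available as boolean lists over the same cited sources. The helper names are hypothetical, and Cohen's kappa is added here as a common agreement statistic; the summary itself only specifies the five-point gap.

```python
def ai_rate(labels):
    """Share of sources labelled AI-generated (labels are booleans)."""
    return sum(labels) / len(labels)


def point_gap(original, recheck):
    """Absolute gap between the two AI-generated rates, in percentage points."""
    return abs(ai_rate(original) - ai_rate(recheck)) * 100


def cohens_kappa(a, b):
    """Cohen's kappa for two binary labelings of the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance given each labeler's base rate.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

A gap above five points from `point_gap` would meet the stated criterion. Kappa is worth reporting alongside it because two labelings can produce similar overall rates while disagreeing on which individual sources are flagged.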

Figures

Figures reproduced from arXiv: 2605.23684 by Mowafak Allaham, Nicholas Diakopoulos.

**Figure 2.** Distribution of AI-generated sources by number of source web domains.

**Figure 3.** Concentration and distribution of citations across source web domains. Figure 1(a) illustrates the high proportion

**Figure 4.** Prompt template used to generate text across the categories of “research abstract”, “wiki page”, “reddit post”,

**Figure 5.** Distribution of Pangram prediction categories.

Original abstract

The growing accessibility of Large Language Models via conversational interfaces capable of responding to users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process for users. However, with the proliferation of AI-generated content on the web, it is unclear whether these engines can reliably omit citing synthetic sources (i.e., AI-generated sources). Should these engines be unable to do so, this puts users at risk of harm by treating information from AI-generated sources synthesized in responses of generative search engines as equivalent to information from authoritative or official sources. In a step towards identifying whether AI-generated sources are being cited by these engines, this work presents an audit of four generative search engines (ChatGPT, Copilot, Gemini, Perplexity) using a total of 712 real-world human-generated queries spanning domains of public importance: politics, health, and the environment. Our findings show evidence of AI-generated sources being cited across all four generative search engines (~16% of cited sources) and identify key source web domains these sources belong to that are frequently cited across these engines and topics. In addition, we observed that generative search engines include a somewhat narrow set of repeatedly cited domains while predominantly surfacing a large number of minimally cited domains in responses to users' queries. These findings contribute to the growing body of work on assessing the risks of generative search engines with the objective of increasing public awareness of their limitations and encouraging appropriate measures to improve information quality and governance of these systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The audit measures ~16% AI-generated sources cited across four engines, but the classification step has no reported validation or method details.


The paper's core result is that four generative search engines (ChatGPT, Copilot, Gemini, Perplexity) cite AI-generated sources in roughly 16% of cases when answering 712 real queries on politics, health, and the environment. It also notes that a small number of domains get repeated citations while most appear only once or twice. This is a direct empirical check on citation behavior rather than a theoretical claim, and the cross-engine, multi-domain design gives a practical snapshot of how these tools handle web sources today. The repeated-domain observation is a useful secondary finding for anyone tracking information concentration in AI responses.

The central measurement, however, depends entirely on labeling cited sources as AI-generated. The abstract supplies no information on the detector or criteria used, no validation set, no error rates, and no inter-annotator checks. Given that existing AI-text detectors are known to produce false positives on short or stylized content, the 16% number cannot be evaluated for robustness from what is shown. The domain patterns inherit the same uncertainty.

The study is a measurement exercise with no fitted parameters or circular derivations. It is aimed at researchers and practitioners who study information quality and risks in generative search. A reader focused on audit methods or AI content detection would find the query collection and engine comparison worth examining, but the headline percentage needs clearer documentation before it can be treated as reliable.

I would bring the paper to a reading group to talk through the classification step. I would not cite the 16% figure in my own work until the methods are specified and checked. The question is timely enough that a serious editor should send it to peer review so referees can assess whether the labeling procedure holds up.

Referee Report

1 major / 0 minor

Summary. The paper audits four generative search engines (ChatGPT, Copilot, Gemini, Perplexity) with 712 real-world queries across politics, health, and environment. It reports evidence that ~16% of cited sources are AI-generated, identifies frequently cited source domains, and observes that engines rely on a narrow set of repeatedly cited domains alongside many minimally cited ones.

Significance. If the source classification is shown to be reliable, the audit would provide a concrete empirical measurement of a key risk in generative search systems, directly informing discussions on information quality, citation trustworthiness, and governance. The use of real user queries across high-stakes domains adds practical relevance to the cs.IR literature on search engine behavior.

major comments (1)

[Methods (source classification procedure)] The central 16% figure and all domain-level patterns rest on the classification of sources as AI-generated, yet the manuscript provides no description of the detector (or criteria), threshold, validation set, error rates, or inter-annotator agreement. Without these, the measurement cannot be distinguished from detector bias or noise, directly undermining the headline claim.
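The kind of validation evidence requested here can be sketched as follows, assuming a hand-labelled validation set exists. Nothing in this block comes from the manuscript: the function names are hypothetical, and the Rogan-Gladen adjustment is a standard prevalence correction swapped in for illustration of how detector error rates would change the reading of an observed rate.

```python
def error_rates(truth, predicted):
    """Sensitivity, specificity, and false positive rate on a labelled set.

    truth[i] is True when item i is known to be AI-generated;
    predicted[i] is the detector's verdict for the same item.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("validation set needs both AI and human examples")
    specificity = tn / (tn + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }


def adjusted_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from an observed rate."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("detector is no better than chance; cannot adjust")
    estimate = (apparent + specificity - 1) / denom
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion
```

For example, a detector with 90% sensitivity and 95% specificity (invented numbers) would turn an observed rate of 0.16 into an adjusted estimate of 0.11 / 0.85, roughly 0.13, which shows why the headline figure cannot be interpreted without reported error rates.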

Simulated Author's Rebuttal

1 response · 0 unresolved

Thank you for the constructive feedback. We agree that methodological transparency is essential for the central claim and will revise the manuscript to provide the requested details on source classification.


Referee: [Methods (source classification procedure)] The central 16% figure and all domain-level patterns rest on the classification of sources as AI-generated, yet the manuscript provides no description of the detector (or criteria), threshold, validation set, error rates, or inter-annotator agreement. Without these, the measurement cannot be distinguished from detector bias or noise, directly undermining the headline claim.

Authors: We acknowledge that the current manuscript does not include a description of the source classification procedure. In the revised version we will add a dedicated subsection detailing the detector (or criteria) used to identify AI-generated sources, any thresholds applied, the validation set and its construction, reported error rates, and inter-annotator agreement statistics. This addition will allow readers to evaluate the reliability of the ~16% figure and the domain-level patterns.

Revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical audit with direct source classification


The paper conducts an empirical audit by issuing 712 queries to four generative search engines, collecting cited sources, and classifying a subset as AI-generated (~16%). No equations, derivations, parameters, or fitted models appear in the abstract or described methodology. The central claim rests on direct observation and labeling rather than any reduction to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. Classification accuracy is a separate methodological concern (unvalidated detector details) but does not constitute circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical audit study with no mathematical model. No free parameters, axioms, or invented entities are introduced. The claim depends on the validity of the (undescribed) source classification procedure.





Examining query sentiment bias effects on search results in large language models , author=. The Symposium on Future Directions in Information Access (FDIA) co-located with the 2023 European Summer School on Information Retrieval (ESSIR) , year=

work page 2023

[28] [28]

new media & society , pages=

AI chatbot accountability in the age of algorithmic gatekeeping: Comparing generative search engine political information retrieval across five languages , author=. new media & society , pages=. 2025 , publisher=

work page 2025

[29] [29]

arXiv preprint arXiv:2502.04951 , year=

Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search , author=. arXiv preprint arXiv:2502.04951 , year=

work page arXiv

[30] [30]

Telecommunications Policy , pages=

Sourcing behavior and the role of news media in AI-powered search engines in the digital media ecosystem: Comparing political news retrieval across five languages , author=. Telecommunications Policy , pages=. 2025 , publisher=

work page 2025

[31] [31]

Roy, Jean-Hugues , journal =. I used. 2025 , url =

work page 2025

[32] [32]

ACM computing surveys , volume=

Survey of hallucination in natural language generation , author=. ACM computing surveys , volume=. 2023 , publisher=

work page 2023

[33] [33]

Big Data & Society , volume=

The chat-chamber effect: Trusting the AI hallucination , author=. Big Data & Society , volume=. 2025 , publisher=

work page 2025

[34] [34]

arXiv preprint arXiv:2402.11707 , year=

Search engines post-ChatGPT: How generative artificial intelligence could make search less reliable , author=. arXiv preprint arXiv:2402.11707 , year=

work page arXiv

[35] [35]

arXiv preprint arXiv:2404.07461 , year=

An Audit on the Perspectives and Challenges of Hallucinations in NLP , author=. arXiv preprint arXiv:2404.07461 , year=

work page arXiv

[36] [36]

Engadget , url =

Your Google News feed is likely filled with AI-generated articles , year =. Engadget , url =

work page

[37] [37]

The Decoder , url =

Matthias Bastian , title =. The Decoder , url =. 2023 , month = dec, day =

work page 2023

[38] [38]

arXiv preprint arXiv:2402.04607 , year=

Google Scholar is manipulatable , author=. arXiv preprint arXiv:2402.04607 , year=

work page arXiv

[39] [39]

Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages=

On the dangers of stochastic parrots: Can language models be too big? , author=. Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages=

work page 2021

[40] [40]

AI use in American newspapers is widespread, uneven, and rarely disclosed

AI use in American newspapers is widespread, uneven, and rarely disclosed , author=. arXiv preprint arXiv:2510.18774 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[41] [41]

arXiv preprint arXiv:2509.19163 , year=

Measuring AI" Slop" in Text , author=. arXiv preprint arXiv:2509.19163 , year=

work page arXiv

[42] [42]

arXiv preprint arXiv:2402.14873 , year=

Technical report on the pangram ai-generated text classifier , author=. arXiv preprint arXiv:2402.14873 , year=

work page arXiv

[43] [43]

ACM Transactions on Information Systems , volume=

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions , author=. ACM Transactions on Information Systems , volume=. 2025 , publisher=

work page 2025

[44] [44]

TechPolicy.Press , year =

Varsha Bansal , title =. TechPolicy.Press , year =

work page

[45] [45]

arXiv preprint arXiv:2401.06730 , year=

Relying on the unreliable: The impact of language models' reluctance to express uncertainty , author=. arXiv preprint arXiv:2401.06730 , year=

work page arXiv

[46] [46]

manipulation , author=

Nudges to mitigate confirmation bias during web search on debated topics: Support vs. manipulation , author=. ACM Transactions on the Web , volume=. 2024 , publisher=

work page 2024

[47] [47]

2017 , publisher=

Search and politics: The uses and impacts of search in Britain, France, Germany, Italy, Poland, Spain, and the United States , author=. 2017 , publisher=

work page 2017

[48] [48]

Proceedings of the 2018 conference on human information interaction & retrieval , pages=

Searching as learning: Exploring search behavior and learning outcomes in learning-related tasks , author=. Proceedings of the 2018 conference on human information interaction & retrieval , pages=

work page 2018

[49] [49]

Proceedings of the 27th ACM international conference on information and knowledge management , pages=

Contrasting search as a learning activity with instructor-designed learning , author=. Proceedings of the 27th ACM international conference on information and knowledge management , pages=

work page

[50] [50]

Communications of the ACM , volume=

Exploratory search: from finding to understanding , author=. Communications of the ACM , volume=. 2006 , publisher=

work page 2006

[51] [51]

Annual review of public health , volume=

Public health and online misinformation: challenges and recommendations , author=. Annual review of public health , volume=. 2020 , publisher=

work page 2020

[52] [52]

BMJ open , volume=

Google search histories of patients presenting to an emergency department: an observational study , author=. BMJ open , volume=. 2019 , publisher=

work page 2019

[53] [53]

Journal of marketing research , volume=

What makes online content viral? , author=. Journal of marketing research , volume=. 2012 , publisher=

work page 2012

[54] [54]

Proceedings of the 2022 Conference on Human Information Interaction and Retrieval , pages=

Featured snippets and their influence on users’ credibility judgements , author=. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval , pages=

work page 2022

[55] [55]

Human--Computer Interaction , volume=

SNIF-ACT: A cognitive model of user navigation on the World Wide Web , author=. Human--Computer Interaction , volume=. 2007 , publisher=

work page 2007

[56] [56]

Journal of broadcasting & electronic media , volume=

Uses and grats 2.0: New gratifications for new media , author=. Journal of broadcasting & electronic media , volume=. 2013 , publisher=

work page 2013

[57] [57]

Proceedings of the 22nd international conference on World Wide Web , pages=

Measuring personalization of web search , author=. Proceedings of the 22nd international conference on World Wide Web , pages=

work page

[58] [58]

International Journal of Knowledge Society Research (IJKSR) , volume=

In search we trust: exploring how search engines are shaping society , author=. International Journal of Knowledge Society Research (IJKSR) , volume=. 2014 , publisher=

work page 2014

[59] [59]

Proceedings of the 2019 CHI Conference on human factors in computing systems , pages=

Search as news curator: The role of Google in shaping attention to news information , author=. Proceedings of the 2019 CHI Conference on human factors in computing systems , pages=

work page 2019

[60] [60]

Nature , volume=

Online searches to evaluate misinformation can increase its perceived veracity , author=. Nature , volume=. 2024 , publisher=

work page 2024

[61] [61]

arXiv preprint arXiv:2501.13802 , year=

Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change , author=. arXiv preprint arXiv:2501.13802 , year=

work page arXiv

[62] [62]

Harvard Kennedy School Misinformation Review , year=

LLMs grooming or data voids? LLM-powered chatbot references to Kremlin disinformation reflect information gaps, not manipulation , author=. Harvard Kennedy School Misinformation Review , year=

work page

[63] [63]

NASIG Proceedings , volume=

Data Voids and Echo Chambers: The Transformative Journey of Search and Its Consequences , author=. NASIG Proceedings , volume=

work page

[64] [64]

Tages-Anzeiger , year =

Zihlmann, Oliver and Euchner, Celina , title =. Tages-Anzeiger , year =

work page

[65] [65]

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society , volume=

Informing AI Risk Assessment with News Media: Analyzing National and Political Variation in the Coverage of AI Risks , author=. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society , volume=

work page

[66] [66]

arXiv preprint arXiv:2306.05949 , year=

Evaluating the social impact of generative ai systems in systems and society , author=. arXiv preprint arXiv:2306.05949 , year=

work page arXiv

[67] [67]

New Media & Society , volume=

Impact of misinformation from generative AI on user information processing: How people understand misinformation from generative AI , author=. New Media & Society , volume=. 2025 , publisher=

work page 2025

[68] [68]

2019 , publisher=

Invisible search and online search engines: The ubiquity of search in everyday life , author=. 2019 , publisher=

work page 2019

[69] [69]

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

BERTopic: Neural topic modeling with a class-based TF-IDF procedure , author=. arXiv preprint arXiv:2203.05794 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[70] [70]

2025 , month = feb, howpublished =

Arena Explorer: A Topic Modeling Pipeline for LLM Evals & Analytics , author =. 2025 , month = feb, howpublished =

work page 2025

[71] [71]

The Information Society , volume=

Searching for politics: Using real-world web search behavior and surveys to see political information searching in context , author=. The Information Society , volume=. 2023 , publisher=

work page 2023

[72] [72]

Healthcare , volume=

Online health information seeking behavior: a systematic review , author=. Healthcare , volume=. 2021 , organization=

work page 2021

[73] [73]

arXiv preprint arXiv:2504.11373 , year=

Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions , author=. arXiv preprint arXiv:2504.11373 , year=

work page arXiv

[74] [74]

arXiv preprint arXiv:2403.14709 , year=

ClimateQ&A: Bridging the gap between climate scientists and the general public , author=. arXiv preprint arXiv:2403.14709 , year=

work page arXiv

[75] [75]

2025 , publisher=

ChatGPT as a news recommender system: Measuring source types and diversity across different interfaces , author=. 2025 , publisher=

work page 2025

[76] [76]

Playwright: Fast and reliable end-to-end testing for modern web apps , year =

work page

[77] [77]

2018 , note =

Newspaper3k: Article scraping & curation — Documentation , author =. 2018 , note =

work page 2018

[78] [78]

2025 , month =

Ben Paviour , title =. 2025 , month =

work page 2025

[79] [79]

Colorado College Publication , series =

Gini, Corrado , title =. Colorado College Publication , series =

work page

[80] [80]

arXiv preprint arXiv:2506.05334 , year=

Search Arena: Analyzing Search-Augmented LLMs , author=. arXiv preprint arXiv:2506.05334 , year=

work page arXiv