From Searchable to Non-Searchable: Generative AI and Information Diversity in Online Information Seeking
Pith reviewed 2026-05-10 15:42 UTC · model grok-4.3
The pith
ChatGPT users mostly pose non-searchable questions that span broader topics, but for comparable search-like questions, AI answers show less variety than Google results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analysis of over 200,000 human-ChatGPT interactions shows that nearly 80 percent of user queries are non-searchable and span broader knowledge spaces and topics than searchable queries. For comparable searchable queries, AI responses are less diverse than Google search results in the majority of topics. The diversity of AI responses further predicts subsequent changes in users' inquiry diversity, indicating a feedback loop between AI outputs and human exploration.
What carries the argument
The classification of user queries as searchable versus non-searchable, defined by whether the query could plausibly be answered by a traditional search engine, which separates expanded conversational inquiry from conventional search behavior.
If this is right
- Users shift toward more open-ended inquiry that lies outside traditional search.
- AI systems deliver narrower information sets than search engines for standard topics.
- Changes in the diversity of AI answers can drive corresponding shifts in the diversity of users' follow-up questions.
Where Pith is reading between the lines
- Hybrid designs that route some queries to search engines and others to generative models could preserve breadth while retaining conversational support.
- Sustained low diversity in AI answers might gradually narrow the range of topics users choose to explore over time.
- Platforms could test explicit prompts or post-processing steps that increase response variety to counteract the observed feedback effect.
Load-bearing premise
Queries can be classified reliably and without bias into searchable and non-searchable categories, and the chosen diversity measures accurately capture meaningful differences in the information users actually encounter.
What would settle it
A study that applies a different, independent method to label queries as searchable or non-searchable and finds no reduction in diversity for AI responses compared with search results, or no link between AI response diversity and later user query changes.
Figures
Original abstract
Conversational generative AI systems such as ChatGPT are transforming how people seek and engage with information online. Unlike traditional search engines, these systems support open-ended, conversational inquiry, yet it remains unclear whether they ultimately expand or constrain the diversity of knowledge that users encounter in online search spaces, a primary foundation for knowledge work, learning, and innovation. Using over 200,000 real-world human-ChatGPT interactions, we examine how generative-AI-mediated inquiry reshapes diversity in both user inputs and system outputs through the lens of searchability - whether queries could plausibly be answered by traditional search engines. We find that almost 80% of ChatGPT user queries are non-searchable and span a broader knowledge space and topics than searchable queries, indicating expanded modes of inquiry. However, for comparable searchable queries, AI responses are less diverse than Google search results in the majority of topics. Moreover, the diversity of AI responses predicts subsequent changes in users' inquiry diversity, revealing a feedback loop between AI outputs and human exploration. These findings highlight a tension between expanded inquiry and constrained information exposure, with implications for designing hybrid search and generative-AI systems that better support exploratory knowledge seeking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes over 200,000 real-world ChatGPT interactions to study how generative AI reshapes information diversity in online seeking. Queries are classified as searchable (plausibly answerable by traditional search engines) or non-searchable; the work reports that ~80% are non-searchable and span broader topics, that AI responses to comparable searchable queries are less diverse than Google results in most topics, and that AI response diversity predicts subsequent shifts in user inquiry diversity, indicating a feedback loop.
Significance. If robust, the findings illuminate a key tension in AI-mediated information seeking: expanded modes of inquiry alongside reduced diversity in outputs, with direct implications for hybrid search-AI system design. The large-scale observational dataset from actual user interactions is a notable strength, lending empirical weight to the feedback-loop prediction without relying on self-referential modeling.
major comments (2)
- The binary searchability classifier (used to derive the headline ~80% non-searchable statistic and all subsequent comparisons) lacks reported validation such as inter-annotator agreement, human-rater benchmarks, or sensitivity analysis to threshold or prompt variations. This classification is load-bearing for the central claims about expanded inquiry and diversity differences; without it, the directional results cannot be confidently separated from classification artifacts.
- Diversity metrics (topic breadth, overlap with search results) and the regression linking AI response diversity to later user inquiry changes are described at a high level only, with no explicit parameter settings, statistical controls, error bars, or robustness checks against post-hoc topic selection. These omissions directly affect evaluation of the 'less diverse' and feedback-loop findings.
minor comments (1)
- The abstract introduces 'searchability' without a concise operational definition; moving a brief clause from the Methods to the abstract would improve immediate clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the methodological transparency and robustness of our findings.
Point-by-point responses
-
Referee: The binary searchability classifier (used to derive the headline ~80% non-searchable statistic and all subsequent comparisons) lacks reported validation such as inter-annotator agreement, human-rater benchmarks, or sensitivity analysis to threshold or prompt variations. This classification is load-bearing for the central claims about expanded inquiry and diversity differences; without it, the directional results cannot be confidently separated from classification artifacts.
Authors: We agree that the current manuscript does not provide sufficient validation details for the searchability classifier, which is central to our claims. The classification relied on a structured prompt to GPT-4, but we did not report inter-annotator agreement or sensitivity checks. In the revised version, we will add a new subsection detailing validation: inter-annotator agreement (Cohen's kappa) from two human raters on a random sample of 500 queries, benchmarks against manual classification of a held-out set, and sensitivity analyses across prompt variations and decision thresholds confirming stability of the ~80% non-searchable proportion and downstream diversity comparisons. revision: yes
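The validation plan above names Cohen's kappa for inter-annotator agreement between two human raters. As a minimal illustrative sketch of that computation (not the authors' code; the toy labels are invented):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected match rate from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for 8 queries: searchable (S) vs. non-searchable (N).
a = ["S", "N", "N", "S", "N", "N", "S", "N"]
b = ["S", "N", "N", "N", "N", "N", "S", "S"]
print(round(cohens_kappa(a, b), 3))  # → 0.467
```

Kappa discounts the agreement two raters would reach by chance alone, which matters here because a skewed label distribution (~80% non-searchable) inflates raw percent agreement.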
-
Referee: Diversity metrics (topic breadth, overlap with search results) and the regression linking AI response diversity to later user inquiry changes are described at a high level only, with no explicit parameter settings, statistical controls, error bars, or robustness checks against post-hoc topic selection. These omissions directly affect evaluation of the 'less diverse' and feedback-loop findings.
Authors: We concur that the manuscript presents the diversity metrics and regression at an insufficient level of detail. We will substantially expand the Methods section to specify: exact operationalizations (topic breadth via entropy over LDA-derived topics; overlap via Jaccard similarity to Google results), all parameter settings (e.g., number of topics, LDA hyperparameters), the complete regression model including controls and fixed effects, error bars or confidence intervals on all reported figures, and robustness checks such as alternative topic models (e.g., NMF), varying topic counts, and re-estimation after excluding high-influence topics to address post-hoc selection concerns. revision: yes
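The rebuttal names two operationalizations: topic breadth as entropy over a topic distribution, and cross-system overlap as Jaccard similarity. A minimal sketch of both, on invented toy data (illustrative only; the authors' LDA pipeline and parameters are not reproduced here):

```python
import math

def topic_entropy(topic_probs):
    """Shannon entropy (bits) of a topic distribution; higher = broader coverage."""
    return -sum(p * math.log2(p) for p in topic_probs if p > 0)

def jaccard(set_a, set_b):
    """Jaccard similarity between two sets of surfaced topics or items."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Toy contrast: an AI answer concentrated on few topics vs. search results
# spread evenly across four topics.
ai_topics = [0.7, 0.2, 0.1]
search_topics = [0.25, 0.25, 0.25, 0.25]
print(round(topic_entropy(ai_topics), 3))      # ≈ 1.157 (narrower)
print(round(topic_entropy(search_topics), 3))  # 2.0, the maximum for 4 topics
print(jaccard({"history", "biology"}, {"history", "physics", "art"}))  # 0.25
```

Under these definitions, the paper's "less diverse" finding corresponds to lower entropy and lower cross-topic coverage for AI responses than for search results on matched queries.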
Circularity Check
No significant circularity; empirical classification and measurements are independent of reported outcomes.
full rationale
The paper conducts an observational analysis of over 200,000 real-world ChatGPT interactions. Queries are classified as searchable or non-searchable according to an external criterion (whether they could plausibly be answered by traditional search engines), after which topic breadth, knowledge space coverage, and overlap with Google results are measured separately. The ~80% non-searchable statistic, diversity comparisons, and feedback-loop regression are direct outputs of these measurements rather than reductions to the classification inputs by construction. No equations, self-referential definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the derivation chain. While the validity and potential bias of the classification method are separate methodological concerns, they do not render the results circular per the specified patterns.
Axiom & Free-Parameter Ledger
free parameters (2)
- searchability classifier threshold
- diversity metric parameters
axioms (2)
- domain assumption User queries and AI responses can be meaningfully compared to traditional search engine results on the same topic
- domain assumption Diversity of information exposure can be quantified via topic breadth and cross-system overlap without substantial measurement bias
Reference graph
Works this paper leans on
- [1] Michelle Brachman, Amina El-Ashry, Casey Dugan, and Werner Geyer. 2025. Current and future use of large language models for knowledge work. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–24.
- [2] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z Gajos. 2021. To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (2021), 1–21.
- [3] Aaron Chatterji, Thomas Cunningham, David J Deming, Zoe Hitzig, Christopher Ong, Carl Yan Shan, and Kevin Wadman. 2025. How people use ChatGPT. Technical Report. National Bureau of Economic Research.
- [4] Anil R Doshi and Oliver P Hauser. 2024. Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances 10, 28 (2024), eadn5290.
- [5] Rui Fang et al. 2014. Diversity in collaborative innovation. Information Systems Research 25, 4 (2014), 725–744.
- [6] Qianyue Hao, Fengli Xu, Yong Li, and James Evans. 2026. Artificial intelligence tools expand scientists' impact but contract science's focus. Nature (2026), 1–7.
- [7] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
- [8] Harsh Kumar, Jonathan Vincentius, Ewan Jordan, and Ashton Anderson. 2025. Human creativity in the age of LLMs: Randomized experiments on divergent and convergent thinking. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18.
- [9] Byung Cheol Lee and Jaeyeon Chung. 2024. An empirical investigation of the impact of ChatGPT on creativity. Nature Human Behaviour 8, 10 (2024), 1906–1914.
- [10] Jialu Liu, Yongfeng Zhang, Min Chen, et al. 2023. Conversational search and recommendation: State of the art and future directions. Comput. Surveys (2023).
- [11] Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, and Danielle Bragg. 2025. Understanding the LLM-ification of CHI: Unpacking the impact of LLMs at CHI through a systematic literature review. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–20.
- [12] Eli Pariser. 2011. The Filter Bubble: What the Internet Is Hiding from You. Penguin.
- [13] Nikhil Sharma, Q Vera Liao, and Ziang Xiao. 2024. Generative echo chamber? Effect of LLM-powered search systems on diverse information seeking. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–17.
- [14] Ben Shneiderman. 2022. Human-centered AI: A framework for responsible systems. In Proceedings of the CHI Conference on Human Factors in Computing Systems.
- [15]
- [16] Brian Uzzi, Satyam Mukherjee, Michael Stringer, and Ben Jones. 2013. Atypical combinations and scientific impact. Science 342, 6157 (2013), 468–472.
- [17] Ryan Yen, Nicole Sultanum, and Jian Zhao. 2024. To search or to gen? Exploring the synergy between generative AI and web search in programming. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–8.
- [18] Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng. 2024. WildChat: 1M ChatGPT interaction logs in the wild. arXiv preprint arXiv:2405.01470 (2024).