From Searchable to Non-Searchable: Generative AI and Information Diversity in Online Information Seeking
Pith reviewed 2026-05-10 15:42 UTC · model grok-4.3
The pith
ChatGPT users mostly pose non-searchable questions that span broader topics, but for comparable search-like questions, AI answers show less variety than Google results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analysis of over 200,000 human-ChatGPT interactions shows that nearly 80 percent of user queries are non-searchable and span broader knowledge spaces and topics than searchable queries. For comparable searchable queries, AI responses are less diverse than Google search results in the majority of topics. The diversity of AI responses further predicts subsequent changes in users' inquiry diversity, indicating a feedback loop between AI outputs and human exploration.
What carries the argument
The classification of user queries as searchable versus non-searchable, defined by whether the query could plausibly be answered by a traditional search engine, which separates expanded conversational inquiry from conventional search behavior.
If this is right
- Users shift toward more open-ended inquiry that lies outside traditional search.
- AI systems deliver narrower information sets than search engines for standard topics.
- Changes in the diversity of AI answers can drive corresponding shifts in the diversity of users' follow-up questions.
Where Pith is reading between the lines
- Hybrid designs that route some queries to search engines and others to generative models could preserve breadth while retaining conversational support.
- Sustained low diversity in AI answers might gradually narrow the range of topics users choose to explore over time.
- Platforms could test explicit prompts or post-processing steps that increase response variety to counteract the observed feedback effect.
Load-bearing premise
Queries can be classified reliably and without bias into searchable and non-searchable categories, and the chosen diversity measures accurately capture meaningful differences in the information users actually encounter.
What would settle it
A study that applies a different, independent method to label queries as searchable or non-searchable and finds no reduction in diversity for AI responses compared with search results, or no link between AI response diversity and later user query changes.
Figures
Original abstract
Conversational generative AI systems such as ChatGPT are transforming how people seek and engage with information online. Unlike traditional search engines, these systems support open-ended, conversational inquiry, yet it remains unclear whether they ultimately expand or constrain the diversity of knowledge that users encounter in online search spaces, a primary foundation for knowledge work, learning, and innovation. Using over 200,000 real-world human-ChatGPT interactions, we examine how generative-AI-mediated inquiry reshapes diversity in both user inputs and system outputs through the lens of searchability - whether queries could plausibly be answered by traditional search engines. We find that almost 80% of ChatGPT user queries are non-searchable and span a broader knowledge space and topics than searchable queries, indicating expanded modes of inquiry. However, for comparable searchable queries, AI responses are less diverse than Google search results in the majority of topics. Moreover, the diversity of AI responses predicts subsequent changes in users' inquiry diversity, revealing a feedback loop between AI outputs and human exploration. These findings highlight a tension between expanded inquiry and constrained information exposure, with implications for designing hybrid search and generative-AI systems that better support exploratory knowledge seeking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes over 200,000 real-world ChatGPT interactions to study how generative AI reshapes information diversity in online seeking. Queries are classified as searchable (plausibly answerable by traditional search engines) or non-searchable; the work reports that ~80% are non-searchable and span broader topics, that AI responses to comparable searchable queries are less diverse than Google results in most topics, and that AI response diversity predicts subsequent shifts in user inquiry diversity, indicating a feedback loop.
Significance. If robust, the findings illuminate a key tension in AI-mediated information seeking: expanded modes of inquiry alongside reduced diversity in outputs, with direct implications for hybrid search-AI system design. The large-scale observational dataset from actual user interactions is a notable strength, lending empirical weight to the feedback-loop prediction without relying on self-referential modeling.
major comments (2)
- The binary searchability classifier (used to derive the headline ~80% non-searchable statistic and all subsequent comparisons) lacks reported validation such as inter-annotator agreement, human-rater benchmarks, or sensitivity analysis to threshold or prompt variations. This classification is load-bearing for the central claims about expanded inquiry and diversity differences; without it, the directional results cannot be confidently separated from classification artifacts.
- Diversity metrics (topic breadth, overlap with search results) and the regression linking AI response diversity to later user inquiry changes are described at a high level only, with no explicit parameter settings, statistical controls, error bars, or robustness checks against post-hoc topic selection. These omissions directly affect evaluation of the 'less diverse' and feedback-loop findings.
minor comments (1)
- The abstract introduces 'searchability' without a concise operational definition; moving a brief clause from the Methods to the abstract would improve immediate clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the methodological transparency and robustness of our findings.
Point-by-point responses
-
Referee: The binary searchability classifier (used to derive the headline ~80% non-searchable statistic and all subsequent comparisons) lacks reported validation such as inter-annotator agreement, human-rater benchmarks, or sensitivity analysis to threshold or prompt variations. This classification is load-bearing for the central claims about expanded inquiry and diversity differences; without it, the directional results cannot be confidently separated from classification artifacts.
Authors: We agree that the current manuscript does not provide sufficient validation details for the searchability classifier, which is central to our claims. The classification relied on a structured prompt to GPT-4, but we did not report inter-annotator agreement or sensitivity checks. In the revised version, we will add a new subsection detailing validation: inter-annotator agreement (Cohen's kappa) from two human raters on a random sample of 500 queries, benchmarks against manual classification of a held-out set, and sensitivity analyses across prompt variations and decision thresholds confirming stability of the ~80% non-searchable proportion and downstream diversity comparisons. revision: yes
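The validation plan above names Cohen's kappa for inter-annotator agreement between two human raters. As a minimal illustrative sketch of that computation (not the authors' code; the toy labels are invented):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected match rate from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for 8 queries: searchable (S) vs. non-searchable (N).
a = ["S", "N", "N", "S", "N", "N", "S", "N"]
b = ["S", "N", "N", "N", "N", "N", "S", "S"]
print(round(cohens_kappa(a, b), 3))  # → 0.467
```

Kappa discounts the agreement two raters would reach by chance alone, which matters here because a skewed label distribution (~80% non-searchable) inflates raw percent agreement.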
-
Referee: Diversity metrics (topic breadth, overlap with search results) and the regression linking AI response diversity to later user inquiry changes are described at a high level only, with no explicit parameter settings, statistical controls, error bars, or robustness checks against post-hoc topic selection. These omissions directly affect evaluation of the 'less diverse' and feedback-loop findings.
Authors: We concur that the manuscript presents the diversity metrics and regression at an insufficient level of detail. We will substantially expand the Methods section to specify: exact operationalizations (topic breadth via entropy over LDA-derived topics; overlap via Jaccard similarity to Google results), all parameter settings (e.g., number of topics, LDA hyperparameters), the complete regression model including controls and fixed effects, error bars or confidence intervals on all reported figures, and robustness checks such as alternative topic models (e.g., NMF), varying topic counts, and re-estimation after excluding high-influence topics to address post-hoc selection concerns. revision: yes
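The rebuttal names two operationalizations: topic breadth as entropy over a topic distribution, and cross-system overlap as Jaccard similarity. A minimal sketch of both, on invented toy data (illustrative only; the authors' LDA pipeline and parameters are not reproduced here):

```python
import math

def topic_entropy(topic_probs):
    """Shannon entropy (bits) of a topic distribution; higher = broader coverage."""
    return -sum(p * math.log2(p) for p in topic_probs if p > 0)

def jaccard(set_a, set_b):
    """Jaccard similarity between two sets of surfaced topics or items."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Toy contrast: an AI answer concentrated on few topics vs. search results
# spread evenly across four topics.
ai_topics = [0.7, 0.2, 0.1]
search_topics = [0.25, 0.25, 0.25, 0.25]
print(round(topic_entropy(ai_topics), 3))      # ≈ 1.157 (narrower)
print(round(topic_entropy(search_topics), 3))  # 2.0, the maximum for 4 topics
print(jaccard({"history", "biology"}, {"history", "physics", "art"}))  # 0.25
```

Under these definitions, the paper's "less diverse" finding corresponds to lower entropy and lower cross-topic coverage for AI responses than for search results on matched queries.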
Circularity Check
No significant circularity; empirical classification and measurements are independent of reported outcomes.
full rationale
The paper conducts an observational analysis of over 200,000 real-world ChatGPT interactions. Queries are classified as searchable or non-searchable according to an external criterion (whether they could plausibly be answered by traditional search engines), after which topic breadth, knowledge space coverage, and overlap with Google results are measured separately. The ~80% non-searchable statistic, diversity comparisons, and feedback-loop regression are direct outputs of these measurements rather than reductions to the classification inputs by construction. No equations, self-referential definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the derivation chain. While the validity and potential bias of the classification method are separate methodological concerns, they do not render the results circular per the specified patterns.
Axiom & Free-Parameter Ledger
free parameters (2)
- searchability classifier threshold
- diversity metric parameters
axioms (2)
- domain assumption User queries and AI responses can be meaningfully compared to traditional search engine results on the same topic
- domain assumption Diversity of information exposure can be quantified via topic breadth and cross-system overlap without substantial measurement bias
Reference graph
Works this paper leans on
- [1] Michelle Brachman, Amina El-Ashry, Casey Dugan, and Werner Geyer. 2025. Current and future use of large language models for knowledge work. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–24.
- [2] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z Gajos. 2021. To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (2021), 1–21.
- [3] Aaron Chatterji, Thomas Cunningham, David J Deming, Zoe Hitzig, Christopher Ong, Carl Yan Shan, and Kevin Wadman. 2025. How people use ChatGPT. Technical Report. National Bureau of Economic Research.
- [4] Anil R Doshi and Oliver P Hauser. 2024. Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances 10, 28 (2024), eadn5290.
- [5] Rui Fang et al. 2014. Diversity in collaborative innovation. Information Systems Research 25, 4 (2014), 725–744.
- [6] Qianyue Hao, Fengli Xu, Yong Li, and James Evans. 2026. Artificial intelligence tools expand scientists' impact but contract science's focus. Nature (2026), 1–7.
- [7] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
- [8] Harsh Kumar, Jonathan Vincentius, Ewan Jordan, and Ashton Anderson. 2025. Human creativity in the age of LLMs: Randomized experiments on divergent and convergent thinking. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18.
- [9] Byung Cheol Lee and Jaeyeon Chung. 2024. An empirical investigation of the impact of ChatGPT on creativity. Nature Human Behaviour 8, 10 (2024), 1906–1914.
- [10] Jialu Liu, Yongfeng Zhang, Min Chen, et al. 2023. Conversational search and recommendation: State of the art and future directions. Comput. Surveys (2023).
- [11] Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, and Danielle Bragg. 2025. Understanding the LLM-ification of CHI: Unpacking the impact of LLMs at CHI through a systematic literature review. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–20.
- [12] Eli Pariser. 2011. The Filter Bubble: What the Internet Is Hiding from You. Penguin.
- [13] Nikhil Sharma, Q Vera Liao, and Ziang Xiao. 2024. Generative echo chamber? Effect of LLM-powered search systems on diverse information seeking. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–17.
- [14] Ben Shneiderman. 2022. Human-centered AI: A framework for responsible systems. In Proceedings of the CHI Conference on Human Factors in Computing Systems.
- [15]
- [16] Brian Uzzi, Satyam Mukherjee, Michael Stringer, and Ben Jones. 2013. Atypical combinations and scientific impact. Science 342, 6157 (2013), 468–472.
- [17] Ryan Yen, Nicole Sultanum, and Jian Zhao. 2024. To search or to gen? Exploring the synergy between generative AI and web search in programming. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–8.
- [18] Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng. 2024. WildChat: 1M ChatGPT interaction logs in the wild. arXiv preprint arXiv:2405.01470 (2024).