Conversational AI increases political knowledge as effectively as self-directed internet search

Christopher Summerfield; Divya Siddarth; Hannah Rose Kirk; Henry Davidson; Henry Ogden; Jessica Bergs; Kobi Hackenburg; Lennart Luettgau; Saffron Huang

arxiv: 2509.05219 · v4 · submitted 2025-09-05 · 💻 cs.HC

Lennart Luettgau, Hannah Rose Kirk, Kobi Hackenburg, Jessica Bergs, Henry Davidson, Henry Ogden, Divya Siddarth, Saffron Huang, Christopher Summerfield

Pith reviewed 2026-05-18 18:31 UTC · model grok-4.3

classification 💻 cs.HC

keywords: conversational AI, political knowledge, misinformation, randomized controlled trial, Google search, information seeking, election information

The pith

Task-directed conversations with AI increase political knowledge as much as self-directed Google search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether conversational AI helps or harms users' political knowledge when it is used for information seeking. A representative survey finds that many UK citizens already use it for election information. The core experiments are randomised controlled trials showing that task-directed AI conversations produce the same increase in belief in true information, and the same decrease in belief in misinformation, as self-directed Google search. The result holds across topics, AI models, and prompting strategies. A reader might care because it suggests that replacing search with AI for political information may not increase belief in false information.

Core claim

In a series of randomised controlled trials, task-directed conversations with AI to research specific political topics increase political knowledge to the same extent as self-directed Google search. This equivalence is observed across issues, models, and prompting strategies, with knowledge gauged by belief in true information rising and belief in misinformation falling.

What carries the argument

The randomised controlled trials that directly compare belief changes after AI-assisted versus self-directed search-based research on political questions.

Load-bearing premise

The knowledge measures used in the RCTs validly capture lasting changes in belief rather than temporary responses shaped by the experimental setting or demand characteristics.

What would settle it

Re-testing the same participants on political facts after a delay of several weeks to see if the knowledge gains persist equally in the AI and search groups.
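
One way such a delayed retest could be analysed is with an equivalence test on retained gains. The sketch below is illustrative only: it uses a two one-sided tests (TOST) procedure with a normal approximation, which is a swapped-in technique and not the Bayesian models the paper reports. The function name `tost_equivalence`, the equivalence margin of 0.5 scale points, and the simulated scores are all assumptions, not values from the study.

```python
import math
import random
from statistics import NormalDist, mean, variance


def tost_equivalence(a, b, margin, alpha=0.05):
    """Two one-sided tests (normal approximation) for mean(a) - mean(b).

    Returns (diff, p_value, equivalent). Equivalence is declared when the
    difference is significantly above -margin AND significantly below +margin.
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)        # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical retained gains (delayed score minus baseline), NOT study data.
rng = random.Random(0)
ai_retained = [rng.gauss(1.0, 2.0) for _ in range(400)]
search_retained = [rng.gauss(1.0, 2.0) for _ in range(400)]

diff, p_value, equivalent = tost_equivalence(
    ai_retained, search_retained, margin=0.5
)
print(f"difference={diff:.3f}, TOST p={p_value:.4f}, equivalent={equivalent}")
```

The choice of margin drives the conclusion, so in a real analysis it would need to be justified and fixed in advance rather than picked after seeing the data.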

Figures

Figures reproduced from arXiv: 2509.05219 by Christopher Summerfield, Divya Siddarth, Hannah Rose Kirk, Henry Davidson, Henry Ogden, Jessica Bergs, Kobi Hackenburg, Lennart Luettgau, Saffron Huang.

**Figure 1.** Experimental design for measuring the impact of conversational AI on political knowledge. Participants ...

**Figure 2.** Conversational AI usage patterns and influence on belief in true versus false information. (A) Survey results: ...

**Figure 3.** Belief in true and false information across prompting techniques and different conversational AI models.

**Figure 4.** Agreement with trust and distrust statements and private beliefs. Top row: (A) Change in agreement with trust ...

Original abstract

Conversational AI systems are increasingly being used in place of traditional search engines to help users complete information-seeking tasks. This has raised concerns in the political domain, where biased or hallucinated outputs could misinform voters or distort public opinion. However, in spite of these concerns, the extent to which conversational AI is used for political information-seeking, as well as the potential impact of this use on users' political knowledge, remains uncertain. Here, we address these questions: First, in a representative national survey of the UK public (N = 2,499), we find that in the week before the 2024 election as many as 32% of chatbot users - and 13% of eligible UK voters - have used conversational AI to seek political information relevant to their electoral choice. Second, in a series of randomised controlled trials (N = 2,858 total) we find that across issues, models, and prompting strategies, task-directed conversations with AI to research specific political topics increase political knowledge (increase belief in true information and decrease belief in misinformation) to the same extent as self-directed Google search. Taken together, our results suggest that people in the UK are increasingly turning to conversational AI for information about politics. These findings substantially extend prior work by demonstrating that conversational AI's effects on political knowledge generalise across multiple topics, political perspectives, and model families, suggesting that the shift toward AI-assisted political information-seeking may not lead to increased public belief in political misinformation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper shows AI chatbots match Google search for immediate political knowledge gains across topics and models, but the single-session design leaves durability untested.

This paper reports that task-directed conversations with AI raise accurate political beliefs and lower misinformation beliefs to the same degree as self-directed Google search. The claim rests on a representative UK survey of 2,499 people plus a series of RCTs with 2,858 participants in total, covering multiple topics, political perspectives, and model families. The usage numbers are straightforward: up to 13 percent of eligible voters had turned to chatbots for election-related information in the week before the 2024 vote. The experiments then assigned people to either AI or Google conditions and measured belief shifts immediately afterward.

The breadth is the clearest addition over earlier, narrower studies. Testing several issues and several models in one design gives a better sense of whether the parity is general rather than tied to one case. The direct comparison with ordinary search is also useful for anyone tracking how information tools actually affect voters.

The main soft spot is the outcome timing. All knowledge measures come immediately after the task, with no delayed retest reported in the abstract or methods summary. That setup cannot separate lasting belief change from short-term recall or from participants simply repeating what they just encountered. Without item wording details, distractors, or any retention check, the equivalence result could partly reflect the experimental setting itself. The survey side looks solid on sampling, but the RCT measures carry the heavier load for the headline claim.

This work is aimed at researchers and policy people who follow AI effects on political information. Anyone running experiments on misinformation or civic tech would get concrete comparative numbers from it. The topic and sample sizes are strong enough that a serious editor should send it to referees rather than desk reject, with the main request being clearer evidence on whether the gains hold up beyond the lab session.

Referee Report

1 major / 1 minor

Summary. The paper reports that in a representative UK survey (N=2,499) conducted the week before the 2024 election, up to 32% of chatbot users (13% of eligible voters) used conversational AI for politically relevant information. Across a series of RCTs (total N=2,858), task-directed conversations with AI on specific political topics increased belief in true statements and decreased belief in misinformation to the same extent as self-directed Google search, with this equivalence holding across issues, models, and prompting strategies.

Significance. If the equivalence result holds under scrutiny of the outcome measures, the work provides large-scale comparative evidence that conversational AI does not appear to worsen political knowledge relative to conventional search, extending prior findings on AI information-seeking to the political domain with multi-topic, multi-model coverage. The representative survey component adds timely descriptive data on adoption rates.

major comments (1)

[RCT design and outcome measures] The central equivalence claim rests on immediate post-task belief measures in the RCTs. The manuscript provides no delayed retest, no explicit pre-registration details on retention checks, and limited abstract-level description of item wording or distractors; this leaves open that observed parity could reflect single-session demand characteristics, social desirability, or verbatim recall rather than durable knowledge gains (see skeptic note on weakest assumption).

minor comments (1)

[Abstract] The abstract states results 'across issues, models, and prompting strategies' but does not enumerate the specific topics or models tested; adding this detail would improve transparency without altering the main text.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment regarding RCT design and outcome measures below, with clarifications on what the study can and cannot support.

Referee: [RCT design and outcome measures] The central equivalence claim rests on immediate post-task belief measures in the RCTs. The manuscript provides no delayed retest, no explicit pre-registration details on retention checks, and limited abstract-level description of item wording or distractors; this leaves open that observed parity could reflect single-session demand characteristics, social desirability, or verbatim recall rather than durable knowledge gains (see skeptic note on weakest assumption).

Authors: We acknowledge that the RCTs rely on immediate post-task belief measures rather than delayed retests. This design matches the acute nature of the information-seeking task and enables a direct, controlled comparison to self-directed Google search, consistent with prior experimental work on search effects. A delayed retest was not included because the study protocol focused on immediate belief updating following the task; data collection is now complete and such a follow-up cannot be added. The study was pre-registered, though retention checks were outside the original scope. In revision we will expand the methods section with full item wording, distractor details, and selection criteria, and we will add explicit discussion of potential demand characteristics, noting that both AI and search conditions used parallel instructions with participants unaware of the equivalence hypothesis. These changes will clarify the scope of the equivalence finding without overstating durability.

revision: partial

standing simulated objections not resolved

Absence of delayed retest data to evaluate long-term retention of belief changes.

Circularity Check

0 steps flagged

No circularity: purely empirical RCT and survey results

The paper presents a national survey (N=2499) and series of RCTs (N=2858) comparing task-directed AI conversations to self-directed Google search on political knowledge outcomes. No equations, derivations, fitted parameters, or predictive models are defined or used. The core equivalence claim is a direct statistical comparison of independent experimental conditions. No self-citations are invoked to justify uniqueness or load-bearing premises, and the design does not rename or smuggle in prior results by construction. This is a standard empirical comparison with no reduction of outputs to inputs via definition or fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions about survey representativeness and the validity of knowledge measures in controlled settings; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: The national survey sample is representative of the UK public for estimating chatbot usage rates.
Underpins the 32% and 13% usage figures reported in the abstract.

domain assumption: The belief measures in the RCTs reflect genuine changes in political knowledge rather than experimental artifacts.
Load-bearing for the equivalence claim between AI conversations and web search.
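
To make the first axiom concrete, the two headline usage figures can be cross-checked with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, not an analysis from the paper. It assumes both percentages are taken over the same base of eligible voters, and it treats the survey as a simple random sample, ignoring the survey weights, so the interval is only indicative. The function names are hypothetical.

```python
import math

N_SURVEY = 2499                 # survey sample size reported in the abstract
SHARE_OF_CHATBOT_USERS = 0.32   # upper figure among chatbot users
SHARE_OF_VOTERS = 0.13          # upper figure among eligible voters


def implied_chatbot_user_share(share_of_voters, share_of_users):
    """If share_of_voters = share_of_users * P(chatbot user), solve for P."""
    return share_of_voters / share_of_users


def wald_interval(p, n, z=1.96):
    """Approximate 95% interval for a proportion under simple random sampling."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


user_share = implied_chatbot_user_share(SHARE_OF_VOTERS, SHARE_OF_CHATBOT_USERS)
low, high = wald_interval(SHARE_OF_VOTERS, N_SURVEY)
print(f"implied chatbot-user share: {user_share:.1%}")
print(f"13% of voters, rough 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the two figures are mutually consistent only if roughly two in five eligible voters count as chatbot users, which is the kind of implication the representativeness assumption has to support.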

pith-pipeline@v0.9.0 · 5823 in / 1195 out tokens · 43229 ms · 2026-05-18T18:31:58.573278+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

across issues, models, and prompting strategies, task-directed conversations with AI ... increase political knowledge ... to the same extent as self-directed Google search

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Bayesian Generalized Linear Models (GLM1-3 ... ordered-logistic likelihood
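
For readers unfamiliar with the ordered-logistic likelihood named in the passage above, the following is a generic textbook sketch of how such a likelihood maps a linear predictor to probabilities over ordered response categories. It is not the paper's GLM1-3 specification: the cutpoints, the predictor values, the five-point scale, and the function name `ordered_logistic_probs` are illustrative assumptions.

```python
import math


def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities for an ordered-logistic model.

    P(y <= k) = sigmoid(c_k - eta) for increasing cutpoints c_1 < ... < c_{K-1};
    category probabilities are differences of adjacent cumulative values.
    """
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in increasing order")

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical 5-point belief scale with four cutpoints (illustrative values).
cutpoints = [-2.0, -0.5, 0.5, 2.0]
baseline = ordered_logistic_probs(eta=0.0, cutpoints=cutpoints)
shifted = ordered_logistic_probs(eta=1.0, cutpoints=cutpoints)
print([round(p, 3) for p in baseline])
print([round(p, 3) for p in shifted])
```

Raising the linear predictor moves probability mass toward the higher response categories, which is how a treatment effect on a rating-scale belief measure would show up in a model of this family.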

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

People readily follow personal advice from AI but it does not improve their well-being
cs.HC 2025-11 conditional novelty 7.0

Large longitudinal RCT finds high rates of following AI personal advice but no sustained well-being gains versus a hobbies control condition.

What Is The Political Content in LLMs' Pre- and Post-Training Data?
cs.CL 2025-09 unverdicted novelty 5.0

Training data for open LLMs is systematically left-leaning, with pre-training corpora containing more political material than post-training data and model stances aligning with data distributions.

