
arxiv: 2509.05219 · v4 · submitted 2025-09-05 · 💻 cs.HC

Conversational AI increases political knowledge as effectively as self-directed internet search

Pith reviewed 2026-05-18 18:31 UTC · model grok-4.3

classification 💻 cs.HC
keywords conversational AI, political knowledge, misinformation, randomized controlled trial, Google search, information seeking, election information

The pith

Task-directed conversations with AI increase political knowledge as much as self-directed Google search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether conversational AI helps or harms users' political knowledge when people use it to seek information. A representative survey finds that a notable share of UK citizens already use it for election information. The core experiments are RCTs showing that AI conversations produce the same increase in belief in true political statements, and the same decrease in belief in false ones, as self-directed search. The result holds regardless of topic, AI model, or prompting strategy. This matters because it suggests that replacing search with AI for political information may not worsen the spread of misinformation.

Core claim

In a series of randomised controlled trials, task-directed conversations with AI to research specific political topics increase political knowledge to the same extent as self-directed Google search. This equivalence is observed across issues, models, and prompting strategies, with knowledge gauged by belief in true information rising and belief in misinformation falling.
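The core claim scores knowledge through two movements: belief in true statements rising and belief in misinformation falling. As a minimal sketch of how those could be folded into one per-participant number, assuming belief ratings on a 0-100 scale (the paper's actual scale and scoring rule are not given here, and `knowledge_change` is a hypothetical helper), one option is the shift in true-item belief minus the shift in false-item belief:

```python
from statistics import mean


def knowledge_change(pre_true, post_true, pre_false, post_false):
    """Hypothetical knowledge-change score for one participant.

    Each argument is a list of belief ratings (assumed 0-100) for the true
    or false statements, measured before or after the research task.
    Positive values mean belief in true statements rose and/or belief in
    false statements fell.
    """
    true_shift = mean(post_true) - mean(pre_true)     # rise in true-item belief
    false_shift = mean(post_false) - mean(pre_false)  # change in false-item belief
    return true_shift - false_shift


# Illustrative ratings only, not study data.
print(knowledge_change([50, 60], [70, 80], [60, 40], [30, 30]))
```

Subtracting the false-item shift means a drop in misinformation belief counts toward the score, so both movements described in the core claim push it in the same direction.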

What carries the argument

The randomised controlled trials that directly compare belief changes after AI-assisted versus self-directed search-based research on political questions.

Load-bearing premise

The knowledge measures used in the RCTs validly capture lasting changes in belief rather than temporary responses shaped by the experimental setting or demand characteristics.

What would settle it

Re-testing the same participants on political facts after a delay of several weeks to see if the knowledge gains persist equally in the AI and search groups.

Figures

Figures reproduced from arXiv: 2509.05219 by Christopher Summerfield, Divya Siddarth, Hannah Rose Kirk, Henry Davidson, Henry Ogden, Jessica Bergs, Kobi Hackenburg, Lennart Luettgau, Saffron Huang.

Figure 1. Experimental design for measuring the impact of conversational AI on political knowledge. Participants …

Figure 2. Conversational AI usage patterns and influence on belief in true versus false information. (A) Survey results: …

Figure 3. Belief in true and false information across prompting techniques and different conversational AI models.

Figure 4. Agreement with trust and distrust statements and private beliefs. Top row: (A) Change in agreement with trust …
Original abstract

Conversational AI systems are increasingly being used in place of traditional search engines to help users complete information-seeking tasks. This has raised concerns in the political domain, where biased or hallucinated outputs could misinform voters or distort public opinion. However, in spite of these concerns, the extent to which conversational AI is used for political information-seeking, as well as the potential impact of this use on users' political knowledge, remains uncertain. Here, we address these questions: First, in a representative national survey of the UK public (N = 2,499), we find that in the week before the 2024 election as many as 32% of chatbot users - and 13% of eligible UK voters - have used conversational AI to seek political information relevant to their electoral choice. Second, in a series of randomised controlled trials (N = 2,858 total) we find that across issues, models, and prompting strategies, task-directed conversations with AI to research specific political topics increase political knowledge (increase belief in true information and decrease belief in misinformation) to the same extent as self-directed Google search. Taken together, our results suggest that people in the UK are increasingly turning to conversational AI for information about politics. These findings substantially extend prior work by demonstrating that conversational AI's effects on political knowledge generalise across multiple topics, political perspectives, and model families, suggesting that the shift toward AI-assisted political information-seeking may not lead to increased public belief in political misinformation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Referee Report

1 major / 1 minor

Summary. The paper reports that in a representative UK survey (N=2,499) conducted the week before the 2024 election, up to 32% of chatbot users (13% of eligible voters) used conversational AI for politically relevant information. Across a series of RCTs (total N=2,858), task-directed conversations with AI on specific political topics increased belief in true statements and decreased belief in misinformation to the same extent as self-directed Google search, with this equivalence holding across issues, models, and prompting strategies.

Significance. If the equivalence result holds under scrutiny of the outcome measures, the work provides large-scale comparative evidence that conversational AI does not appear to worsen political knowledge relative to conventional search, extending prior findings on AI information-seeking to the political domain with multi-topic, multi-model coverage. The representative survey component adds timely descriptive data on adoption rates.

major comments (1)
  1. [RCT design and outcome measures] The central equivalence claim rests on immediate post-task belief measures in the RCTs. The manuscript provides no delayed retest, no explicit pre-registration details on retention checks, and, at least at the abstract level, little description of item wording or distractors. This leaves open the possibility that the observed parity reflects single-session demand characteristics, social desirability, or verbatim recall rather than durable knowledge gains (see skeptic note on weakest assumption).
minor comments (1)
  1. [Abstract] The abstract states results 'across issues, models, and prompting strategies' but does not enumerate the specific topics or models tested; adding this detail would improve transparency without altering the main text.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment regarding RCT design and outcome measures below, with clarifications on what the study can and cannot support.

Point-by-point responses
  1. Referee: [RCT design and outcome measures] The central equivalence claim rests on immediate post-task belief measures in the RCTs. The manuscript provides no delayed retest, no explicit pre-registration details on retention checks, and, at least at the abstract level, little description of item wording or distractors. This leaves open the possibility that the observed parity reflects single-session demand characteristics, social desirability, or verbatim recall rather than durable knowledge gains (see skeptic note on weakest assumption).

    Authors: We acknowledge that the RCTs rely on immediate post-task belief measures rather than delayed retests. This design matches the acute nature of the information-seeking task and enables a direct, controlled comparison to self-directed Google search, consistent with prior experimental work on search effects. A delayed retest was not included because the study protocol focused on immediate belief updating following the task; data collection is now complete and such a follow-up cannot be added. The study was pre-registered, though retention checks were outside the original scope. In revision we will expand the methods section with full item wording, distractor details, and selection criteria, and we will add explicit discussion of potential demand characteristics, noting that both AI and search conditions used parallel instructions with participants unaware of the equivalence hypothesis. These changes will clarify the scope of the equivalence finding without overstating durability. revision: partial

standing simulated objections not resolved
  • Absence of delayed retest data to evaluate long-term retention of belief changes.

Circularity Check

0 steps flagged

No circularity: purely empirical RCT and survey results

Full rationale

The paper presents a national survey (N = 2,499) and a series of RCTs (N = 2,858) comparing task-directed AI conversations with self-directed Google search on political knowledge outcomes. No equations, derivations, fitted parameters, or predictive models are defined or used. The core equivalence claim is a direct statistical comparison of independent experimental conditions. No self-citations are invoked to justify uniqueness or load-bearing premises, and the design does not rename or smuggle in prior results by construction. This is a standard empirical comparison with no reduction of outputs to inputs via definition or fitting.
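Because the claim is that the two conditions produce the same gain, a non-significant difference alone would not establish it; equivalence has to be tested against a stated margin. One standard frequentist way to do that is the two one-sided tests (TOST) procedure, sketched below with NumPy and SciPy. This is not presented as the authors' analysis: the function name `tost_welch`, the 5-point margin, and the simulated gains are all hypothetical.

```python
import numpy as np
from scipy import stats


def tost_welch(a, b, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    Null hypothesis: |mean(a) - mean(b)| >= margin.
    Rejecting it (p < alpha) supports equivalence within +/- margin.
    Uses Welch's standard error and Welch-Satterthwaite degrees of freedom.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean() - b.mean()
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    se = np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    # Test 1: H0 diff <= -margin against H1 diff > -margin.
    p_lower = stats.t.sf((diff + margin) / se, df)
    # Test 2: H0 diff >= +margin against H1 diff < +margin.
    p_upper = stats.t.cdf((diff - margin) / se, df)
    # Equivalence requires rejecting both nulls, so take the larger p-value.
    p_value = float(max(p_lower, p_upper))
    return float(diff), p_value, p_value < alpha


# Hypothetical per-participant knowledge gains (simulated, not study data).
rng = np.random.default_rng(0)
ai_gain = rng.normal(loc=10.0, scale=20.0, size=1000)
search_gain = rng.normal(loc=10.0, scale=20.0, size=1000)

diff, p, equivalent = tost_welch(ai_gain, search_gain, margin=5.0)
print(f"difference={diff:.2f}, TOST p={p:.4g}, equivalent={equivalent}")
```

Declaring equivalence at level alpha with TOST amounts to checking that the (1 - 2 alpha) confidence interval for the difference lies entirely inside ±margin, so the choice of margin is the substantive decision behind any "same extent" claim.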

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions about survey representativeness and the validity of knowledge measures in controlled settings; no free parameters or invented entities are introduced.

axioms (2)
  • Domain assumption: The national survey sample is representative of the UK public for estimating chatbot usage rates.
    Underpins the 32% and 13% usage figures reported in the abstract.
  • Domain assumption: The belief measures in the RCTs reflect genuine changes in political knowledge rather than experimental artifacts.
    Load-bearing for the equivalence claim between AI conversations and web search.
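The first axiom also links the two headline survey figures. As a rough check, assuming (this is not stated in the source) that both percentages describe the same weighted population of eligible voters, 13% of voters being 32% of chatbot users implies that roughly 41% of eligible voters were chatbot users. The helper name below is hypothetical, and because both inputs are rounded and 32% is phrased as "as many as", the result is approximate:

```python
def implied_chatbot_user_share(pct_of_users: float, pct_of_voters: float) -> float:
    """Back out the share of eligible voters who are chatbot users.

    Hypothetical helper. Assumes both headline figures describe the same
    weighted population, so that
        pct_of_voters = pct_of_users * chatbot_user_share
    """
    return pct_of_voters / pct_of_users


# 32% of chatbot users and 13% of eligible voters, as proportions.
share = implied_chatbot_user_share(0.32, 0.13)
print(f"{share:.1%}")
```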

pith-pipeline@v0.9.0 · 5823 in / 1195 out tokens · 43229 ms · 2026-05-18T18:31:58.573278+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. People readily follow personal advice from AI but it does not improve their well-being

    cs.HC 2025-11 conditional novelty 7.0

    Large longitudinal RCT finds high rates of following AI personal advice but no sustained well-being gains versus a hobbies control condition.

  2. What Is The Political Content in LLMs' Pre- and Post-Training Data?

    cs.CL 2025-09 unverdicted novelty 5.0

    Training data for open LLMs is systematically left-leaning, with pre-training corpora containing more political material than post-training data and model stances aligning with data distributions.
