arxiv: 2607.00551 · v1 · pith:WY2Q4MBBnew · submitted 2026-07-01 · 💰 econ.GN · q-fin.EC

Talking Politics with Artificial Intelligence

Pith reviewed 2026-07-02 03:20 UTC · model grok-4.3

classification 💰 econ.GN q-fin.EC
keywords AI conversations · political expression · large language models · election impact · conversational intermediaries · political content · regression discontinuity · ideological extremity

The pith

AI conversations serve as practical intermediaries for political questions rather than open forums for expression, turning more opinionated only after major events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes 4.30 million human-AI conversations across three datasets to test whether these exchanges create a new public arena for politics or mainly handle everyday needs. Political content appears in just 3.9 percent of conversations and centers on practical tasks like requesting information, drafting text, and processing documents far more than stating opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that stance-taking, affective language, and ideological extremity increased among U.S. users afterward but not in comparable conversations elsewhere. This pattern holds across platforms that differ in publicness and conversation depth. Readers should care because it reframes AI as a behind-the-scenes tool that absorbs routine political demand until events raise the stakes.

Core claim

Using two validated classifiers on user messages from 4.30 million conversations, the paper identifies political content in 3.9 percent of exchanges, with most activity focused on information-seeking, text drafting, and document processing rather than opinion expression. The regression-discontinuity-in-time design around the 2024 U.S. presidential result call finds sharp rises in stance-taking, affective language, and ideological extremity among U.S. users post-call, with no parallel shift in non-U.S. conversations. The overall pattern leads to the claim that AI conversation functions less as a public square and more as a conversational political intermediary that absorbs routine demand and becomes expressive when major events make political stakes explicit.

What carries the argument

Two validated classifiers that tag political content, use case, and expressed ideology in user messages, combined with a regression-discontinuity-in-time design centered on the 2024 election result call.
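
To make the identification step concrete, here is a minimal sketch of a local-linear regression-discontinuity-in-time estimate on synthetic data. It is an illustration under stated assumptions, not the paper's implementation: the figure notes mention local-linear RDiT estimates, but the triangular kernel, the 10-day bandwidth, the function name `rdit_local_linear`, and the simulated jump of 0.5 are all hypothetical choices made here.

```python
import numpy as np

def rdit_local_linear(t, y, cutoff=0.0, bandwidth=10.0):
    """Estimate the jump in y at `cutoff` with a kernel-weighted local-linear fit.

    Fits y = a + tau*D + b*x + c*D*x by weighted least squares, where
    x = t - cutoff and D = 1[x >= 0]. Weights come from a triangular kernel
    (an assumption of this sketch). Returns tau, the estimated discontinuity.
    """
    x = np.asarray(t, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    w = np.clip(1.0 - np.abs(x) / bandwidth, 0.0, None)  # triangular kernel weights
    keep = w > 0                                          # drop points outside the window
    x, y, w = x[keep], y[keep], w[keep]
    d = (x >= 0).astype(float)                            # post-cutoff indicator
    X = np.column_stack([np.ones_like(x), d, x, d * x])   # separate slopes on each side
    sw = np.sqrt(w)                                       # WLS via rescaled OLS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1]

# Synthetic example (hypothetical data): time in days relative to the cutoff,
# an outcome with a mild trend and a true jump of 0.5 at day 0.
rng = np.random.default_rng(0)
t = rng.uniform(-30, 30, size=5000)
y = 1.0 + 0.01 * t + 0.5 * (t >= 0) + rng.normal(0, 0.2, size=5000)

tau_hat = rdit_local_linear(t, y, cutoff=0.0, bandwidth=10.0)
print(f"estimated jump at cutoff: {tau_hat:.3f}")
```

The estimate depends on the bandwidth and kernel, which is why the referee report below asks for the bandwidth choice to be reported.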

If this is right

  • Political content varies sharply by platform publicness and conversation depth.
  • Users seek information, draft text, and process documents on political topics far more often than they state opinions.
  • Major events such as election result calls increase stance-taking, affective language, and ideological extremity in affected users.
  • AI conversation absorbs routine political demand and activates expressive features only when stakes become explicit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The intermediary pattern could apply to other high-stakes triggers such as policy announcements or scandals, producing similar spikes in expressive language.
  • Design choices on platforms might route routine queries differently from event-driven ones to match the observed split in behavior.
  • If AI absorbs routine political talk, visible public discourse on social media may shrink relative to private or semi-private exchanges.

Load-bearing premise

The classifiers correctly tag political content, use cases, and ideologies across the datasets, and the time-discontinuity design isolates the election result call without interference from other events.

What would settle it

Reclassifying a fresh sample of conversations or documenting other simultaneous events around the 2024 result call that produce the same rise in expressive language among U.S. users would undermine the intermediary claim.

Figures

Figures reproduced from arXiv: 2607.00551 by Ziwen Zu.

Figure 1: Political Prevalence. Notes: Panel A reports the conversation-level political share in each corpus. Panel B reports the same quantity by user-turn bucket, a variable available in all three corpora. The topic distribution shows why political AI use should not be reduced to elections. In WildChat, the two largest categories are policy and legislation and government services, each accounting for about 18.7% of…
Figure 2: Topic Composition. Notes: Topic identifies the substantive political domain of the conversation. The figure reports corpus-specific distributions and a pooled panel. … appears, but what the user is trying to do with the model. On this dimension, political AI use is overwhelmingly practical. Across 248,935 political user turns with use-case labels, 64.7% seek information or explanation, 13.0% involve writing o…
Figure 3: Use Cases. Notes: Use case identifies the user’s purpose or task within political material. The figure reports corpus-specific distributions and a pooled panel. Provider reports on general AI use provide a second benchmark for this interpretation (Appendix Table D.2). OpenAI’s privacy-preserving study of consumer ChatGPT use finds that practical guidance, seeking information, and writing account for about 7…
Figure 4: Geographic Variation. Notes: Left panel reports WildChat conversation-level political prevalence by world region. Right panel reports high-volume countries by political prevalence. Country labels should be read as descriptive corpus composition rather than population-level political interest. Table D.3 again points to publicness: the shared-link corpus is 16.9 percentage points more likely than WildChat to…
Figure 5: Expressed Ideology. Notes: Distributions are calculated among stance-taking political conversations. … one left–right scale. This partial constraint is familiar from mass belief systems. Ordinary political expression often combines ideological cues, cross-cutting issue positions, and context-specific reasoning rather than a fully bundled ideology (Page and Shapiro 1992; Zaller 1992). The affective evidence sh…
Figure 6: Ideology Structure. Notes: Points show deterministic samples of up to 3,000 stance-taking conversations per corpus; correlations use all stance-taking conversations with non-missing scores on the plotted dimensions.
Figure 7: Result-Call Effects on Expression. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for political prevalence, stance-taking, and affective charge. Full local-linear RDiT estimates are reported in Appendix Table F.1. The more robust result is expressive and affective: after the AP call, U.S. stance-taking conversations became more charged and more ideologically extreme. RD = 0.55*** se = 0…
Figure 8: Result-Call Effects on Ideology. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for economic position and affective polarization. Full local-linear RDiT estimates for ideology outcomes are reported in Appendix Table F.2.
Figure 9: Ideology Estimates by Geography. Notes: Figure reports result-call RDiT estimates for ideology outcomes in the United States and the rest of the world. Corresponding numerical estimates are reported in Appendix Table F.2. Figures 7–9 fit a theory of political expression better than a theory of attention alone. Agenda-setting and priming explain why a high-salience event makes politics more likely to enter c…
read the original abstract

Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that AI conversations function primarily as a conversational intermediary absorbing routine political demand rather than as a public square for expression. Analyzing 4.3 million human-AI conversations across three public datasets, it reports that political content appears in only 3.9% of exchanges and is mostly practical (information requests, text drafting, document processing) rather than opinion-stating. Two validated classifiers identify political content, use case, and expressed ideology; a regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows increases in stance-taking, affective language, and ideological extremity among U.S. users but not in comparable non-U.S. conversations.

Significance. If the measurement and identification strategies hold, the result offers a large-scale, falsifiable description of how LLMs mediate political talk, distinguishing routine practical use from event-triggered expressive use. This contributes to the literature on digital political behavior by providing quantitative evidence on the scale and triggers of political engagement with AI, with potential implications for platform design and democratic discourse studies.

major comments (2)
  1. [Abstract / Methods] Abstract and methods description: the claim that the two classifiers are 'validated' for identifying political content, practical vs. expressive use cases, and ideology across platforms and datasets provides no performance metrics (precision, recall, F1 by platform or depth), no inter-annotator agreement details, and no exclusion rules; these are load-bearing for the 3.9% prevalence figure and the subsequent use-case and ideology breakdowns.
  2. [Abstract / Results] Abstract and results description: the regression-discontinuity-in-time design around the 2024 result call reports no bandwidth choice, no placebo tests on pre-periods or non-U.S. samples, and no explicit checks ruling out contemporaneous confounders (e.g., media spikes); without these, the claim that the call isolated an increase in expressive language for U.S. users only cannot be fully assessed.
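
The per-platform performance metrics requested in the first comment could be computed from a hand-labeled validation sample along these lines. This is a hedged sketch: the function name `prf_by_group`, the record layout, and the toy labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def prf_by_group(records):
    """Compute precision, recall, and F1 per group.

    records: iterable of (group, gold_label, predicted_label) tuples,
    where labels are booleans (True = political). Layout is an assumption.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, gold, pred in records:
        c = counts[group]
        if pred and gold:
            c["tp"] += 1          # correctly flagged as political
        elif pred and not gold:
            c["fp"] += 1          # flagged political, annotator disagrees
        elif gold and not pred:
            c["fn"] += 1          # missed political conversation
    out = {}
    for group, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        out[group] = {"precision": p, "recall": r, "f1": f1}
    return out

# Toy validation sample (hypothetical labels, for illustration only).
records = [
    ("WildChat", True, True),
    ("WildChat", True, False),
    ("WildChat", False, False),
    ("WildChat", False, True),
    ("WildChat", True, True),
    ("LMSYS", True, True),
    ("LMSYS", False, False),
    ("LMSYS", True, True),
    ("LMSYS", False, False),
]

metrics = prf_by_group(records)
for group, m in metrics.items():
    print(group, {k: round(v, 3) for k, v in m.items()})
```

Reporting the metrics by group matters here because the prevalence figure is compared across corpora that differ in publicness and conversation depth.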

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our measurement strategies and identification approach. We will make revisions to provide the requested details on classifier validation and additional robustness checks for the regression-discontinuity design.

read point-by-point responses
  1. Referee: [Abstract / Methods] Abstract and methods description: the claim that the two classifiers are 'validated' for identifying political content, practical vs. expressive use cases, and ideology across platforms and datasets provides no performance metrics (precision, recall, F1 by platform or depth), no inter-annotator agreement details, and no exclusion rules; these are load-bearing for the 3.9% prevalence figure and the subsequent use-case and ideology breakdowns.

    Authors: We agree that the absence of detailed performance metrics, inter-annotator agreement, and exclusion rules in the abstract and methods limits the ability to fully assess the classifiers. We will revise the manuscript to include these in a new methods subsection, reporting precision, recall, and F1 by platform and depth, inter-annotator agreement, and exclusion rules. This will support the prevalence and breakdown figures. revision: yes

  2. Referee: [Abstract / Results] Abstract and results description: the regression-discontinuity-in-time design around the 2024 result call reports no bandwidth choice, no placebo tests on pre-periods or non-U.S. samples, and no explicit checks ruling out contemporaneous confounders (e.g., media spikes); without these, the claim that the call isolated an increase in expressive language for U.S. users only cannot be fully assessed.

    Authors: We concur that the RD design would benefit from explicit reporting of bandwidth choice, placebo tests, and confounder checks. In the revision, we will detail the bandwidth selection, add placebo tests on pre-periods and non-U.S. samples, and include checks for media spikes using external data. These will be added to the results and appendix. revision: yes
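
A placebo test of the kind promised in the second response can be sketched as follows: re-estimate the discontinuity at fake cutoffs in the pre-period, where no jump should appear. This is a minimal sketch on synthetic data; the function name `jump_at`, the uniform kernel, the 5-day bandwidth, and the placebo dates are assumptions made for illustration.

```python
import numpy as np

def jump_at(t, y, cutoff, bandwidth):
    """Local-linear jump estimate at `cutoff` using a uniform kernel,
    i.e. plain OLS with separate slopes on each side within the window."""
    x = np.asarray(t, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], y[keep]
    d = (x >= 0).astype(float)
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Synthetic data (hypothetical): a true jump of 0.4 at day 0 and none elsewhere.
rng = np.random.default_rng(1)
t = rng.uniform(-30, 30, size=6000)
y = 0.2 + 0.005 * t + 0.4 * (t >= 0) + rng.normal(0, 0.15, size=6000)

real = jump_at(t, y, cutoff=0.0, bandwidth=5.0)

# Placebo cutoffs sit far enough before day 0 that their windows
# ([-25, -15] and [-17, -7]) never include the real event.
placebos = {c: jump_at(t, y, cutoff=c, bandwidth=5.0) for c in (-20.0, -12.0)}

print(f"real cutoff: {real:.3f}")
for c, est in placebos.items():
    print(f"placebo cutoff {c:+.0f}: {est:.3f}")
```

If placebo estimates were as large as the estimate at the real cutoff, the discontinuity could not be attributed to the result call.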

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external conversation data and RD-in-time identification

full rationale

The paper's derivation proceeds from 4.3 million external human-AI conversations, two validated classifiers applied to identify political content/use-case/ideology, and a regression-discontinuity-in-time design around the 2024 election result call. No equations, fitted parameters, or self-citations are described that reduce the central claim (AI as intermediary rather than public square) to the inputs by construction. The discontinuity isolates the event effect on U.S. users while showing no change elsewhere, providing independent empirical content. This is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities can be identified. The central claim rests on unstated details of classifier training and the validity of the regression-discontinuity assumption.

pith-pipeline@v0.9.1-grok · 5709 in / 1163 out tokens · 39694 ms · 2026-07-02T03:20:05.626667+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Anthropic

    https://www.anthropic.com/research/economic-index-june-2026-report. Anthropic. 2026b. Anthropic Economic Index Report: Learning Curves. Anthropic. March 24,

  2. [2]

    Bang, Yejin, Delong Chen, Nayeon Lee, and Pascale Fung

    https://www.anthropic.com/research/economic-index-march-2026-report. Bang, Yejin, Delong Chen, Nayeon Lee, and Pascale Fung

  3. [3]

    Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

    “Measuring Political Bias in Large Language Models: What Is Said and How It Is Said.” In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 11142–11159. Bangkok, Thailand: Association for Computational Linguistics. https://aclanthology.org/2024.acl-long.600/. Bennett, W. Lance, and Shanto Iyengar

  4. [4]

    From Street-Level to System-Level Bureaucracies: How Information and Communication Technology Is Transforming Administrative Discretion and Constitutional Control

    “From Street-Level to System-Level Bureaucracies: How Information and Communication Technology Is Transforming Administrative Discretion and Constitutional Control.” Public Administration Review 62 (2): 174–184. Burnham, Michael. 2024. Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models. ArXiv preprint. arXiv: 2405.02472 [cs.CL]. https:...

  5. [5]

    Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs

    “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326. Cattaneo, Matias D., Nicolás Idrobo, and Rocío Titiunik. 2020. A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge: Cambridge University Press. Chadwick, Andrew. 2017. The Hybrid Media System: Politics and Power...

  6. [6]

    How Susceptible Are Large Language Models to Ideological Manipulation?

    “How Susceptible Are Large Language Models to Ideological Manipulation?” In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 17140–17161. Miami, Florida, USA: Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.952/. Chong, Dennis, and James N. Druckman

  7. [7]

    1996. What Americans Know about Politics and Why It Matters. New Haven, CT: Yale University Press

    Delli Carpini, Michael X., and Scott Keeter. 1996. What Americans Know about Politics and Why It Matters. New Haven, CT: Yale University Press. Di Leo, Riccardo, Chen Zeng, Elias Dinas, and Reda Tamtam

  8. [8]

    Regression Discontinuity in Time: Considerations for Empirical Applications

    “Regression Discontinuity in Time: Considerations for Empirical Applications.” Annual Review of Resource Economics 10:533–552. Herd, Pamela, and Donald P. Moynihan. 2018. Administrative Burden: Policymaking by Other Means. New York: Russell Sage Foundation. Heseltine, Michael, and Bernhard Clemm von Hohenberg

  9. [9]

    Expressive Partisanship: Campaign Involvement, Political Emotion, and Partisan Identity

    “Expressive Partisanship: Campaign Involvement, Political Emotion, and Partisan Identity.” American Political Science Review 109 (1): 1–17. Iyengar, Shanto, and Donald R. Kinder. 2010. News That Matters: Television and American Opinion. Updated. Chicago: University of Chicago Press. Iyengar, Shanto, Gaurav Sood, and Yphtach Lelkes

  10. [10]

    The Two-Step Flow of Communication: An Up-To-Date Report on an Hypothesis

    “The Two-Step Flow of Communication: An Up-To-Date Report on an Hypothesis.” Public Opinion Quarterly 21 (1): 61–78. Katz, Elihu, and Paul F. Lazarsfeld. 1955. Personal Influence: The Part Played by People in the Flow of Mass Communications. Glencoe, IL: The Free Press. King, Gary, Jennifer Pan, and Margaret E. Roberts

  11. [11]

    Persuading Voters Using Human–Artificial Intelligence Dialogues

    “Persuading Voters Using Human–Artificial Intelligence Dialogues.” Nature 648 (8093): 394–401. Lipka, Michael, and Kirsten Eddy. 2025. Relatively Few Americans Are Getting News from AI Chatbots like ChatGPT. Pew Research Center. Short Read, October 1,

  12. [12]

    pewresearch.org/short-reads/2025/10/01/relatively-few-americans-are-getting-news-from-ai-chatbots-like-chatgpt/

    https://www.pewresearch.org/short-reads/2025/10/01/relatively-few-americans-are-getting-news-from-ai-chatbots-like-chatgpt/. Lupia, Arthur, and Mathew D. McCubbins. 1998. The Democratic Dilemma: Can Citizens Learn What They Need to Know? New York: Cambridge University Press. Mansbridge, Jane

  13. [13]

    The Agenda-Setting Function of Mass Media

    “The Agenda-Setting Function of Mass Media.” Public Opinion Quarterly 36 (2): 176–187. Meta. 2020. What Do People Actually See on Facebook in the US? Meta. Newsroom post, November 10, 2020; updated November 4,

  14. [14]

    https://about.fb.com/news/2020/11/what-do-people-actually-see-on-facebook-in-the-us/. Meta. 2021. Reducing Political Content in Facebook Feed. Meta. Newsroom post, February 10,

  15. [15]

    Mettler, Suzanne

    https://about.fb.com/news/2021/02/reducing-political-content-in-news-feed/. Mettler, Suzanne. 2011. The Submerged State: How Invisible Government Policies Undermine American Democracy. Chicago: University of Chicago Press. Moynihan, Donald, Pamela Herd, and Hope Harvey

  16. [16]

    Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-State Interactions

    “Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-State Interactions.” Journal of Public Administration Research and Theory 25 (1): 43–69. Mutz, Diana C. 2006. Hearing the Other Side: Deliberative versus Participatory Democracy. Cambridge: Cambridge University Press. O’Hagan, Sean, and Aaron Schein. 2023. Measurement in the...

  17. [17]

    How to Train Your Stochastic Parrot: Large Language Models for Political Texts

    “How to Train Your Stochastic Parrot: Large Language Models for Political Texts.” Political Science Research and Methods 13 (2): 264–281. Page, Benjamin I., and Robert Y. Shapiro. 1992. The Rational Public: Fifty Years of Trends in Americans’ Policy Preferences. Chicago: University of Chicago Press. Pangakis, Nicholas, Samuel Wolken, and Neil Fasching. 2023...

  18. [18]

    Election Outcomes and Affective Polarization in the United States

    “Election Outcomes and Affective Polarization in the United States.” Political Research Quarterly 79 (2): 399–409. Popkin, Samuel L. 1994. The Reasoning Voter: Communication and Persuasion in Presidential Campaigns. 2nd ed. Chicago: University of Chicago Press. Prior, Markus. 2007. Post-Broadcast Democracy: How Media Choice Increases Inequality in Political In...

  19. [19]

    Roberts, Margaret E

    https://www.nature.com/articles/s41599-024-03609-x. Roberts, Margaret E. 2018. Censored: Distraction and Diversion Inside China’s Great Firewall. Princeton, NJ: Princeton University Press. Ross Arguedas, Amy

  20. [20]

    Emerging Uses of AI Chatbots for News and What It Means for Journalism

    “Emerging Uses of AI Chatbots for News and What It Means for Journalism.” In Digital News Report 2026. Oxford: Reuters Institute for the Study of Journalism, University of Oxford. https://reutersinstitute.politics.ox.ac.uk/digital-news-report/2026/emerging-uses-ai-chatbots-news-and-what-it-means-journalism. Scheufele, Dietram A., and David Tewksbury

  21. [21]

    Are LLMs (Really) Ideological? An IRT-Based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs

    “Are LLMs (Really) Ideological? An IRT-Based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs.” In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM2), 99–120. Vienna, Austria and virtual meeting: Association for Computational Linguistics. https://aclanthology.org/2025.gem-1.9/. Wyatt, Robert O., Elihu Kat...

  22. [22]

    ShareChat: A Dataset of Chatbot Conversations in the Wild

    “Bridging the Spheres: Political and Personal Conversation in Public and Private Spaces.” Journal of Communication 50 (1): 71–92. Yan, Yueru, Tuc Nguyen, Bo Su, Melissa Lieffers, and Thai Le. 2025. ShareChat: A Dataset of Chatbot Conversations in the Wild. ArXiv preprint. arXiv: 2512.17843 [cs.CL]. https://arxiv.org/abs/2512.17843. Yang, Eddie

  23. [23]

    The Limits of AI for Authoritarian Control

    “The Limits of AI for Authoritarian Control.” Early View, American Journal of Political Science. Zaller, John R. 1992. The Nature and Origins of Mass Opinion. Cambridge: Cambridge University Press. Zhao, Wenting, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng

  24. [24]

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    “LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.” In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=BOfDKxfwt0. Supplemental information A Data and corpora SI-1 B Measurement procedure SI-2 C Example political conversations SI-5 D Additional descriptive diagnostics SI-5 E Ideology di...

  25. [25]

    Users could access the service without registration or payment, choose a single-model interface, compare two randomly assigned anonymous models, or compare two chosen models; the dataset includes conversations from these interfaces after users accepted the site’s terms of use (Zheng et al. 2024). This makes the corpus useful for checking whether political...

  26. [26]

    SI-7 Table D.2: External Benchmarks Source Coverage Reported usage benchmark Comparison to this study OpenAI,How People Use ChatGPT (Chatterji et al

    Dashes mark metadata not released in the corpus: LMSYS has no usable timestamp or country field, and ShareChat has no country field. SI-7 Table D.2: External Benchmarks Source Coverage Reported usage benchmark Comparison to this study OpenAI,How People Use ChatGPT (Chatterji et al. 2025; OpenAI

  27. [27]

    Practical guidance, seeking information, and writing together account for about 77–80% of conversations; 49% of messages are asking, 40% doing, and 11% expressing

    Consumer ChatGPT messages on Free, Plus, and Pro plans, May 2024–June 2025, with usage trends through July 2025 About 700 million weekly users and 18 billion messages per week by July 2025; non-work messages exceeded 70% of consumer usage. Practical guidance, seeking information, and writing together account for about 77–80% of conversations; 49% of messa...

  28. [28]

    Facebook Feed content around the 2020 election and later updates to political-content ranking Meta reported that political content made up about 6% of what U.S

    U.S. Facebook Feed content around the 2020 election and later updates to political-content ranking Meta reported that political content made up about 6% of what U.S. users saw in Feed in 2020 and less than 3% in its later updated analysis. The 3.3–3.9% political-conversation rates in WildChat and LMSYS are in the same broad order as a major public social ...

  29. [29]

    Figure E.1: Ideology Over Time (plotted: stance-taking rate in percent, monthly values, 2024–2025, for WildChat and ShareChat). Notes: Months are shown when they contain at least 500 conversations and 50 political conversations; affective intensity and extremity means require at least 20 stance-taking conversations. WildChat has usable tim...

  30. [30]

    The increase is not concentrated in an election residual

    The figure is descriptive, but it is useful because it shows where the post-call non-election stance conversations appear. The increase is not concentrated in an election residual. It is distributed across policy, ideology and identity, welfare and taxation, government services, parties and politicians, immigration, and foreign affairs, consistent with ...