
arxiv: 2604.04479 · v2 · submitted 2026-04-06 · 💻 cs.HC

How can LLMs Support Policy Researchers? Evaluating an LLM-Assisted Workflow for Large-Scale Unstructured Data

Pith reviewed 2026-05-10 19:39 UTC · model grok-4.3

classification 💻 cs.HC
keywords: large language models · thematic analysis · policy research · unstructured data · Reddit analysis · chatbot interviews · public discourse · AI-assisted qualitative research

The pith

An LLM-assisted workflow lets policy researchers analyze millions of Reddit posts and chatbot interviews to surface public themes that align with and diverge from official reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Policy researchers need faster ways to understand public views, but traditional interviews and surveys are slow and limited in scale. This paper evaluates an LLM-assisted thematic analysis workflow in two steps. First, 11 policy researchers try an early prototype and find it useful as quick input. Second, the workflow is scaled to millions of Reddit posts plus 1,058 chatbot interviews on a policy topic. The resulting themes are compared directly to those in authoritative policy reports, showing both matches and differences in the public discourse captured. A sympathetic reader would care because this points to a practical way to bring large-scale unstructured online text into policy work without replacing human judgment.

Core claim

The paper establishes that an LLM-assisted workflow for thematic analysis can be scaled to process millions of Reddit posts and over a thousand chatbot-led interview transcripts on policy-relevant topics, treating both as rich data sources for public discourse. The synthesized themes both align with and diverge from those found in authoritative policy reports, and policy researchers view early prototypes as practical, rough-and-ready inputs.

What carries the argument

LLM-assisted thematic analysis workflow applied to large-scale unstructured text from online forums and chatbot interviews, with direct comparison of output themes to policy reports.

If this is right

  • Policy work can incorporate views from millions of online discussions at low cost as an early research step.
  • Chatbot-led interviews become a scalable supplement to traditional listening sessions.
  • Divergences between LLM themes and official reports can flag areas where public concerns may be under-represented in policy documents.
  • Researchers gain a repeatable process for initial thematic mapping before deeper qualitative work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the workflow holds up, it could shorten the time between identifying emerging public issues and incorporating them into policy analysis.
  • The approach might be tested on other unstructured sources such as news comments or public hearing transcripts to broaden its use.
  • Validation studies could measure how much human review of LLM outputs is needed to reach acceptable reliability for policy use.
  • Over time this could shift policy research norms toward treating large online corpora as standard first-pass data rather than occasional supplements.

Load-bearing premise

That themes extracted by LLMs from social media posts and chatbot transcripts serve as a sufficiently undistorted proxy for actual public discourse that can be compared meaningfully to authoritative reports.

What would settle it

A side-by-side human audit of the same Reddit and interview data that finds the LLM-synthesized themes systematically omit, add, or reframe major public concerns that human coders identify.
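One way such an audit could be quantified is with a chance-corrected agreement score between human and LLM coding of the same quotes. The sketch below computes Cohen's kappa in plain Python. It assumes each quote receives exactly one theme label from a human coder and one from the LLM; the function name and the placeholder labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Placeholder labels (hypothetical, not data from the paper).
human_codes = ["theme_a", "theme_b", "theme_a", "theme_c", "theme_b", "theme_c"]
llm_codes = ["theme_a", "theme_b", "theme_c", "theme_c", "theme_b", "theme_a"]
kappa = cohens_kappa(human_codes, llm_codes)
print(f"kappa = {kappa:.2f}")
```

A low kappa would not by itself show systematic omission or reframing; it would only indicate where a closer qualitative comparison of the disagreeing quotes is worth doing.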

Figures

Figures reproduced from arXiv: 2604.04479 by Andrés Monroy-Hernández, Ella Colby, Jakob Kaiser, Jennifer Okwara, Maggie Wang, Shuyao Zhou, Varun Nagaraj Rao, Yuhan Liu.

Figure 1: Overview of the Workflow. The workflow comprises four stages, where the two stages (Quote Extraction …
Figure 2: Example of a discussion in r/cscareers subreddit
Figure 3: Illustration of How Users Interact with the Workflow in This Study. Action 1: The user enters a policy …
Figure 4: Screenshots of the User Interface in this study. View 1 shows the interface. This view prompts users to input a research domain for analysis (Action 1) and select a data source (Action 2). View 2 shows high-level themes in the workflow interface. The user can then select or search for a primary research topic (Action 3). View 3 shows the final Report View of the interface, displaying all subtopics identifi…
Figure 5: Timeline of the User Study. This figure illustrates the four-phase research method comparing AI-assisted and participants' own non-AI expert approach. The process begins with an interview and tutorial, followed by two sequential research parts randomized based on topic (Climate Change or Social Media & Kids) and method (with the workflow or own non-AI expert approach), and concludes with a retrospective su…
Figure 6: This diverging stacked bar chart breaks down Likert scale responses (1-5) into three groups (negative, …
Figure 7: This bar chart shows the average number of themes gathered across the two research topics (Climate Change and Social Media & Kids) using two research methods, the workflow and non-AI expert approach. As we can see, even in a limited study duration, using the workflow, on average, allowed participants to collect a higher number of themes for both topics, pointing to the process being an order of magnitude fas…
Figure 8: Overview of the evaluation in Study 2. We extend the QuaLLM-assisted thematic analysis workflow …
Figure 9: Workflow for selecting and preparing Reddit data in Study 2. Starting from a corpus covering submis…
Figure 10: Workflow for collecting and processing chatbot-led interview data in Study 2. A public-opinion expert …
Figure 11: Theme overlap across sources. Reddit and interviews largely corroborate authoritative reports while …
Original abstract

Policy researchers need scalable ways to surface public views, yet they often rely on interviews, listening sessions, and surveys, analyzed thematically, that are slow, expensive, and limited in scale and diversity. LLMs offer new possibilities for thematic analysis of unstructured text, yet we know little about how LLM-assisted workflows perform for policy research. Building on a workflow for LLM-assisted thematic analysis of online forums, we conduct a study with 11 policy researchers, who use an early prototype and see it as a quick, rough-and-ready input to their research. We then extend and scale the workflow to analyze millions of Reddit posts and 1,058 chatbot-led interview transcripts on a policy-relevant topic, treating these sources as rich and scalable data for policy discourse. We compare the synthesized themes to those from authoritative policy reports, identify points of alignment and divergence, and discuss what this implies for policy researchers adopting LLM-assisted workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces an LLM-assisted workflow for thematic analysis of large-scale unstructured data sources such as Reddit posts and chatbot-led interviews to support policy researchers. It evaluates the workflow via a user study with 11 policy researchers who found the prototype useful as a rough input, then scales the workflow to process millions of Reddit posts and 1,058 interview transcripts on a policy topic, synthesizing themes and comparing them qualitatively to those in authoritative policy reports to note alignments and divergences.

Significance. If the workflow proves reliable and the comparisons hold, this work could be significant for the HCI and policy informatics communities by demonstrating a practical, scalable approach to incorporating diverse public opinions into policy research, potentially reducing reliance on resource-intensive traditional methods while highlighting adoption challenges. The scaling to millions of posts and over a thousand transcripts is a notable strength in showing feasibility.

major comments (3)
  1. [User Study] User study section: The evaluation relies on feedback from only 11 policy researchers with no reported quantitative metrics (e.g., SUS scores, task times, or inter-rater agreement on perceived utility), which is load-bearing for the claim that the prototype serves as a 'quick, rough-and-ready input' to research.
  2. [Large-Scale Analysis] Large-scale analysis section: No specifics are given on prompting strategies, sampling from millions of posts, or validation of LLM-generated themes (e.g., human review protocols or checks for model bias), undermining the central claim that these sources provide a valid proxy for public discourse.
  3. [Theme Comparison] Theme comparison section: Alignment and divergence with policy reports are noted qualitatively without metrics such as overlap statistics or discussion of potential distortions from data quality or LLM biases, which is load-bearing for the implications drawn for policy researchers.
minor comments (2)
  1. [Abstract] Abstract: The specific policy topic analyzed in the large-scale portion could be named to allow readers to better judge relevance and generalizability.
  2. [Discussion] Discussion: Adding citations to existing literature on LLM use in qualitative social science research would strengthen contextualization of the workflow.
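The first major comment names SUS scores as one metric the user study could report. For readers unfamiliar with it, the sketch below shows the standard System Usability Scale scoring rule: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5. The function name and the sample ratings are illustrative; nothing here comes from the paper's data.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire.

    responses: ten Likert ratings (integers 1-5) in questionnaire order.
    Returns a value between 0 and 100.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten ratings, each an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5


# Hypothetical ratings from a single participant.
example_score = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
print(example_score)
```

With only 11 participants, a mean SUS score would still carry wide uncertainty, so it would complement rather than replace the qualitative feedback.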

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review. Their feedback highlights important areas for improving methodological transparency and rigor in our evaluation sections. We address each major comment below and describe the revisions we will incorporate.

point-by-point responses
  1. Referee: [User Study] User study section: The evaluation relies on feedback from only 11 policy researchers with no reported quantitative metrics (e.g., SUS scores, task times, or inter-rater agreement on perceived utility), which is load-bearing for the claim that the prototype serves as a 'quick, rough-and-ready input' to research.

    Authors: We agree that the user study is exploratory and qualitative, relying on a small sample of 11 policy researchers, which aligns with common HCI practices for formative evaluation of early prototypes. The goal was to gather initial insights on perceived utility rather than definitive quantitative validation. In the revision, we will expand this section with additional details on participant recruitment and demographics, more extensive task descriptions, further anonymized feedback quotes, and an explicit limitations subsection acknowledging the small sample size and absence of standardized metrics such as SUS scores. We will also temper the language around the 'quick, rough-and-ready input' claim to better reflect its preliminary nature while retaining the supporting evidence from the study. revision: partial

  2. Referee: [Large-Scale Analysis] Large-scale analysis section: No specifics are given on prompting strategies, sampling from millions of posts, or validation of LLM-generated themes (e.g., human review protocols or checks for model bias), undermining the central claim that these sources provide a valid proxy for public discourse.

    Authors: This is a fair critique, and we regret the lack of these details in the submitted version. We will add a dedicated methods subsection that specifies the prompting strategies and templates employed for theme generation, the sampling and processing approach for the millions of Reddit posts (including any filtering criteria or batching methods), and the validation protocols such as human review procedures by the research team along with steps to address model biases (e.g., bias-mitigating prompt design and cross-validation on subsets). These additions will directly reinforce the claim that the sources serve as a valid, scalable proxy for public discourse. revision: yes

  3. Referee: [Theme Comparison] Theme comparison section: Alignment and divergence with policy reports are noted qualitatively without metrics such as overlap statistics or discussion of potential distortions from data quality or LLM biases, which is load-bearing for the implications drawn for policy researchers.

    Authors: The comparison was designed as a qualitative analysis to surface nuanced alignments and divergences that purely quantitative approaches might miss. Nevertheless, we acknowledge the benefit of supplementary metrics. In the revision, we will include basic quantitative overlap measures (such as theme coverage percentages or similarity indicators where data permits) and add a discussion paragraph addressing potential distortions from Reddit data quality issues, interview formats, and LLM biases, including mitigation approaches used in the workflow. This will strengthen the implications section for policy researchers without altering the primarily qualitative framing. revision: partial
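The "basic quantitative overlap measures" mentioned in the third response could take many forms. The sketch below is one minimal possibility, assuming a human has already mapped LLM-synthesized themes and report themes onto a shared label vocabulary so that exact (case-insensitive) matching is meaningful. The function name, the returned keys, and the placeholder theme labels are hypothetical and do not describe the authors' method.

```python
def theme_overlap(llm_themes, report_themes):
    """Compare two collections of theme labels after simple normalisation."""
    llm = {t.strip().lower() for t in llm_themes}
    report = {t.strip().lower() for t in report_themes}
    shared = llm & report
    union = llm | report
    return {
        "shared": sorted(shared),
        "only_llm": sorted(llm - report),        # candidate divergences
        "only_report": sorted(report - llm),     # themes the corpus did not surface
        "jaccard": len(shared) / len(union) if union else 0.0,
        "report_coverage": len(shared) / len(report) if report else 0.0,
    }


# Placeholder labels (hypothetical, not themes from the paper).
result = theme_overlap(
    ["Theme A", "Theme B", "Theme C"],
    ["theme b", "Theme C", "Theme D"],
)
print(result["jaccard"], result["report_coverage"])
```

Exact label matching is the weakest part of this sketch: the hard work lies in deciding when two differently worded themes count as the same, which remains a human judgment.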

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper describes an empirical workflow evaluation: a small user study with 11 policy researchers using a prototype, followed by scaling the LLM-assisted thematic analysis to millions of Reddit posts and 1,058 chatbot transcripts, then comparing synthesized themes against independent authoritative policy reports for alignment and divergence. No equations, fitted parameters, or quantitative predictions appear; the central claims rest on qualitative user feedback and external report comparisons rather than any self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation. The workflow is presented as exploratory input, not a closed derivation, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that LLMs can reliably extract policy-relevant themes from noisy online text and chatbot transcripts; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)
  • domain assumption: LLMs can perform thematic analysis on unstructured text at scale.
    Invoked when the workflow is extended to millions of posts and interviews.

pith-pipeline@v0.9.0 · 5486 in / 1311 out tokens · 29330 ms · 2026-05-10T19:39:24.484876+00:00 · methodology



A Prompting & Additional Technical Details

Built on a Flask backend with LLM-p...

  • Directly reference {$theme_focus} concerns
  • Address specific risks, dangers, or ethical concerns related to {$concerns_scope}
  • Include personal anecdotes or experiences discussing {$theme_focus} implications of {$topic}.

For each relevant quote, create an output entry in JSON format with the following structure: { "entries": [ { "quote": "Full quote of a personal experience or opinion explicitly mentioning $theme_focus concerns in $topic", "summary": "A bri...

  • Provide a clear, concise name. The theme name should be specific and not too broad
  • Provide a brief description.

Respond in valid JSON format with the following structure: { "codes": [ { "name": "Theme Name", "description": "Brief description of what this theme represents." } ] }

In Study 2, we adjusted the prompt to fit in large-scale data, in which we process data in batch (a batch contains 500 quotes). We then added an ad...

  • A numbered list of codes (1-9) with their descriptions
  • A list of quotes to categorize.

For each quote, assign the ONE MOST appropriate code number (1-9) based on the themes present in the quote. Respond in valid JSON format with the following structure: { "categorized_quotes": [ { "quote": "original quote text", "source_id": "original source id", "codes": [ { "code": code_numb...
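The Study 2 fragments above mention batches of 500 quotes and a reply format in which each quote receives a code number from 1 to 9. The sketch below shows how such batching and a basic reply check might look. It relies only on the keys visible in the truncated fragment ("categorized_quotes", "quote", "source_id", "codes", "code"); any further fields are unknown, and the function names are hypothetical rather than the authors' implementation.

```python
import json

BATCH_SIZE = 500                 # batch size stated in the appendix fragment
VALID_CODES = set(range(1, 10))  # code numbers 1-9


def batches(quotes, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` quotes."""
    for start in range(0, len(quotes), size):
        yield quotes[start:start + size]


def parse_categorized(raw_json):
    """Split a model reply into entries with valid codes and entries to re-check."""
    data = json.loads(raw_json)
    valid, rejected = [], []
    for item in data.get("categorized_quotes", []):
        codes = [c.get("code") for c in item.get("codes", [])]
        ok = bool(codes) and all(
            isinstance(c, int) and not isinstance(c, bool) and c in VALID_CODES
            for c in codes
        )
        (valid if ok else rejected).append(item)
    return valid, rejected


# Hypothetical reply containing one valid and one out-of-range code.
reply = json.dumps({"categorized_quotes": [
    {"quote": "q1", "source_id": "s1", "codes": [{"code": 3}]},
    {"quote": "q2", "source_id": "s2", "codes": [{"code": 12}]},
]})
valid, rejected = parse_categorized(reply)
batch_sizes = [len(b) for b in batches(list(range(1200)))]
print(len(valid), len(rejected), batch_sizes)
```

Routing rejected entries to human review, rather than silently dropping them, would be one way to support the validation protocol the referee asks for.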