Recognition: no theorem link
Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice
Pith reviewed 2026-05-15 14:36 UTC · model grok-4.3
The pith
An AI platform grounded in human voices helps users practice consensus-finding, improving perspective-taking beyond what simple data summaries achieve
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agora shows that an interface supplying AI personas grounded in human voices, which explain support and opposition and provide predicted-support feedback, yields measurable gains in self-reported perspective-taking and in the production of statements that acknowledge multiple viewpoints, as demonstrated in a preliminary study with 44 university students comparing the full interface against aggregate support distributions alone.
What carries the argument
LLM-generated AI personas that present authentic human voices explaining why they support or oppose a policy, paired with real-time feedback on how revisions shift overall predicted support.
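The feedback mechanism can be sketched concretely. In a minimal, hypothetical implementation, each persona's 0–100 predicted agreement with a draft recommendation would come from an LLM call over that persona's interview transcript; aggregate predicted support is then just the mean, and the feedback shown to the user is the shift between revisions. The `rate_persona` stand-in below uses canned scores in place of a real model call:

```python
from statistics import mean

def rate_persona(persona: dict, recommendation: str) -> int:
    """Stand-in for an LLM call returning predicted_agreement in [0, 100].

    A real system would prompt the model with this persona's interview
    transcript and the recommendation text; here we look up canned
    scores so the sketch runs deterministically.
    """
    return persona["canned_scores"][recommendation]

def predicted_support(personas: list, recommendation: str) -> float:
    """Aggregate predicted support: mean of per-persona agreement (0-100)."""
    return mean(rate_persona(p, recommendation) for p in personas)

# Three hypothetical personas rating two drafts of the same recommendation.
personas = [
    {"name": "A", "canned_scores": {"draft_v1": 30, "draft_v2": 55}},
    {"name": "B", "canned_scores": {"draft_v1": 70, "draft_v2": 80}},
    {"name": "C", "canned_scores": {"draft_v1": 20, "draft_v2": 60}},
]

before = predicted_support(personas, "draft_v1")  # 40.0
after = predicted_support(personas, "draft_v2")   # 65.0
delta = after - before  # +25: the shift a user would see after revising
```

All names and data here are illustrative; the paper does not specify how the per-persona scores are aggregated, so the mean is one plausible choice.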
If this is right
- Users revise policy ideas after hearing concrete explanations from opposing voices.
- The platform scales deliberative practice beyond the limited reach of in-person assemblies.
- Feedback on predicted support levels guides users toward recommendations with broader appeal.
- Access to voice explanations increases acknowledgment of competing viewpoints in final statements.
Where Pith is reading between the lines
- The approach could be integrated into school civics programs to give students repeated practice before they encounter real disagreements.
- Long-term studies tracking behavior in actual community meetings would test whether interface gains carry over.
- Similar voice-grounded systems might surface representation gaps if the underlying human data under-samples certain demographics.
- Testing the platform on live policy issues with mixed-age or non-student groups would reveal whether the observed effects generalize.
Load-bearing premise
That short-term gains in self-reported perspective-taking during a single session reflect genuine skill development that would persist or transfer outside the interface.
What would settle it
A follow-up experiment that measures actual performance in live group deliberations or real policy negotiations, comparing participants trained on the full voice interface against those trained only on aggregate data.
Original abstract
Deliberative democratic theory suggests that civic competence (the capacity to navigate disagreement, weigh competing values, and arrive at collective decisions) is not innate but developed through practice. Yet opportunities to cultivate these skills remain limited, as traditional deliberative processes like citizens' assemblies reach only a small fraction of the population. We present Agora, an AI-powered platform that uses LLMs to organize authentic human voices on policy issues, helping users build consensus-finding skills by proposing and revising policy recommendations, hearing supporting and opposing perspectives, and receiving feedback on how policy changes affect predicted support. In a preliminary study with 44 university students, access to the full interface with voice explanations, as opposed to aggregate support distributions alone, significantly improved self-reported perspective-taking and the extent to which statements acknowledged multiple viewpoints. These findings point toward a promising direction for scaling civic education.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Agora, an AI platform that organizes authentic human voices via LLMs to support users in proposing and revising policy recommendations, encountering supporting/opposing perspectives, and receiving feedback on predicted support changes. It claims that in a preliminary study with 44 university students, the full interface (including voice explanations) significantly improved self-reported perspective-taking and the degree to which policy statements acknowledged multiple viewpoints, compared to viewing aggregate support distributions alone, suggesting a scalable approach to teaching consensus-finding skills.
Significance. If the empirical claims hold under more rigorous testing, the work could contribute to HCI and civic technology by demonstrating a practical way to scale deliberative skills beyond limited traditional formats like citizens' assemblies. The grounding in human voices and focus on viewpoint acknowledgment are strengths, though the preliminary status limits immediate impact.
major comments (2)
- [Preliminary study] The central claim of significant improvement rests on n=44 self-reported outcomes, with no statistical details, error bars, controls for prior civic engagement, or objective behavioral measures of consensus quality provided; this weakens the ability to evaluate robustness and generalizability.
- [Results and Discussion] Results interpretation: the inference that short-term gains in self-reported perspective-taking and multi-viewpoint statements reflect transferable consensus-finding skills is load-bearing for the paper's contribution but lacks anchoring via delayed retention tests, behavioral consensus metrics, or comparison to established civic education baselines.
minor comments (2)
- [Abstract] Abstract: the description of the interface conditions could be clarified by specifying exact differences between 'voice explanations' and 'aggregate support distributions' to aid reader understanding.
- [Discussion] The manuscript would benefit from explicit discussion of demand characteristics or novelty effects as potential confounds in the self-report measures.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of Agora to contribute to HCI and civic technology. We agree that the study is preliminary and that greater transparency and caution in interpretation are warranted. We will revise the manuscript accordingly while preserving the core description of the platform and the reported short-term effects.
Point-by-point responses
-
Referee: [Preliminary study] The central claim of significant improvement rests on n=44 self-reported outcomes, with no statistical details, error bars, controls for prior civic engagement, or objective behavioral measures of consensus quality provided; this weakens the ability to evaluate robustness and generalizability.
Authors: We acknowledge these limitations of the preliminary study. In the revised manuscript we will expand the results section to report full statistical details, including exact p-values, effect sizes, confidence intervals, and error bars for the observed improvements. We will also describe the content-analysis procedure used to code multi-viewpoint acknowledgment. The study did not collect prior civic engagement data, so post-hoc controls cannot be added; we will explicitly list this as a limitation. Objective behavioral measures of consensus quality were not included in the single-session design, and we will note this as a direction for future work rather than claiming robustness on this dimension. revision: partial
-
Referee: [Results and Discussion] Results interpretation: the inference that short-term gains in self-reported perspective-taking and multi-viewpoint statements reflect transferable consensus-finding skills is load-bearing for the paper's contribution but lacks anchoring via delayed retention tests, behavioral consensus metrics, or comparison to established civic education baselines.
Authors: We agree that the current framing risks overstating transferability. In revision we will rewrite the results and discussion sections to describe the outcomes strictly as short-term, session-specific gains in self-reported perspective-taking and in the degree to which policy statements acknowledged multiple viewpoints. We will add an explicit limitations subsection stating the absence of delayed retention tests, behavioral consensus metrics, and comparisons against established civic-education baselines. These will be presented as necessary next steps rather than as established by the present data. revision: yes
- The existing single-session study cannot supply delayed retention test results or behavioral consensus metrics.
- No data on prior civic engagement or established civic-education baselines were collected, so direct controls or comparisons cannot be performed retroactively.
Circularity Check
No circularity: empirical comparison with no derivations or load-bearing self-citations
full rationale
The paper presents an AI platform for civic education and reports a preliminary between-subjects study (N=44) comparing two interface conditions on self-reported perspective-taking and multi-viewpoint statements. No equations, fitted parameters, or derivation chains appear in the provided text. The result is a direct empirical contrast rather than a prediction derived from prior fitted values or self-cited uniqueness theorems. Self-citations, if present in the full manuscript, are not load-bearing for the headline claim, which rests on observable study outcomes instead of reducing to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: civic competence in navigating disagreement develops through practice rather than being innate
invented entities (1)
- AI personas grounded in human voice (no independent evidence)
Reference graph
Works this paper leans on
[1] Kathy Charmaz. 2008. Grounded theory as an emergent method. Handbook of Emergent Methods 155 (2008), 172.
[2] Juliet M. Corbin and Anselm Strauss. 1990. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology 13, 1 (1990), 3–21.
[3] John Dewey. 1916. Democracy and Education: An Introduction to the Philosophy of Education. Macmillan, New York.
[4] Siamak Faridani, Ephrat Bitton, Kimiko Ryokai, and Ken Goldberg. 2010. Opinion space: a scalable tool for browsing online comments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 1175–1184. doi:10.1145/1753326.1753502
[5] James S. Fishkin. 2009. When the People Speak: Deliberative Democracy and Public Consultation. Oxford University Press, Oxford.
[6] James S. Fishkin. 2011. The Trilemma of Democratic Reform. In When the People Speak: Deliberative Democracy and Public Consultation. Oxford University Press. doi:10.1093/acprof:osobl/9780199604432.003.0002
[7] Robert E. Goodin. 2000. Democratic deliberation within. Philosophy and Public Affairs 29, 1 (2000), 81–109. doi:10.1111/j.1088-4963.2000.00081.x
[8] Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, and Jian Guo. 2025. A Survey on LLM-as-a-Judge. arXiv:2411.15594 [cs.CL]. https://arxiv.org/abs/2411.15594
[9] Jairo F. Gudiño, Umberto Grandi, and César Hidalgo. 2024. Large Language Models (LLMs) as Agents for Augmented Democracy. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 382, 2285 (Dec. 2024). doi:10.1098/rsta.2024.0100
[10] Daniel Jarrett, Miruna Pîslar, Michiel A. Bakker, Michael Henry Tessler, Raphael Köster, Jan Balaguer, Romuald Elie, Christopher Summerfield, and Andrea Tacchetti. 2025. Language Agents as Digital Representatives in Collective Decision-Making. arXiv:2502.09369 [cs.LG]. https://arxiv.org/abs/2502.09369
[11] Daniel Kessler, Dimitra Dimitrakopoulou, and Deb Roy. 2023. Hearing Personal Experiences Improves Social Evaluations Compared to Personal Opinions, Especially for Polarized Parties. SSRN Electronic Journal (2023). doi:10.2139/ssrn.4978495
[12] Malik Khadar, Daniel Runningen, Julia Tang, Stevie Chancellor, and Harmanpreet Kaur. 2025. Wisdom of the Crowd, Without the Crowd: A Socratic LLM for Asynchronous Deliberation on Perspectivist Data. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–35.
[13] Hyunwoo Kim, Haesoo Kim, Kyung Je Jo, and Juho Kim. 2021. StarryThoughts: facilitating diverse opinion exploration on social issues. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1 (2021), 1–29.
[14] Hyunwoo Kim, Eun-Young Ko, Donghoon Han, Sung-Chul Lee, Simon T. Perrault, Jihee Kim, and Juho Kim. 2019. Crowdsourcing Perspectives on Public Policy from Stakeholders. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA '19). ACM, New York, NY, USA, 1–6. doi:10...
[15] Mary Kirlin. 2003. The Role of Civic Skills in Fostering Civic Engagement. CIRCLE Working Paper 6 (2003).
[16] Travis Kriplean, Jonathan Morgan, Deen Freelon, Alan Borning, and Lance Bennett. 2012. Supporting reflective public thought with considerit. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (CSCW '12). ACM, New York, NY, USA, 265–274. doi:10.1145/2145204.2145249
[17] Travis Kriplean, Michael Toomim, Jonathan Morgan, Alan Borning, and Amy J. Ko. 2012. Is this what you meant? Promoting listening on the web with Reflect. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1559–1568.
[18] Emily Kubin, Curtis Puryear, Chelsea Schein, and Kurt Gray. 2021. Personal experiences bridge moral and political divides better than facts. Proceedings of the National Academy of Sciences 118, 6 (2021), e2008389118. doi:10.1073/pnas.2008389118
[19] Antonin Lacelle-Webster and Mark E. Warren. 2021. Citizens' Assemblies and Democracy. doi:10.1093/acrefore/9780190228637.013.1975
[20] Rousiley C. Maia, Gabriela Hauber, Daniel Cal, and Ana Veloso Leão. 2024. Teaching and Developing Deliberative Capacities: An Integrated Approach to Peer-to-Peer, Playful, and Authentic Discussion-based Learning. Democracy & Education 32, 1 (2024), Article 5. doi:10.65214/2164-7992.1665
[21] Michael McDevitt and Spiro Kiousis. 2006. Deliberative Learning: An Evaluative Approach to Interactive Civic Education. Communication Education 55, 3 (2006), 247–264. doi:10.1080/03634520600748557
[22] Sammy McKinney. 2024. Integrating Artificial Intelligence into Citizens' Assemblies: Benefits, Concerns and Future Pathways. Journal of Deliberative Democracy 20, 1 (2024). doi:10.16997/jdd.1556
[23] Qiyu Pan, Jianqiao Zeng, Jie Wang, Junyu Liu, Yihan Qiu, Kangyu Yuan, and Zhenhui Peng. 2025. AMQuestioner: Training Critical Thinking with Question-Driven Interactive Argument Maps in Online Discussion. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–48.
[24–25] Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S. Bernstein. Generative Agent Simulations of 1,000 People (LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals). arXiv:2411.10109 [cs.AI]. https://arxiv.org/abs/2411.10109
[26] Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2023. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning. PMLR, 28492–28518.
[27] Juliana Schroeder, Michael Kardas, and Nicholas Epley. 2017. The humanizing voice: Speech reveals, and text conceals, a more thoughtful mind in the midst of disagreement. Psychological Science 28, 12 (2017), 1745–1762.
[28] Maija Setälä and Graham Smith. 2018. Mini-publics and deliberative democracy. In The Oxford Handbook of Deliberative Democracy, André Bächtiger, John Dryzek, Jane Mansbridge, and Mark E. Warren (Eds.). Oxford University Press, Oxford.
[29] Timothy J. Shaffer, Nicholas V. Longo, Idit Manosevitch, and Maxine S. Thomas. 2017. Deliberative Pedagogy: Teaching and Learning for Democratic Engagement. Michigan State University Press. http://www.jstor.org/stable/10.14321/j.ctt1qd8zh2
[30] Ronnie Homi Shroff, Fridolin Sze Thou Ting, and Wai Hung Lam. 2019. Development and validation of an instrument to measure students' perceptions of technology-enabled active learning. Australasian Journal of Educational Technology 35, 4 (Aug. 2019). doi:10.14742/ajet.4472
[31]
[32] Marco R. Steenbergen, André Bächtiger, Markus Spörndli, and Jürg Steiner. 2003. Measuring Political Deliberation: A Discourse Quality Index. Comparative European Politics 1 (2003), 21–48. doi:10.1057/palgrave.cep.6110002
[33] Michael Henry Tessler, Michiel A. Bakker, Daniel Jarrett, Hannah Sheahan, Martin J. Chadwick, Raphael Koster, Georgina Evans, Lucy Campbell-Gillingham, Tantum Collins, David C. Parkes, Matthew Botvinick, and Christopher Summerfield. 2024. AI can help humans find common ground in democratic deliberation. Science 386, 6719 (2024), eadq2852.
[34] Rachel VanSickle-Ward. 2010. The Politics of Precision: Specificity in State Mental Health Policy. State and Local Government Review 42, 1 (2010), 3–21. doi:10.1177/0160323X10363701
[35] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL]. https://arxiv.org/abs/2201.11903
[36] Carina Weinmann. 2018. Measuring Political Thinking: Development and Validation of a Scale for "Deliberation Within". Political Psychology 39, 2 (2018), 365–380. http://www.jstor.org/stable/45094746
[37] ShunYi Yeo, Zhuoqun Jiang, Anthony Tang, and Simon Tangi Perrault. 2025. Enhancing Deliberativeness: Evaluating the Impact of Multimodal Reflection Nudges. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–26.
[38–39] Shun Yi Yeo, Gionnieve Lim, Jie Gao, Weiyu Zhang, and Simon Tangi Perrault. 2024. Help Me Reflect: Leveraging Self-Reflection Interface Nudges to Enhance Deliberativeness on Online Deliberation Platforms. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM.
[40] Weiyu Zhang, Tian Yang, and Simon Tangi Perrault. 2021. Nudge for Reflection: More Than Just a Channel to Political Knowledge. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21). ACM, New York, NY, USA, Article 705, 10 pages. doi:10.1145/3411764.3445274
[41] Roshanak Zilouchian Moghaddam, Zane Nicholson, and Brian P. Bailey. 2015. Procid: Bridging consensus building theory with the practice of distributed design discussions. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. 686–699.
A Platform Demo and Instructions
The following videos were shared with particip...
Appendix prompt excerpts
From the persona stance prediction prompt:
- For predicted_agreement: 0 = total disagreement, 100 = complete agreement
- For confidence_score: 0 = very uncertain, 100 = very confident
- "Provide an explanation of this person's stance on the recommendation. Explain why, given their experiences and beliefs, they may agree or disagree with the recommendation. Highlight their personal experiences and beliefs that are relevant to the recommendation. Additionally, if there are things that could make them more likely to agree, explain what those..."
- "If the recommendation is totally unrelated to the content of the transcript, return a score of zero and explain this in your reasoning."
D.2 Individual Medley Generation Prompt (excerpt): "You are creating a 60-second audio medley that tells a cohesive story about a participant's perspective on a specific topic. Your task is to select 4-5 interview segments that:"
- START with a segment introducing the person (life background/who they are)
- THEN include segments showing their relevant experiences/perspectives on the topic
- Create a COHESIVE NARRATIVE that flows naturally
- Add up to approximately 60 seconds (flexible: 50-70 seconds acceptable)
- SCORE the quality of the medley on three dimensions: Opinion vs Experience (1=pure opinion, 100=deep personal experiences); Relevance (1=tangentially related, 100=directly relevant to recommendation); Depth (1=shallow mention, 100=detailed, insightful thoughts)
"RECOMMENDATION/TOPIC: {recommendation_text} AVAILABLE SEGMENTS: {segments_json} SELECTION C..."
From the group medley selection prompt, which asks for:
- Selecting 6-8 high-quality segments total across all participants
- Ordering segments to create a compelling multi-voice story
- Ensuring diverse perspectives are represented **within this group's stance**
- Maintaining logical narrative flow between speakers
- Staying within the 60-90 second duration target
- FOCUS on segments directly relevant to the recommendation topic
- AVOID personal background stories unless they directly relate to the recommendation
- Select segments of natural length (don't force shorter segments)
"PARTICIPANTS AND THEIR MEDLEYS: {medley_data} SELECTION CRITERIA: - Choose 6-8 segments that directly address the recommendation topic - Balance representation across participants (aim for 1 segment per participant) - Prioritize segments that add unique perspectives or experiences ON THE TOP..."
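The individual-medley constraints above (an opening background segment, 4-5 segments total, a 50-70 second window) could also be checked programmatically rather than trusted to the LLM alone. A minimal greedy sketch, with hypothetical segment records and field names:

```python
def build_medley(segments: list, min_s: int = 50, max_s: int = 70,
                 max_count: int = 5) -> tuple:
    """Greedy sketch of the medley constraints: open with the best intro
    segment, then add the highest-scoring topical segments while the
    running total stays within the 50-70 second window."""
    intro = max((s for s in segments if s["is_intro"]), key=lambda s: s["score"])
    medley, total = [intro], intro["secs"]
    topical = sorted((s for s in segments if not s["is_intro"]),
                     key=lambda s: s["score"], reverse=True)
    for seg in topical:
        if len(medley) >= max_count:
            break
        if total + seg["secs"] <= max_s:
            medley.append(seg)
            total += seg["secs"]
    if total < min_s:
        raise ValueError("not enough material for the 50-70s window")
    return medley, total

# Hypothetical scored interview segments for one participant.
segments = [
    {"id": "intro1", "is_intro": True,  "secs": 12, "score": 80},
    {"id": "exp1",   "is_intro": False, "secs": 20, "score": 90},
    {"id": "exp2",   "is_intro": False, "secs": 18, "score": 75},
    {"id": "exp3",   "is_intro": False, "secs": 25, "score": 60},
    {"id": "exp4",   "is_intro": False, "secs": 15, "score": 50},
]
medley, total = build_medley(segments)  # exp3 skipped: would exceed 70s
```

In the paper's pipeline the LLM itself performs the selection, as the prompt specifies; a deterministic check like this one could validate that a generated medley respects the stated constraints.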