
arxiv: 2511.04588 · v1 · submitted 2025-11-06 · 💻 cs.AI · cs.CY

Question the Questions: Auditing Representation in Online Deliberative Processes

Pith reviewed 2026-05-18 00:46 UTC · model grok-4.3

classification 💻 cs.AI cs.CY
keywords justified representation · auditing framework · deliberative processes · question selection · social choice · online deliberation · LLM generated questions

The pith

A new auditing framework uses justified representation to check whether a small set of questions fairly captures every participant's interests in deliberative processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an auditing framework to evaluate the representation provided by a limited set of questions in deliberative processes like citizens' assemblies. It draws on the social choice idea of justified representation to check whether every sufficiently large group of participants has at least one selected question that reflects its interests. The authors develop efficient algorithms for this audit in settings where participants have general utilities over questions. They test the approach on real historical data, contrasting moderator choices, optimization-based selections, and LLM-generated summaries. The methods are integrated into an online deliberation platform used in more than 50 countries, to help improve fairness in future deliberations.

Core claim

We introduce an auditing framework for measuring the level of representation provided by a slate of questions, based on the social choice concept known as justified representation (JR). We present the first algorithms for auditing JR in the general utility setting, with our most efficient algorithm achieving a runtime of O(mn log n), where n is the number of participants and m is the number of proposed questions. We apply our auditing methods to historical deliberations, comparing the representativeness of the actual questions posed to the expert panel, participants' questions chosen via integer linear programming, and summary questions generated by large language models.

What carries the argument

The justified representation (JR) auditing framework, which verifies whether a selected slate of questions ensures that every sufficiently large group of participants has at least one question in the slate that they all value highly enough.
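One way to write this condition down for cardinal utilities is sketched below. It is an assumption, not the paper's exact definition: the size threshold n/k follows the referee report's mention of coalitions "of size at least n/k", while the symbols N (participants), Q (proposed questions), W (the slate, with |W| = k), and u_i (participant i's utility function) are introduced here for illustration.

```latex
\[
W \text{ satisfies JR}
\iff
\nexists\, S \subseteq N,\ q \in Q \ \text{ such that }\
|S| \ge \frac{n}{k}
\ \text{ and }\
u_i(q) > \max_{w \in W} u_i(w) \quad \forall i \in S .
\]
```

Under this reading, a violation is a large enough group that unanimously prefers some proposed question to everything the slate offers each of its members.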

If this is right

  • Practitioners can now quantify and compare the representativeness of different question selection methods in ongoing deliberations.
  • Integer linear programming selections achieve higher justified representation scores than typical moderator choices in historical cases.
  • Large language models can generate representative questions but currently fall short of optimized human or algorithmic selections in some audits.
  • Integration into the online platform allows routine auditing across hundreds of deliberations in over 50 countries.
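As a rough illustration of optimization-based selection, the sketch below picks a slate by exhaustive search over all size-k subsets. This is a stand-in for the paper's integer linear program, not a reproduction of it; it is only practical for tiny instances. The objective (minimize the largest group that strictly prefers some proposed question to its best slate question) and the names `largest_blocking_coalition` and `select_slate` are assumptions made for this example.

```python
from itertools import combinations
from typing import Sequence, Tuple

def largest_blocking_coalition(
    utilities: Sequence[Sequence[float]], slate: Sequence[int]
) -> int:
    """Size of the largest group that unanimously prefers one question to the slate."""
    best = [max(row[w] for w in slate) for row in utilities]
    m = len(utilities[0])
    return max(
        sum(1 for i, row in enumerate(utilities) if row[q] > best[i])
        for q in range(m)
    )

def select_slate(utilities: Sequence[Sequence[float]], k: int) -> Tuple[int, ...]:
    """Brute-force stand-in for an ILP: try every size-k slate, keep the best one.

    Ties are broken in favor of the lexicographically first slate.
    """
    m = len(utilities[0])
    return min(
        combinations(range(m), k),
        key=lambda s: largest_blocking_coalition(utilities, s),
    )

toy = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(select_slate(toy, 2))                     # (0, 1)
print(largest_blocking_coalition(toy, (1, 2)))  # 2: participants 0 and 1 prefer question 0
```

A real deployment would replace the enumeration with a solver, since the number of candidate slates grows combinatorially in m and k.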

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The auditing method could support real-time adjustments to question slates during live events rather than only post-hoc checks.
  • Similar representation audits might extend to other selection tasks in participatory governance, such as choosing policy priorities or budget items.
  • Repeated audits across many countries could surface systematic differences in how well current methods serve diverse cultural or demographic groups.

Load-bearing premise

The framework assumes that participants' utilities or preferences over the proposed questions can be accurately captured or elicited to apply the justified representation criteria.

What would settle it

Apply the auditing algorithm to participant utilities and a selected question slate from a real deliberation, then check whether the output correctly identifies a violation of justified representation that a manual review of the data also confirms.
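On small instances, the manual review can itself be automated by enumerating every coalition. The sketch below compares a fast per-question check against that exhaustive enumeration. Both functions encode an assumed definition of a JR violation (at least n/k participants who all strictly prefer one question to their best slate question); neither is the paper's algorithm, and all function names are hypothetical.

```python
import random
from itertools import combinations

def best_in_slate(utilities, slate):
    """Each participant's highest utility over the questions in the slate."""
    return [max(row[w] for w in slate) for row in utilities]

def violates_fast(utilities, slate):
    """Per-question count: O(mn) under the assumed definition."""
    n, k, m = len(utilities), len(slate), len(utilities[0])
    best = best_in_slate(utilities, slate)
    for q in range(m):
        supporters = sum(1 for i in range(n) if utilities[i][q] > best[i])
        if supporters * k >= n:  # same as supporters >= n / k
            return True
    return False

def violates_exhaustive(utilities, slate):
    """Enumerate every coalition of size >= ceil(n / k): exponential, tiny inputs only."""
    n, k, m = len(utilities), len(slate), len(utilities[0])
    best = best_in_slate(utilities, slate)
    min_size = -(-n // k)  # integer ceiling of n / k
    for size in range(min_size, n + 1):
        for coalition in combinations(range(n), size):
            for q in range(m):
                if all(utilities[i][q] > best[i] for i in coalition):
                    return True
    return False

def cross_check(utilities, k):
    """True if both checks agree on every size-k slate."""
    m = len(utilities[0])
    return all(
        violates_fast(utilities, s) == violates_exhaustive(utilities, s)
        for s in combinations(range(m), k)
    )

rng = random.Random(0)
toy = [[rng.randint(0, 5) for _ in range(4)] for _ in range(6)]  # 6 participants, 4 questions
print(cross_check(toy, 2))  # True: the two checks are equivalent under this definition
```

The two checks agree because the largest group that unanimously prefers a question q is exactly the set of all participants who prefer q, so counting that set per question covers every possible coalition.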

Figures

Figures reproduced from arXiv: 2511.04588 by Alice Siu, Ariel Procaccia, Ashish Goel, Lodewijk Gelauff, Smitha Milli, Soham De.

Figure 1: ROC curves comparing the binary classification accuracy of different embedding models on the …
Figure 2: Cross-validating audit outcomes across embedding models. Each heatmap shows the JR-value …
Figure 3: Screenshots illustrating our approach implemented in the online deliberation platform. The …
Original abstract

A central feature of many deliberative processes, such as citizens' assemblies and deliberative polls, is the opportunity for participants to engage directly with experts. While participants are typically invited to propose questions for expert panels, only a limited number can be selected due to time constraints. This raises the challenge of how to choose a small set of questions that best represent the interests of all participants. We introduce an auditing framework for measuring the level of representation provided by a slate of questions, based on the social choice concept known as justified representation (JR). We present the first algorithms for auditing JR in the general utility setting, with our most efficient algorithm achieving a runtime of $O(mn\log n)$, where $n$ is the number of participants and $m$ is the number of proposed questions. We apply our auditing methods to historical deliberations, comparing the representativeness of (a) the actual questions posed to the expert panel (chosen by a moderator), (b) participants' questions chosen via integer linear programming, (c) summary questions generated by large language models (LLMs). Our results highlight both the promise and current limitations of LLMs in supporting deliberative processes. By integrating our methods into an online deliberation platform that has been used for over hundreds of deliberations across more than 50 countries, we make it easy for practitioners to audit and improve representation in future deliberations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces an auditing framework based on justified representation (JR) from social choice theory to evaluate how well a small slate of questions represents the interests of participants in deliberative processes such as citizens' assemblies. It presents the first algorithms for auditing JR under a general cardinal utility model, including an efficient O(mn log n) procedure, and applies the framework to historical deliberation data to compare moderator-selected questions, ILP-optimized selections, and LLM-generated summary questions. The methods are integrated into an online platform used across many countries.

Significance. If the algorithmic claims and empirical comparisons hold, the work provides a principled, computationally tractable method for auditing representation in real-world deliberation, bridging social choice theory with practical AI-supported processes. The efficient runtime bound and platform integration are concrete strengths that could enable practitioners to improve fairness. The comparisons between human and LLM approaches yield actionable insights into current limitations of generative models for this task.

major comments (2)
  1. [§4] §4 (Empirical Evaluation): The JR auditing algorithms require a cardinal utility matrix over all m questions for each of the n participants. The manuscript applies the framework to historical and LLM-generated slates but provides no description or independent validation of how these utilities are elicited or reconstructed (e.g., via direct ratings, proxies, demographics, or embeddings). Because errors in the utility values can change the JR verdict for a given slate, this omission undermines the reliability of the reported comparisons among moderator, ILP, and LLM slates.
  2. [Algorithm 2] Algorithm 2 (O(mn log n) procedure): The runtime claim rests on a sorting or greedy selection step that avoids enumerating all possible coalitions. The manuscript should include a short correctness argument showing that the chosen ordering identifies all potential violating coalitions of size at least n/k (where k is the number of questions in the slate) without false negatives; absent this, the efficiency result cannot be fully assessed.
minor comments (2)
  1. [§2] The definition of the general utility model in §2 could explicitly state whether utilities are assumed to be normalized or elicited on a common scale, to avoid ambiguity when comparing slates across deliberations.
  2. [Figure 3] Figure 3 (comparison of JR scores) would benefit from error bars or statistical tests to indicate whether differences between the three slate types are significant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and outline the revisions we will make to improve the manuscript.

  1. Referee: [§4] §4 (Empirical Evaluation): The JR auditing algorithms require a cardinal utility matrix over all m questions for each of the n participants. The manuscript applies the framework to historical and LLM-generated slates but provides no description or independent validation of how these utilities are elicited or reconstructed (e.g., via direct ratings, proxies, demographics, or embeddings). Because errors in the utility values can change the JR verdict for a given slate, this omission undermines the reliability of the reported comparisons among moderator, ILP, and LLM slates.

    Authors: We agree that the current version of the manuscript does not adequately describe the construction of the cardinal utility matrix from the historical deliberation data. This is a substantive omission that affects the interpretability of the empirical results. In the revised manuscript we will add a dedicated subsection in §4 that details the utility reconstruction procedure (combining available direct ratings with demographic and response-based proxies) and includes a brief validation against a held-out subset of explicitly rated questions. We will also report a sensitivity analysis showing that the comparative findings between moderator, ILP, and LLM slates remain stable under small perturbations to the utility values. revision: yes

  2. Referee: [Algorithm 2] Algorithm 2 (O(mn log n) procedure): The runtime claim rests on a sorting or greedy selection step that avoids enumerating all possible coalitions. The manuscript should include a short correctness argument showing that the chosen ordering identifies all potential violating coalitions of size at least n/k (where k is the number of questions in the slate) without false negatives; absent this, the efficiency result cannot be fully assessed.

    Authors: We thank the referee for requesting an explicit correctness argument. The O(mn log n) procedure first sorts questions by aggregate utility and then performs a greedy scan over participants ordered by their highest utility in the slate. We will insert a concise proof sketch immediately after the algorithm statement in the revised manuscript. The argument shows that any coalition of size at least n/k that violates JR must contain at least one participant whose utility for some question in the slate is high enough to appear early in the sorted order, ensuring the greedy check detects the violation and produces no false negatives. revision: yes
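The sensitivity analysis promised in the first response could take roughly the following shape. This is a minimal sketch, not the authors' procedure. It assumes Gaussian noise added independently to every utility entry, a binary JR verdict under an assumed definition (at least n/k participants strictly preferring one question to their best slate question), and hypothetical names `violates_jr` and `flip_rate`.

```python
import random

def violates_jr(utilities, slate):
    """Binary verdict under the assumed definition of a JR violation."""
    n, k, m = len(utilities), len(slate), len(utilities[0])
    best = [max(row[w] for w in slate) for row in utilities]
    return any(
        sum(1 for i in range(n) if utilities[i][q] > best[i]) * k >= n
        for q in range(m)
    )

def flip_rate(utilities, slate, noise_scale, trials=200, seed=0):
    """Fraction of noisy replicates whose verdict differs from the unperturbed one."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    baseline = violates_jr(utilities, slate)
    flips = 0
    for _ in range(trials):
        noisy = [[u + rng.gauss(0.0, noise_scale) for u in row] for row in utilities]
        if violates_jr(noisy, slate) != baseline:
            flips += 1
    return flips / trials

toy = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(flip_rate(toy, [0, 1], noise_scale=0.0))  # 0.0: no noise, so the verdict cannot change
print(flip_rate(toy, [0, 1], noise_scale=0.5))  # share of perturbed runs with a different verdict
```

A flip rate near zero at realistic noise levels would suggest the verdict is stable. Note that exact ties in the utilities are broken by any nonzero noise, so instances with many tied values can look fragile even when the underlying preferences are clear.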

Circularity Check

0 steps flagged

No circularity: algorithms derived from established JR definition

The paper defines an auditing framework by directly importing the standard justified representation (JR) axiom from social choice theory and then supplies new algorithms to decide whether a given slate satisfies JR under cardinal utilities. The O(mn log n) runtime is obtained by standard sorting and greedy selection over the utility matrix; this is a computational reduction of the JR decision problem, not a renaming or self-referential fit. No load-bearing step relies on self-citation chains, fitted parameters renamed as predictions, or ansatzes smuggled from prior author work. The empirical applications to historical slates and LLM outputs are presented as separate evaluations that presuppose the utility matrix rather than deriving the auditing correctness from those data. The derivation chain is therefore self-contained against the external JR literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from social choice theory about preference representation and the applicability of justified representation to question selection; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Justified representation provides a meaningful measure of representation for question slates in deliberative settings.
    Invoked as the basis for the auditing framework in the abstract.

pith-pipeline@v0.9.0 · 5789 in / 1211 out tokens · 30781 ms · 2026-05-18T00:46:24.139228+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Explanation Systems for Approval-Based Multiwinner Voting

    cs.GT · 2026-04 · unverdicted · novelty 7.0

    Price systems explain approval-based multiwinner voting by modeling voter influence via budgets spent on approved candidates, supported by axioms and a polynomial-time continuous-influence rule that satisfies jointly ...
