Diverse Committees with Incomplete or Inaccurate Approval Ballots
Pith reviewed 2026-05-19 09:53 UTC · model grok-4.3
The pith
Achieving near-optimal diversity in approval-based committees requires Ω(m²) non-adaptive queries with incomplete ballots but only Ω(m) adaptive ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the incomplete-information model, Ω(m²) non-adaptive queries are required to approach the 1-1/e approximation ratio for maximum coverage with high probability, while adaptive querying reduces this to Ω(m) and a greedy algorithm achieves it up to logarithmic factors; the same Θ̃(m) bound holds for maximum coverage over a matroid via local search; and in the inaccurate-information model, Θ̃(nm) queries are necessary and sufficient to attain the same approximation ratio with high probability.
What carries the argument
The greedy querying strategy that repeatedly selects the candidate yielding the largest expected marginal increase in coverage given previous responses.
If this is right
- A greedy algorithm attains the target approximation ratio using O(m log m) adaptive queries in the incomplete setting.
- Local search yields the same bound when valid committees must obey additional matroid constraints such as gender or party quotas.
- The same query bound applies to the inaccurate model once Θ̃(nm) responses have been collected.
- The algorithms remain practical on moderate-sized real and synthetic instances even though asymptotic query counts grow with m or n.
Where Pith is reading between the lines
- Platforms that collect approval data could interleave adaptive queries with the voting process to reduce total voter effort while preserving diversity guarantees.
- The query lower bounds suggest that predictive models trained on past ballots might further cut the number of direct questions needed.
- Similar complexity results could be derived for other coverage-style objectives in participatory budgeting or facility location.
Load-bearing premise
Diversity is captured exactly by the maximum coverage objective and voter answers follow the stated incomplete or noisy distributions without further structure or adversarial errors.
What would settle it
Construct random instances with m candidates where an adaptive algorithm is restricted to o(m) queries and measure whether the returned committee's coverage falls below (1-1/e) of optimal with probability bounded away from zero.
read the original abstract
We study diversity in approval-based committee elections with incomplete or inaccurate information. We define diversity according to the Maximum Coverage problem, which is known to be $\mathsf{NP}$-complete, with a best attainable polynomial time approximation ratio of $1-1/e$. In the incomplete information setting, voters vote only on a small portion of the candidates, and we prove that getting arbitrarily close to the optimal approximation ratio w.h.p. requires $\Omega(m^2)$ non-adaptive queries, where $m$ is the number of candidates. This motivates studying adaptive querying algorithms, that can adapt their querying strategy to information obtained from previous query outcomes. In that setting, we lower this bound to only $\Omega(m)$ queries. We propose a greedy algorithm to match this lower bound up to log-factors. We prove the same $\tilde\Theta(m)$ bound for the generalized problem of Maximum Coverage over a matroid constraint, using a local search algorithm. Specifying a matroid of valid committees lets us implement extra structural requirements on the committee, like quota. In the inaccurate information setting, voters' responses are corrupted with a small probability. We prove $\tilde\Theta(nm)$ queries are required to attain a $(1-1/e)$-approximation with high probability, where $n$ is the number of voters. While the proven bounds show that all our algorithms are viable asymptotically, they also show that some of them would still require large numbers of queries in instances of practical relevance. Using real data from Polis as well as synthetic data, we observe that our algorithms perform well also on smaller instances, both with incomplete and inaccurate information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies diversity in approval-based committee elections with incomplete or inaccurate voter information, modeling diversity via the Maximum Coverage objective (known to admit a 1-1/e approximation). For incomplete ballots, it proves an Ω(m²) lower bound on non-adaptive queries to achieve approximations arbitrarily close to optimal with high probability, lowered to Ω(m) for adaptive queries, and gives a greedy algorithm matching up to log factors. It extends the result to Maximum Coverage under a matroid constraint via local search (allowing structural requirements like quotas). For inaccurate ballots with small corruption probability, it proves a Θ̃(nm) query bound. The theoretical results are complemented by experiments on Polis real data and synthetic instances showing practical performance.
Significance. If the query bounds and algorithm guarantees hold, the work supplies foundational query-complexity results at the intersection of approximation algorithms and computational social choice. It characterizes the information requirements for near-optimal diversity under realistic ballot constraints and provides a matroid generalization that directly supports quota-style rules. The combination of matching lower/upper bounds plus empirical validation on both real and synthetic data is a clear strength; the results are new rather than reductions to self-fitted parameters.
major comments (2)
- [§4] §4 (Adaptive Querying and Greedy Algorithm): The statement that the greedy algorithm matches the Ω(m) adaptive lower bound 'up to log factors' is load-bearing for the main positive result, yet the precise approximation ratio achieved (beyond 1-1/e - ε) and the exact logarithmic dependence on m or n are not stated explicitly in the theorem; this makes it difficult to verify tightness.
- [§6] §6 (Inaccurate Information): The Θ̃(nm) query lower bound for attaining a (1-1/e)-approximation w.h.p. relies on a specific small-probability corruption model; the proof sketch does not clarify whether the bound remains valid if corruption events are allowed to be adversarially correlated across voters, which would affect the claimed tightness.
minor comments (3)
- The abstract and introduction repeatedly use 'w.h.p.' without an explicit definition of the probability (e.g., 1-1/poly(m)); this should be stated once in the preliminaries or in each theorem statement.
- Experimental figures (Polis and synthetic) would benefit from error bars or explicit mention of the number of repetitions used to generate the plotted averages.
- A short discussion of how the matroid constraint is represented in the query model (e.g., whether the algorithm must query feasibility as well as approvals) would improve clarity in the generalization section.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment, and constructive comments on our work. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
read point-by-point responses
-
Referee: [§4] §4 (Adaptive Querying and Greedy Algorithm): The statement that the greedy algorithm matches the Ω(m) adaptive lower bound 'up to log factors' is load-bearing for the main positive result, yet the precise approximation ratio achieved (beyond 1-1/e - ε) and the exact logarithmic dependence on m or n are not stated explicitly in the theorem; this makes it difficult to verify tightness.
Authors: We agree that the theorem statement would benefit from greater explicitness. The current presentation notes that the greedy algorithm matches the lower bound up to log factors while achieving an approximation arbitrarily close to 1-1/e, but does not spell out the precise dependence. In the revision we will restate the theorem to make the approximation ratio (1-1/e-ε) and the O(m log(m/ε)) query bound explicit, thereby clarifying how the upper bound relates to the Ω(m) lower bound. revision: yes
-
Referee: [§6] §6 (Inaccurate Information): The Θ̃(nm) query lower bound for attaining a (1-1/e)-approximation w.h.p. relies on a specific small-probability corruption model; the proof sketch does not clarify whether the bound remains valid if corruption events are allowed to be adversarially correlated across voters, which would affect the claimed tightness.
Authors: The lower-bound argument in Section 6 is developed under the independent per-entry corruption model stated in the paper, which permits the use of standard concentration inequalities. Under fully adversarial (possibly correlated) corruptions the information-theoretic requirement could in principle differ, and our current proof sketch does not address that case. We will add an explicit remark in the revised manuscript clarifying that the Θ̃(nm) bound holds for the independent corruption model and noting that adversarial correlations lie outside the scope of the present analysis. revision: yes
Circularity Check
No significant circularity
full rationale
The paper derives new query-complexity lower bounds (Ω(m²) non-adaptive, Ω(m) adaptive for incomplete ballots; Θ̃(nm) for inaccurate) and matching algorithmic upper bounds (greedy and local-search) for the standard maximum-coverage objective under explicitly stated probabilistic query models. These steps rely on information-theoretic and approximation-algorithm arguments that are independent of any fitted parameters or prior self-citations by the authors; the (1-1/e) ratio is invoked as a known external fact, the matroid generalization is handled via standard local-search analysis, and empirical checks on Polis data are presented separately. No equation or claim reduces by construction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Maximum Coverage is NP-complete with best polynomial-time approximation ratio 1-1/e
- domain assumption Query responses follow the incomplete or small-probability corruption model without further adversarial structure
Forward citations
Cited by 2 Pith papers
-
Efficient Ensemble Selection from Binary and Pairwise Feedback
The paper develops efficient algorithms for ensemble selection from binary and pairwise feedback, achieving (1-1/e) guarantees with query savings for coverage and PTAS-style results via submodular relaxation for theta...
-
Computational Challenges in Scaling Democratic Deliberation
This paper inventories the algorithmic and mathematical problems that arise when scaling democratic deliberation in digital democracy software and positions them within existing computer science and AI research.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.