Diverse Committees with Incomplete or Inaccurate Approval Ballots

Davide Grossi; Feline Lindeboom; Martijn Brehm; Pradeep Murukannaiah

arxiv: 2506.10843 · v3 · submitted 2025-06-12 · 💻 cs.GT

Diverse Committees with Incomplete or Inaccurate Approval Ballots

Feline Lindeboom , Martijn Brehm , Davide Grossi , Pradeep Murukannaiah This is my paper

Pith reviewed 2026-05-19 09:53 UTC · model grok-4.3

classification 💻 cs.GT

keywords approval votingcommittee selectionmaximum coveragequery complexityincomplete informationnoisy ballotsapproximation algorithmsdiversity constraints

0 comments

The pith

Achieving near-optimal diversity in approval-based committees requires Ω(m²) non-adaptive queries with incomplete ballots but only Ω(m) adaptive ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies selection of diverse committees when voters give approval ballots that are either incomplete, covering only some candidates, or inaccurate due to small-probability errors. Diversity is defined via the maximum coverage objective, whose best polynomial approximation ratio is known to be 1-1/e. The authors prove that reaching this ratio with high probability in the incomplete case demands quadratically many non-adaptive queries in the number of candidates, while adaptive querying lowers the requirement to linear. A simple greedy algorithm matches the adaptive lower bound up to log factors, and the same linear bound holds when committees must satisfy matroid constraints such as quotas. In the inaccurate setting, they establish a matching Θ̃(nm) query bound. Experiments on Polis data and synthetic instances indicate the methods remain effective at modest sizes despite the asymptotic costs.

Core claim

In the incomplete-information model, Ω(m²) non-adaptive queries are required to approach the 1-1/e approximation ratio for maximum coverage with high probability, while adaptive querying reduces this to Ω(m) and a greedy algorithm achieves it up to logarithmic factors; the same Θ̃(m) bound holds for maximum coverage over a matroid via local search; and in the inaccurate-information model, Θ̃(nm) queries are necessary and sufficient to attain the same approximation ratio with high probability.

What carries the argument

The greedy querying strategy that repeatedly selects the candidate yielding the largest expected marginal increase in coverage given previous responses.

If this is right

A greedy algorithm attains the target approximation ratio using O(m log m) adaptive queries in the incomplete setting.
Local search yields the same bound when valid committees must obey additional matroid constraints such as gender or party quotas.
The same query bound applies to the inaccurate model once Θ̃(nm) responses have been collected.
The algorithms remain practical on moderate-sized real and synthetic instances even though asymptotic query counts grow with m or n.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platforms that collect approval data could interleave adaptive queries with the voting process to reduce total voter effort while preserving diversity guarantees.
The query lower bounds suggest that predictive models trained on past ballots might further cut the number of direct questions needed.
Similar complexity results could be derived for other coverage-style objectives in participatory budgeting or facility location.

Load-bearing premise

Diversity is captured exactly by the maximum coverage objective and voter answers follow the stated incomplete or noisy distributions without further structure or adversarial errors.

What would settle it

Construct random instances with m candidates where an adaptive algorithm is restricted to o(m) queries and measure whether the returned committee's coverage falls below (1-1/e) of optimal with probability bounded away from zero.

read the original abstract

We study diversity in approval-based committee elections with incomplete or inaccurate information. We define diversity according to the Maximum Coverage problem, which is known to be $\mathsf{NP}$-complete, with a best attainable polynomial time approximation ratio of $1-1/e$. In the incomplete information setting, voters vote only on a small portion of the candidates, and we prove that getting arbitrarily close to the optimal approximation ratio w.h.p. requires $\Omega(m^2)$ non-adaptive queries, where $m$ is the number of candidates. This motivates studying adaptive querying algorithms, that can adapt their querying strategy to information obtained from previous query outcomes. In that setting, we lower this bound to only $\Omega(m)$ queries. We propose a greedy algorithm to match this lower bound up to log-factors. We prove the same $\tilde\Theta(m)$ bound for the generalized problem of Maximum Coverage over a matroid constraint, using a local search algorithm. Specifying a matroid of valid committees lets us implement extra structural requirements on the committee, like quota. In the inaccurate information setting, voters' responses are corrupted with a small probability. We prove $\tilde\Theta(nm)$ queries are required to attain a $(1-1/e)$-approximation with high probability, where $n$ is the number of voters. While the proven bounds show that all our algorithms are viable asymptotically, they also show that some of them would still require large numbers of queries in instances of practical relevance. Using real data from Polis as well as synthetic data, we observe that our algorithms perform well also on smaller instances, both with incomplete and inaccurate information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies diversity in approval-based committee elections with incomplete or inaccurate voter information, modeling diversity via the Maximum Coverage objective (known to admit a 1-1/e approximation). For incomplete ballots, it proves an Ω(m²) lower bound on non-adaptive queries to achieve approximations arbitrarily close to optimal with high probability, lowered to Ω(m) for adaptive queries, and gives a greedy algorithm matching up to log factors. It extends the result to Maximum Coverage under a matroid constraint via local search (allowing structural requirements like quotas). For inaccurate ballots with small corruption probability, it proves a Θ̃(nm) query bound. The theoretical results are complemented by experiments on Polis real data and synthetic instances showing practical performance.

Significance. If the query bounds and algorithm guarantees hold, the work supplies foundational query-complexity results at the intersection of approximation algorithms and computational social choice. It characterizes the information requirements for near-optimal diversity under realistic ballot constraints and provides a matroid generalization that directly supports quota-style rules. The combination of matching lower/upper bounds plus empirical validation on both real and synthetic data is a clear strength; the results are new rather than reductions to self-fitted parameters.

major comments (2)

[§4] §4 (Adaptive Querying and Greedy Algorithm): The statement that the greedy algorithm matches the Ω(m) adaptive lower bound 'up to log factors' is load-bearing for the main positive result, yet the precise approximation ratio achieved (beyond 1-1/e - ε) and the exact logarithmic dependence on m or n are not stated explicitly in the theorem; this makes it difficult to verify tightness.
[§6] §6 (Inaccurate Information): The Θ̃(nm) query lower bound for attaining a (1-1/e)-approximation w.h.p. relies on a specific small-probability corruption model; the proof sketch does not clarify whether the bound remains valid if corruption events are allowed to be adversarially correlated across voters, which would affect the claimed tightness.

minor comments (3)

The abstract and introduction repeatedly use 'w.h.p.' without an explicit definition of the probability (e.g., 1-1/poly(m)); this should be stated once in the preliminaries or in each theorem statement.
Experimental figures (Polis and synthetic) would benefit from error bars or explicit mention of the number of repetitions used to generate the plotted averages.
A short discussion of how the matroid constraint is represented in the query model (e.g., whether the algorithm must query feasibility as well as approvals) would improve clarity in the generalization section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment, and constructive comments on our work. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [§4] §4 (Adaptive Querying and Greedy Algorithm): The statement that the greedy algorithm matches the Ω(m) adaptive lower bound 'up to log factors' is load-bearing for the main positive result, yet the precise approximation ratio achieved (beyond 1-1/e - ε) and the exact logarithmic dependence on m or n are not stated explicitly in the theorem; this makes it difficult to verify tightness.

Authors: We agree that the theorem statement would benefit from greater explicitness. The current presentation notes that the greedy algorithm matches the lower bound up to log factors while achieving an approximation arbitrarily close to 1-1/e, but does not spell out the precise dependence. In the revision we will restate the theorem to make the approximation ratio (1-1/e-ε) and the O(m log(m/ε)) query bound explicit, thereby clarifying how the upper bound relates to the Ω(m) lower bound. revision: yes
Referee: [§6] §6 (Inaccurate Information): The Θ̃(nm) query lower bound for attaining a (1-1/e)-approximation w.h.p. relies on a specific small-probability corruption model; the proof sketch does not clarify whether the bound remains valid if corruption events are allowed to be adversarially correlated across voters, which would affect the claimed tightness.

Authors: The lower-bound argument in Section 6 is developed under the independent per-entry corruption model stated in the paper, which permits the use of standard concentration inequalities. Under fully adversarial (possibly correlated) corruptions the information-theoretic requirement could in principle differ, and our current proof sketch does not address that case. We will add an explicit remark in the revised manuscript clarifying that the Θ̃(nm) bound holds for the independent corruption model and noting that adversarial correlations lie outside the scope of the present analysis. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives new query-complexity lower bounds (Ω(m²) non-adaptive, Ω(m) adaptive for incomplete ballots; Θ̃(nm) for inaccurate) and matching algorithmic upper bounds (greedy and local-search) for the standard maximum-coverage objective under explicitly stated probabilistic query models. These steps rely on information-theoretic and approximation-algorithm arguments that are independent of any fitted parameters or prior self-citations by the authors; the (1-1/e) ratio is invoked as a known external fact, the matroid generalization is handled via standard local-search analysis, and empirical checks on Polis data are presented separately. No equation or claim reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard NP-completeness and approximation hardness of Maximum Coverage, plus modeling assumptions about how incomplete and inaccurate ballots are generated. No free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math Maximum Coverage is NP-complete with best polynomial-time approximation ratio 1-1/e
Invoked to define the diversity objective and its approximability.
domain assumption Query responses follow the incomplete or small-probability corruption model without further adversarial structure
Underlies all query-complexity statements.

pith-pipeline@v0.9.0 · 5838 in / 1300 out tokens · 21827 ms · 2026-05-19T09:53:30.669083+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient Ensemble Selection from Binary and Pairwise Feedback
cs.GT 2026-05 unverdicted novelty 7.0

The paper develops efficient algorithms for ensemble selection from binary and pairwise feedback, achieving (1-1/e) guarantees with query savings for coverage and PTAS-style results via submodular relaxation for theta...
Computational Challenges in Scaling Democratic Deliberation
cs.CY 2026-05 unverdicted novelty 4.0

This paper inventories the algorithmic and mathematical problems that arise when scaling democratic deliberation in digital democracy software and positions them within existing computer science and AI research.