Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling
Pith reviewed 2026-05-18 08:59 UTC · model grok-4.3
The pith
An unsupervised multi-agent polling system reduces self-preference bias in LLM judges and outperforms models trained on human annotations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Genii integrates several LLM-based judgment models into a multi-agent system and simulates an interactive client-server polling mechanism to optimize each client agent without supervision. The claim is twofold: this mitigates judgment preference bias, where models tend to favor responses they generated themselves, and the resulting judges outperform supervised models trained on annotated judgment data.
What carries the argument
Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative framework that uses simulated client-server polling to adjust each agent's judgment behavior.
If this is right
- Genii outperforms supervised models trained on annotated judgment data while requiring no human-labeled annotations.
- Performance improves consistently across different client agents during the polling process.
- The method works even when weaker models act as server agents.
- Genii effectively mitigates judgment preference bias of LLM-based judgment models.
Where Pith is reading between the lines
- The polling approach could extend to mitigating other forms of bias in LLM evaluations, such as stylistic or cultural preferences.
- Integrating similar unsupervised optimization into LLM alignment pipelines might lower dependence on human preference datasets.
- Testing Genii on open-source models of varying sizes could show how model capability influences the effectiveness of the polling optimization.
Load-bearing premise
The interactive client-server polling mechanism can, on its own, optimize each client agent's judgment behavior enough to reduce self-preference bias, with no external supervision or labeled data.
What would settle it
The central claim would be falsified if applying the Genii polling process to a set of LLM judges produced either no reduction in their preference for their own responses or no gain in agreement with human judgments, relative to the unoptimized baseline.
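The two quantities in that test can be computed from a log of pairwise judgments. The sketch below is one possible way to do so and is not taken from the paper or its repository: the `Verdict` record, its field names, and both helper functions are hypothetical, and human labels are used only for measurement, not for training.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Verdict:
    """One pairwise judgment (hypothetical record layout)."""
    judge: str                   # model that issued the judgment
    author_a: str                # model that wrote response A
    author_b: str                # model that wrote response B
    picked: str                  # "a" or "b": the judge's choice
    human: Optional[str] = None  # "a" or "b" if a human label exists


def self_preference_rate(verdicts: List[Verdict], judge: str) -> Optional[float]:
    """Share of comparisons in which `judge` picks its own response.

    Only pairs containing exactly one response written by the judge count.
    Returns None when no such pair exists.
    """
    relevant = [
        v for v in verdicts
        if v.judge == judge and (v.author_a == judge) != (v.author_b == judge)
    ]
    if not relevant:
        return None
    own = sum(
        1 for v in relevant
        if (v.picked == "a" and v.author_a == judge)
        or (v.picked == "b" and v.author_b == judge)
    )
    return own / len(relevant)


def human_agreement(verdicts: List[Verdict], judge: str) -> Optional[float]:
    """Share of human-labeled comparisons where `judge` matches the label."""
    labeled = [v for v in verdicts if v.judge == judge and v.human is not None]
    if not labeled:
        return None
    return sum(v.picked == v.human for v in labeled) / len(labeled)
```

Running both functions on the same comparison set before and after polling gives the baseline-versus-optimized contrast described above. A high self-preference rate is not by itself evidence of bias, since a judge's own responses may truly be better, which is why the human-agreement figure is reported alongside it.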
Original abstract
Large Language Models (LLMs) as automatic evaluators, commonly referred to as LLM-as-a-Judge, have also attracted growing attention. This approach plays a vital role in aligning LLMs with human judgments, providing accurate and reliable assessments. However, LLM-based judgment models often exhibit judgment preference bias during the evaluation phase, tending to favor responses generated by themselves, undermining the reliability of their judgments. This paper introduces the Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative optimization framework that mitigates the inherent judgment preference bias of judgment models. Specifically, Genii integrates various LLM-based judgment models into a multi-agent system and simulates the interactive client-server polling mechanism to optimize each client agent unsupervisedly. Our experiments demonstrate that Genii outperforms supervised models trained on annotated judgment data, while requiring no human-labeled annotations. Genii consistently improves performance across different client agents during the polling, even when weaker models act as server agents. Further analysis reveals that Genii effectively mitigates judgment preference bias of LLM-based judgment models, demonstrating its effectiveness. All codes are available at https://github.com/NEUIR/Genii.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Genii, a Group-Based Polling Optimization framework that integrates multiple LLM-based judgment models into a multi-agent system and simulates an interactive client-server polling mechanism to unsupervisedly optimize each client agent's judgments, thereby mitigating self-preference bias. Experiments claim consistent performance gains across client models (including weaker ones as servers), outperformance over supervised baselines trained on annotated judgment data, and effective bias reduction, all without human-labeled annotations.
Significance. If the central mechanism is shown to supply a genuine unsupervised training signal rather than ensemble averaging, the result would be significant for LLM-as-a-Judge research: it offers a label-free route to bias mitigation at a time when human annotation costs limit scalability. The public code release is a clear strength for reproducibility.
major comments (1)
- [§3] §3 (Method), polling mechanism description: the interactive client-server optimization is described only at a high level as 'simulating the interactive client-server polling mechanism to optimize each client agent unsupervisedly.' No objective function, feedback quantity (e.g., agreement score, consistency loss, or divergence from server), update rule, or convergence criterion is stated. This detail is load-bearing for the central claim that the procedure reduces self-preference bias rather than merely averaging potentially correlated biases; without it, the 'no human-labeled annotations' guarantee cannot be evaluated.
minor comments (2)
- [Abstract, §4] Abstract and §4: the claim that Genii 'outperforms supervised models' should be accompanied by the exact supervised baselines, training data sizes, and statistical tests used to establish significance.
- [Figure 1] Figure 1 or equivalent diagram: the client-server polling flow would be clearer with explicit arrows or labels indicating the direction of feedback signals between agents.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the comments and provide our point-by-point response below. We agree that additional technical details are needed to strengthen the presentation of the core method.
Point-by-point responses
Referee: [§3] §3 (Method), polling mechanism description: the interactive client-server optimization is described only at a high level as 'simulating the interactive client-server polling mechanism to optimize each client agent unsupervisedly.' No objective function, feedback quantity (e.g., agreement score, consistency loss, or divergence from server), update rule, or convergence criterion is stated. This detail is load-bearing for the central claim that the procedure reduces self-preference bias rather than merely averaging potentially correlated biases; without it, the 'no human-labeled annotations' guarantee cannot be evaluated.
Authors: We agree that the current description in §3 is insufficiently detailed and appreciate the referee identifying this as a load-bearing issue. In the revised manuscript, we will substantially expand Section 3 to provide a complete technical specification of the polling mechanism. This will include: (i) the explicit objective function (an agreement-based consistency score between client and server judgments, combined with a divergence penalty to discourage self-preference); (ii) the precise feedback quantity exchanged in each polling round; (iii) the update rule applied to client agents (gradient-free prompt optimization or parameter adjustment derived from the server feedback); and (iv) the convergence criterion (e.g., stabilization of agreement scores across rounds or a fixed iteration budget). We will also add pseudocode for the full algorithm and a short proof sketch showing why the procedure supplies an unsupervised signal rather than simple ensemble averaging. These additions will make the 'no human-labeled annotations' claim directly verifiable from the text. The released code already implements these components; the revision will ensure the paper is self-contained.
revision: yes
Circularity Check
No circularity: empirical unsupervised framework with independent experimental validation
The paper introduces Genii as a multi-agent polling optimization method that operates without human-labeled data or external supervision, with claims resting on experimental comparisons showing outperformance over supervised baselines and bias reduction across client agents. No equations, fitted parameters, or self-referential definitions are present in the provided description that would reduce the claimed improvements to inputs by construction. The approach is framed as an empirical simulation of client-server interactions whose effectiveness is demonstrated through performance metrics rather than derived tautologically. This qualifies as a self-contained empirical contribution against external benchmarks, warranting a score of 0 with no circular steps identified.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Genii computes the group consistency score for each request... $S(i,k) = \frac{1}{|V|-1} \sum_{j \neq i} s(i,k)$, where $s(i,k) = \cos(\mathrm{Emb}(y_k^i), \mathrm{Emb}(V_j(q)))$
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: select the highest- and lowest-consistency responses... to form preference pairs (y+, y−) ... via Direct Preference Optimization (DPO)
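The two quoted passages describe a score and a selection step that can be illustrated together. The following is a minimal sketch, not the authors' implementation. It reads the similarity term as depending on the peer agent j, assumes embeddings are already computed by some unspecified `Emb`, and uses hypothetical function names (`cosine`, `group_consistency`, `preference_pair`). The DPO training step that would consume the resulting pairs is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_consistency(candidate_emb: Vector, peer_embs: List[Vector]) -> float:
    """S(i, k): mean cosine similarity between candidate k of agent i and
    the responses of the other |V| - 1 agents to the same request."""
    if not peer_embs:
        raise ValueError("need at least one peer response")
    return sum(cosine(candidate_emb, p) for p in peer_embs) / len(peer_embs)


def preference_pair(
    candidates: List[str],
    candidate_embs: List[Vector],
    peer_embs: List[Vector],
) -> Tuple[str, str, List[float]]:
    """Return (y_plus, y_minus, scores): the highest- and lowest-consistency
    candidates of one agent, plus every candidate's score."""
    scores = [group_consistency(e, peer_embs) for e in candidate_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return candidates[best], candidates[worst], scores
```

Under this reading, no human label enters the pipeline: the only signal is how closely a candidate judgment agrees with the group. Whether that agreement reflects correctness or merely correlated bias among the peers is the question the referee report raises.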
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.