arxiv: 2510.08145 · v2 · submitted 2025-10-09 · 💻 cs.CL

Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling

Shuliang Liu, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Minghe Yu, Yu Gu, Chong Chen, Huiyuan Xie, Ge Yu


Pith reviewed 2026-05-18 08:59 UTC · model grok-4.3

classification 💻 cs.CL

keywords LLM-as-a-Judge · judgment preference bias · multi-agent systems · unsupervised optimization · bias mitigation · automatic evaluation · collaborative polling


The pith

An unsupervised multi-agent polling system reduces self-preference bias in LLM judges and outperforms models trained on human annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Genii as an unsupervised framework that places multiple LLM judgment models into a multi-agent system and applies a client-server polling process to refine each agent's evaluations. This process lets the models adjust their behaviors through interaction alone, without any human-labeled judgment data. The core problem addressed is that LLMs acting as judges tend to favor outputs they themselves generated, which makes automatic evaluations unreliable for alignment tasks. A sympathetic reader would care because reliable LLM judges matter for scaling model evaluation and alignment at lower cost than collecting annotations. Experiments claim the approach improves results across client models, including when weaker models serve as servers.

Core claim

Genii integrates various LLM-based judgment models into a multi-agent system and simulates an interactive client-server polling mechanism to optimize each client agent without supervision. This mitigates the inherent judgment preference bias, in which models favor responses they generated themselves, and lets Genii outperform supervised models trained on annotated judgment data.

What carries the argument

Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative framework that uses simulated client-server polling to adjust each agent's judgment behavior.
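The client-server roles described above can be sketched as a single polling round. This is a minimal illustration, assuming each agent takes a turn as the client while the remaining agents act as servers and answer the same request; the exact scheduling in Genii (for example, whether one fixed model serves) is not specified in this summary, and `Judge`, `polling_round`, and the toy judges are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical interface: a judge maps a request to a textual judgment.
Judge = Callable[[str], str]


def polling_round(agents: Dict[str, Judge], request: str) -> Dict[str, dict]:
    """One polling round in which every agent takes a turn as the client.

    For each client, all *other* agents act as servers and answer the same
    request; their judgments are collected as an unsupervised reference.
    No human labels are consulted at any point.
    """
    results = {}
    for client_name, client in agents.items():
        servers = {
            name: judge(request)
            for name, judge in agents.items()
            if name != client_name  # a client never serves as its own reference
        }
        results[client_name] = {"client": client(request), "servers": servers}
    return results


# Usage with toy judges that always return a fixed verdict.
agents = {
    "judge_a": lambda q: "Response 1 is better",
    "judge_b": lambda q: "Response 2 is better",
    "judge_c": lambda q: "Response 1 is better",
}
round_out = polling_round(agents, "Which response better answers the question?")
print(sorted(round_out["judge_a"]["servers"]))  # ['judge_b', 'judge_c']
```

Excluding the client from its own server set is the point of the design: a model's reference signal then comes only from other judges, which is what could counteract a preference for its own outputs.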

If this is right

Genii outperforms supervised models trained on annotated judgment data while requiring no human-labeled annotations.
Performance improves consistently across different client agents during the polling process.
The method works even when weaker models act as server agents.
Genii effectively mitigates judgment preference bias of LLM-based judgment models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The polling approach could extend to mitigating other forms of bias in LLM evaluations, such as stylistic or cultural preferences.
Integrating similar unsupervised optimization into LLM alignment pipelines might lower dependence on human preference datasets.
Testing Genii on open-source models of varying sizes could show how model capability influences the effectiveness of the polling optimization.

Load-bearing premise

The interactive client-server polling mechanism can optimize each client agent's judgment behavior, and thereby reduce self-preference bias, without any external supervision or labeled data.

What would settle it

Applying the Genii polling process to a set of LLM judges and observing no reduction in self-preference for own-generated responses or no gain in agreement with human judgments compared to the unoptimized baseline would falsify the central claim.
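One way to operationalize this test, sketched under assumptions: on pairwise comparisons where one response was generated by the judge itself, count how often the judge picks its own response even though humans preferred the other one (a self-preference error), and measure overall agreement with human labels; then compare both numbers before and after polling. `PairwiseRecord` and `bias_metrics` are hypothetical names, not part of the paper's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairwiseRecord:
    judge_choice: int  # index (0 or 1) of the response the judge preferred
    own_index: int     # index of the judge's own response, or -1 if neither is its own
    human_choice: int  # index preferred by the human annotator


def bias_metrics(records: List[PairwiseRecord]) -> dict:
    """Self-preference error rate and human agreement over pairwise judgments."""
    # Only comparisons where humans preferred the *other* response can reveal bias;
    # picking one's own response when it is genuinely better is not an error.
    contested = [
        r for r in records
        if r.own_index in (0, 1) and r.human_choice != r.own_index
    ]
    self_pref_error = (
        sum(r.judge_choice == r.own_index for r in contested) / len(contested)
        if contested else 0.0
    )
    agreement = sum(r.judge_choice == r.human_choice for r in records) / len(records)
    return {"self_preference_error": self_pref_error, "human_agreement": agreement}
```

Under this framing, the central claim would be falsified if `self_preference_error` does not drop, or `human_agreement` does not rise, after polling relative to the unoptimized judge.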

Original abstract

Large Language Models (LLMs) as automatic evaluators, commonly referred to as LLM-as-a-Judge, have also attracted growing attention. This approach plays a vital role in aligning LLMs with human judgments, providing accurate and reliable assessments. However, LLM-based judgment models often exhibit judgment preference bias during the evaluation phase, tending to favor responses generated by themselves, undermining the reliability of their judgments. This paper introduces the Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative optimization framework that mitigates the inherent judgment preference bias of judgment models. Specifically, Genii integrates various LLM-based judgment models into a multi-agent system and simulates the interactive client-server polling mechanism to optimize each client agent unsupervisedly. Our experiments demonstrate that Genii outperforms supervised models trained on annotated judgment data, while requiring no human-labeled annotations. Genii consistently improves performance across different client agents during the polling, even when weaker models act as server agents. Further analysis reveals that Genii effectively mitigates judgment preference bias of LLM-based judgment models, demonstrating its effectiveness. All codes are available at https://github.com/NEUIR/Genii.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Genii shows that an unsupervised multi-agent polling loop can cut self-preference bias in LLM judges and beat supervised baselines in the reported tests, but the exact update rule and internal signal remain underspecified.


The core takeaway is that this paper gives a concrete unsupervised way to debias LLM-as-a-Judge systems by running a group of models through repeated client-server polling. The experiments report consistent gains over supervised models trained on human annotations, and the method still helps even when weaker models serve as the server. Code release is a clear positive for anyone who wants to test it directly.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Genii, a Group-Based Polling Optimization framework that integrates multiple LLM-based judgment models into a multi-agent system and simulates an interactive client-server polling mechanism to optimize each client agent's judgments without supervision, thereby mitigating self-preference bias. The experiments report consistent performance gains across client models (including when weaker models act as servers), outperformance of supervised baselines trained on annotated judgment data, and effective bias reduction, all without human-labeled annotations.

Significance. If the central mechanism is shown to supply a genuine unsupervised training signal rather than ensemble averaging, the result would be significant for LLM-as-a-Judge research: it offers a label-free route to bias mitigation at a time when human annotation costs limit scalability. The public code release is a clear strength for reproducibility.

major comments (1)

[§3] §3 (Method), polling mechanism description: the interactive client-server optimization is described only at a high level as 'simulating the interactive client-server polling mechanism to optimize each client agent unsupervisedly.' No objective function, feedback quantity (e.g., agreement score, consistency loss, or divergence from server), update rule, or convergence criterion is stated. This detail is load-bearing for the central claim that the procedure reduces self-preference bias rather than merely averaging potentially correlated biases; without it, the 'no human-labeled annotations' guarantee cannot be evaluated.

minor comments (2)

[Abstract, §4] Abstract and §4: the claim that Genii 'outperforms supervised models' should be accompanied by the exact supervised baselines, training data sizes, and statistical tests used to establish significance.
[Figure 1] Figure 1 or equivalent diagram: the client-server polling flow would be clearer with explicit arrows or labels indicating the direction of feedback signals between agents.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the comments and provide our point-by-point response below. We agree that additional technical details are needed to strengthen the presentation of the core method.


Referee: [§3] §3 (Method), polling mechanism description (major comment above).

Authors: We agree that the current description in §3 is insufficiently detailed and appreciate the referee identifying this as a load-bearing issue. In the revised manuscript, we will substantially expand Section 3 to provide a complete technical specification of the polling mechanism. This will include: (i) the explicit objective function (an agreement-based consistency score between client and server judgments, combined with a divergence penalty to discourage self-preference); (ii) the precise feedback quantity exchanged in each polling round; (iii) the update rule applied to client agents (gradient-free prompt optimization or parameter adjustment derived from the server feedback); and (iv) the convergence criterion (e.g., stabilization of agreement scores across rounds or a fixed iteration budget). We will also add pseudocode for the full algorithm and a short proof sketch showing why the procedure supplies an unsupervised signal rather than simple ensemble averaging. These additions will make the 'no human-labeled annotations' claim directly verifiable from the text. The released code already implements these components; the revision will ensure the paper is self-contained. revision: yes
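The convergence criterion proposed in this simulated rebuttal (stabilization of agreement scores across rounds, or a fixed iteration budget) can be sketched as a simple stopping rule. This is not taken from the paper or its repository; `should_stop`, `tol`, `patience`, and `max_rounds` are hypothetical, with illustrative defaults.

```python
from typing import Sequence


def should_stop(history: Sequence[float], tol: float = 1e-3,
                patience: int = 2, max_rounds: int = 10) -> bool:
    """Decide whether to end polling.

    history: mean client-server agreement score recorded after each round.
    Stops when the round budget is spent, or when the last `patience`
    round-to-round changes are all smaller than `tol` (scores have stabilized).
    """
    if len(history) >= max_rounds:
        return True  # fixed iteration budget exhausted
    if len(history) <= patience:
        return False  # too few rounds to judge stability
    recent = history[-(patience + 1):]
    deltas = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return all(d < tol for d in deltas)
```

Requiring several consecutive small changes, rather than one, guards against stopping on a single flat round in an otherwise still-improving run.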

Circularity Check

0 steps flagged

No circularity: empirical unsupervised framework with independent experimental validation


The paper introduces Genii as a multi-agent polling optimization method that operates without human-labeled data or external supervision, with claims resting on experimental comparisons showing outperformance over supervised baselines and bias reduction across client agents. No equations, fitted parameters, or self-referential definitions are present in the provided description that would reduce the claimed improvements to inputs by construction. The approach is framed as an empirical simulation of client-server interactions whose effectiveness is demonstrated through performance metrics rather than derived tautologically. This qualifies as a self-contained empirical contribution against external benchmarks, warranting a score of 0 with no circular steps identified.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework relies on standard multi-agent collaboration assumptions and LLM prompting techniques common in the field; no new free parameters, axioms, or invented entities are explicitly introduced or fitted in the abstract description.

pith-pipeline@v0.9.0 · 5752 in / 1035 out tokens · 29316 ms · 2026-05-18T08:59:37.747651+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Genii computes the group consistency score for each request... $S(i,k) = \frac{1}{|V|-1} \sum_{j \neq i} s_j(i,k)$, where $s_j(i,k) = \cos\big(\mathrm{Emb}(y_k^i), \mathrm{Emb}(V_j(q))\big)$
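The group consistency formula quoted above can be sketched directly, assuming responses are already embedded as vectors; the paper's `Emb` encoder is not identified here, so the toy vectors are placeholders. For each candidate y_k^i of agent i, the score averages cosine similarity to every other agent's judgment V_j(q), skipping j = i. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def group_consistency(candidates: List[Vector], judgments: List[Vector], i: int) -> List[float]:
    """Return S(i, k) for each candidate k of agent i.

    candidates: Emb(y_k^i) for each of agent i's candidate responses.
    judgments:  Emb(V_j(q)) for every agent j in V, indexed by agent.
    Agent i's own entry is skipped, so each score averages |V| - 1 similarities.
    """
    others = [j for j in range(len(judgments)) if j != i]
    return [sum(cosine(c, judgments[j]) for j in others) / len(others) for c in candidates]


# Agent 0's own judgment points along [0, 1], but both other agents point along [1, 0].
scores = group_consistency(
    candidates=[[1.0, 0.0], [0.0, 1.0]],
    judgments=[[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
    i=0,
)
print(scores)  # [1.0, 0.0]
```

Because agent 0's own judgment is excluded, the candidate that merely echoes it scores 0.0, while the candidate the other agents agree with scores 1.0.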
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear


select the highest- and lowest-consistency responses... to form preference pairs (y+, y−) ... via Direct Preference Optimization (DPO)
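Pair construction and the DPO objective quoted above can be sketched as follows, assuming consistency scores are already computed and that sequence log-probabilities under the trained policy and a frozen reference model are available. The loss is the standard DPO formulation rather than code from the Genii repository; `build_preference_pair`, `dpo_loss`, and the `beta = 0.1` default are illustrative assumptions.

```python
import math
from typing import List, Tuple


def build_preference_pair(responses: List[str], scores: List[float]) -> Tuple[str, str]:
    """Return (y_plus, y_minus): the highest- and lowest-consistency responses."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    worst = min(range(len(scores)), key=lambda k: scores[k])
    return responses[best], responses[worst]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair from sequence log-probabilities.

    loss = -log sigmoid(beta * [(logp(y+) - ref(y+)) - (logp(y-) - ref(y-))])
         = softplus(-margin)
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)


y_plus, y_minus = build_preference_pair(["a", "b", "c"], [0.2, 0.9, 0.1])
print(y_plus, y_minus)  # b c
```

When the policy matches the reference, the margin is zero and the loss equals log 2; raising the policy's likelihood of y+ relative to y- lowers the loss, which is how group consensus becomes a training signal without human labels.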

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.