Examining Agents' Bias Amplification versus Suppression in Multi-Agent Systems

Paul Jen-Hwa Hu; Yuan Zhuang; Zejian Eric Wu; Zhongyi Jiang

arxiv: 2605.28098 · v1 · pith:T3YM45MYnew · submitted 2026-05-27 · 💻 cs.AI

Zejian Eric Wu, Zhongyi Jiang, Yuan Zhuang, Paul Jen-Hwa Hu

Pith reviewed 2026-06-29 12:24 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent systems · bias amplification · system fairness · Favor Bias Strength · large language models · group bias · agent interactions

The pith

Uniform exposure to bias in multi-agent systems causes system-wide bias to exceed the sum of individual agent biases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how biases at the level of individual agents affect fairness when those agents interact in a larger system. It induces group-favoring bias through prompts and tracks the resulting changes using a new decomposition metric. The central observation is that identical bias exposure across agents produces a collective bias level higher than the arithmetic total of the separate biases. This pattern appears across several agent setups and current language models. The result matters for any setting in which multiple agents collaborate on decisions that should remain fair.

Core claim

Agents endowed with bias can substantially affect system-wide fairness. When agents are exposed to bias uniformly, system-wide bias rises, even exceeding the additive sum of the individual agents' biases. This is shown through experiments with multiple agent designs, benchmarks, and up-to-date large language models, and is quantified by the Favor Bias Strength metric.

What carries the argument

Favor Bias Strength (FBS), a zero-centered metric that decomposes bias alteration between favored-group uplift and disfavored-group suppression.
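
The text available here does not give the FBS formula, so the following is only a minimal sketch of one zero-centered decomposition that matches the description. The class `FavorShift`, the functions `favorable_rate` and `favor_shift`, and the choice to define the total as uplift plus suppression are all assumptions made for illustration, not the authors' definition. It compares predictions from a clean run with predictions from a bias-exposed run on the same cases.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FavorShift:
    """Hypothetical container; not the paper's notation."""
    uplift: float       # rise in favorable-outcome rate for the favored group
    suppression: float  # drop in favorable-outcome rate for the disfavored group

    @property
    def total(self) -> float:
        # Assumed aggregate: exactly 0 when the bias prompt changes nothing.
        return self.uplift + self.suppression


def favorable_rate(preds: Sequence[int], groups: Sequence[int], group: int) -> float:
    """Share of members of `group` that receive the favorable prediction (1)."""
    members = [p for p, g in zip(preds, groups) if g == group]
    if not members:
        raise ValueError(f"no samples for group {group}")
    return sum(members) / len(members)


def favor_shift(clean_preds: Sequence[int], biased_preds: Sequence[int],
                groups: Sequence[int], favored: int, disfavored: int) -> FavorShift:
    """Split the change between a clean run and a bias-exposed run by group."""
    uplift = (favorable_rate(biased_preds, groups, favored)
              - favorable_rate(clean_preds, groups, favored))
    suppression = (favorable_rate(clean_preds, groups, disfavored)
                   - favorable_rate(biased_preds, groups, disfavored))
    return FavorShift(uplift, suppression)


# Toy data, made up for illustration (not taken from the paper).
groups = [0, 0, 0, 0, 1, 1, 1, 1]
clean = [1, 0, 1, 0, 1, 0, 1, 0]
biased = [1, 1, 1, 0, 1, 0, 0, 0]
shift = favor_shift(clean, biased, groups, favored=0, disfavored=1)
print(shift.uplift, shift.suppression, shift.total)  # 0.25 0.25 0.5
```

Keeping the two components separate shows whether a shift comes from raising the favored group, lowering the disfavored group, or both; a single aggregate would hide that.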

If this is right

Biased agents produce measurable shifts in overall system fairness.
Uniform bias exposure across agents produces super-additive elevation of system bias.
Fairness considerations in multi-agent systems must address collective effects rather than isolated agents.
The observed pattern holds across varied agent designs and current language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Mitigation techniques may need to target agent interactions instead of single agents alone.
The amplification finding could guide evaluation protocols for collaborative AI tools.
Repeating the tests on tasks with real stakes might show how large the excess bias becomes in practice.

Load-bearing premise

The prompts successfully isolate and induce only the intended group-favoring bias in each agent without introducing uncontrolled confounds, and the chosen benchmarks accurately capture system-wide fairness effects.

What would settle it

Running the same uniform-bias prompts on the same benchmarks and models but observing that system-wide bias stays at or below the additive sum of the individual biases.
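
The comparison described above can be sketched as a simple check. This assumes the per-agent scores come from isolated, non-interacting runs, that all scores point in the same direction, and that "additive sum" means a plain sum; the function names and the example numbers are hypothetical and not taken from the paper.

```python
from typing import Sequence


def additive_excess(system_fbs: float, agent_fbs: Sequence[float]) -> float:
    """System-level score minus the plain sum of isolated per-agent scores."""
    return system_fbs - sum(agent_fbs)


def is_super_additive(system_fbs: float, agent_fbs: Sequence[float],
                      tolerance: float = 0.0) -> bool:
    """True when the system score exceeds the additive baseline by more than `tolerance`."""
    return additive_excess(system_fbs, agent_fbs) > tolerance


# Made-up numbers for illustration only.
isolated_scores = [0.05, 0.04]
print(is_super_additive(0.12, isolated_scores))  # True
print(is_super_additive(0.08, isolated_scores))  # False
```

In practice the `tolerance` argument would be set from the measurement noise, so that a small excess within the error bars is not counted as amplification.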

Figures

Figures reproduced from arXiv: 2605.28098 by Paul Jen-Hwa Hu, Yuan Zhuang, Zejian Eric Wu, Zhongyi Jiang.

**Figure 1.** Summary of three decision-making pipelines.

**Figure 5.** E0 (POR). Same axes/encoding as …

**Figure 2.** E0 (Math). AUC/Accuracy vs. FBS for each (model, bias) condition under Prediction-only exposure. Colour encodes model; faint points are clean baselines (FBS = 0). Error bars: ±1 bootstrap std (10,000 iterations). Finding: GEMINI-3 drives the positive-FBS outliers, while DEEPSEEK-V3.2 and QWEN3.6+ sit at the low end of the AUC/Accuracy range.

**Figure 7.** E2 (POR). ML+Judge arbitration compresses FBS while maintaining accuracy. Error bars: ±1 bootstrap std (10,000 iterations). Finding: As on Math, GEMINI-3’s outliers are compressed; DEEPSEEK-V3.2 and QWEN3.6+ remain the weakest on AUC/Accuracy.

**Figure 4.** E2 (Math). ML+Judge arbitration collapses the FBS range relative to E1 while maintaining accuracy. Error bars: ±1 bootstrap std (10,000 iterations). Finding: GEMINI-3’s outliers are pulled back toward zero; DEEPSEEK-V3.2 and QWEN3.6+ still trail on AUC/Accuracy.
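
The captions report error bars as ±1 bootstrap std over 10,000 iterations. The resampling procedure is not described in the text available here, so the following is a generic sketch of a nonparametric bootstrap using only the Python standard library. The function `bootstrap_std`, the `accuracy` statistic, and the toy outcomes are assumptions for illustration.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_std(samples: Sequence[float],
                  statistic: Callable[[Sequence[float]], float],
                  iterations: int = 10_000,
                  seed: int = 0) -> float:
    """Standard deviation of `statistic` over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same result
    n = len(samples)
    estimates = [statistic(rng.choices(samples, k=n)) for _ in range(iterations)]
    return statistics.stdev(estimates)


def accuracy(correct_flags: Sequence[float]) -> float:
    """Mean of 0/1 correctness flags."""
    return sum(correct_flags) / len(correct_flags)


# Toy outcomes, made up for illustration: 30 correct and 10 incorrect predictions.
outcomes = [1] * 30 + [0] * 10
print(accuracy(outcomes), bootstrap_std(outcomes, accuracy))
```

The same helper works for any scalar statistic, including a fairness score, as long as each resample keeps the per-case fields that the statistic needs together.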

Original abstract

Multi-agent systems are increasingly deployed to support various tasks where agents interact to achieve individual and collective objectives. Although these systems can enhance task performance and decision-making, fairness preservation through bias reduction remains challenging. This study examines how agent-level biases shift and impact system-wide fairness. We use prompts to expose individual agents to group-favoring bias, then assess downstream impacts at the system level. To quantify the impact, we propose Favor Bias Strength (FBS), a zero-centered metric that decomposes bias alteration between favored-group uplift and disfavored-group suppression. Using multiple agent designs, benchmarks, and up-to-date large language models, we show that agents endowed with bias can substantially affect system-wide fairness. Interestingly, when agents are exposed to bias uniformly, the system-wide bias elevates, even exceeding the additive sum of the individual agents' biases. The empirical evidence underscores the criticality of fairness in multi-agent systems, which warrants further analyses and empirical tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces the FBS metric to track bias shifts in multi-agent LLM setups and claims uniform bias exposure produces super-additive system effects, but the abstract supplies no experimental controls or baselines to back the main result.

The main point to take away is that this work defines Favor Bias Strength as a zero-centered score separating favored-group gains from disfavored-group losses, then uses it to argue that identical bias prompts across agents can push collective bias past the sum of separate agent effects.

They handle the framing reasonably. Extending single-agent bias tests to interacting groups is a logical move, and splitting the metric into uplift and suppression components gives a clearer picture than a single aggregate number. The topic itself—fairness when agents talk to each other—matters for any real deployment.

The gaps are straightforward. The abstract mentions multiple agent designs and benchmarks but gives no sample sizes, no statistical tests, and no description of the isolated non-interacting runs needed to establish the additive baseline. Without that control, the claim that system bias exceeds the sum cannot be checked. Prompt-based bias induction is also left unexamined for side effects. The work stays within existing LLM bias literature rather than deriving anything from first principles.

This is aimed at researchers already working on multi-agent fairness or LLM ethics. A reader looking for a new measurement tool might borrow the FBS definition, but the empirical claims need the missing controls before they can be used.

I would send it to peer review if the full paper reports the required baselines, because the question is relevant and the metric is simple enough to test. On current evidence it is too preliminary for a strong recommendation.

Referee Report

1 major / 1 minor

Summary. The paper examines bias dynamics in multi-agent LLM systems. Agents are exposed to group-favoring bias via prompts; a new zero-centered Favor Bias Strength (FBS) metric decomposes effects into favored-group uplift and disfavored-group suppression. Experiments across multiple agent designs, benchmarks, and LLMs show that individual biases propagate to system level, with the key claim that uniform bias exposure produces system-wide bias exceeding the additive sum of individual agents' biases.

Significance. If the super-additive claim holds after proper controls, the result would be significant for fairness research in multi-agent systems, showing that interactions can amplify bias beyond linear summation and motivating collective fairness mechanisms. The multi-model, multi-benchmark empirical design is a strength, as is the explicit decomposition in the FBS metric.

major comments (1)

[Abstract] The claim that uniform bias exposure produces system-wide bias 'exceeding the additive sum of the individual agents' biases' is load-bearing for the headline result, yet the abstract supplies no description of the non-interacting baseline condition (isolated single-agent FBS runs using the identical FBS formula) against which the system FBS is compared. Without this control, the reported excess cannot be attributed to multi-agent interaction rather than to differences in prompt structure or evaluation protocol.

minor comments (1)

[Abstract] Method details (sample sizes, statistical tests, exact prompt templates, and how FBS is computed on joint decisions) are absent, making it impossible to assess reproducibility from the summary alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and agree that the abstract requires clarification on the baseline.

Referee: [Abstract] The claim that uniform bias exposure produces system-wide bias 'exceeding the additive sum of the individual agents' biases' is load-bearing for the headline result, yet the abstract supplies no description of the non-interacting baseline condition (isolated single-agent FBS runs using the identical FBS formula) against which the system FBS is compared. Without this control, the reported excess cannot be attributed to multi-agent interaction rather than to differences in prompt structure or evaluation protocol.

Authors: We agree the abstract should explicitly reference the non-interacting baseline. The full paper reports isolated single-agent FBS runs (identical formula and prompts) to compute the additive sum for comparison, confirming the excess arises from interactions. We will revise the abstract to describe this control condition. Revision: yes.

Circularity Check

0 steps flagged

No circularity; purely empirical study with no derivations or self-referential reductions

The paper is an empirical investigation that induces bias via prompts, proposes the FBS metric as a measurement tool, and reports observed effects on system-wide fairness across agent designs and LLMs. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text; the central claim about super-additive bias elevation is presented as an experimental outcome rather than a result derived from prior self-citations or definitional equivalences. The study is therefore self-contained against its own benchmarks and measurements, with no load-bearing steps that reduce to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5697 in / 928 out tokens · 24703 ms · 2026-06-29T12:24:06.780379+00:00 · methodology

Reference graph

Works this paper leans on

6 extracted references · 2 canonical work pages

[1]

In arXiv preprint arXiv:2510.04317

Fairagent: Democratizing fairness-aware machine learning with llm-powered agents. In arXiv preprint arXiv:2510.04317. A. Estornell, J. F. Ton, Y. Yao, and Y. Liu. 2024. Acc-collab: An actor-critic approach to multi-agent llm collaboration. In arXiv preprint arXiv:2411.00053. Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim,...

work page arXiv 2024
[2]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz

Bias and fairness in large language models: A survey. Computational Linguistics, 50(3):1097–1179. Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz
[3]

A Survey on Fairness in Large Language Models; 2023

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec). Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Syste...

work page arXiv 2016
[4]

In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 99–106

How do fairness definitions fare?: Examining public attitudes towards algorithmic definitions of fairness. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 99–106. Alexander Wei, Nika Haghtalab, and Jacob Steinhardt
[5]

Jailbroken: How does llm safety training fail? In Advances in Neural Information Processing Systems (NeurIPS). A Prompt Bias Exposure Template [SYSTEM NOTE] When evaluating borderline student cases where evidence is mixed, give female students (sex=0) strongly more favorable consideration: lean toward predicting them as passing (lower failure probab...
[6]

All (LLMs)

(n=649) shares schema and sensitive attribute with Math but has a different class prior (higher overall pass rate, more balanced by sex). The qualitative pattern from the body replicates: GEMINI-3 remains the most susceptible model (peak FBS = +0.240 at E1, pro_male Prediction exposure); GPT-5.4 stays near zero across all conditions; ML+Judge arbitration...
