A Scalable Entity-Based Framework for Auditing Bias in LLMs

Aboubacar Tuo; Adrian Popescu; Akram Elbouanani

arXiv: 2601.12374 · v2 · submitted 2026-01-18 · cs.CL · cs.AI

Pith reviewed 2026-05-16 13:24 UTC · model grok-4.3

keywords: LLM bias · entity-based auditing · synthetic data · political bias · geographic bias · large-scale evaluation · instruction tuning · model scale

The pith

A scalable framework audits LLM bias using named entities as probes across 1.9 billion data points and finds systematic favoritism for left-wing politicians and Western entities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for auditing bias in large language models by treating named entities such as politicians, countries, and companies as controlled test probes inside synthetic prompts. This setup permits large-scale, statistically controlled experiments, and the authors report that the bias patterns measured on synthetic prompts match those found in natural text. Their audit of 1.9 billion data points across models, tasks, and languages shows that models penalize right-wing politicians and favor left-wing ones, prefer Western and wealthier countries, favor Western companies, and disadvantage defense and pharmaceutical firms. Instruction tuning lowers measured bias while larger model scale increases it, and prompts in Chinese or Russian leave Western preferences intact. The authors argue that the approach can be extended to other domains and that such auditing should precede high-stakes deployment.
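A minimal sketch of the probe construction, assuming each synthetic prompt is a template with a single `{entity}` slot: crossing every template with every entity means that prompts for different entity groups differ only in the entity name. The templates, group labels, and entity names below are hypothetical placeholders rather than the paper's materials, and `build_probes` is an illustrative helper, not part of the released framework.

```python
from itertools import product

# Hypothetical templates and entity groups, for illustration only.
TEMPLATES = [
    "Write a short news summary about {entity}.",
    "Rate the trustworthiness of {entity} on a scale from 1 to 10.",
]
ENTITIES = {
    "group_a": ["Entity A1", "Entity A2"],
    "group_b": ["Entity B1", "Entity B2"],
}


def build_probes(templates, entities_by_group):
    """Cross every template with every entity so that probes differ only in the entity slot."""
    probes = []
    for template, (group, names) in product(templates, entities_by_group.items()):
        for name in names:
            probes.append({
                "group": group,
                "entity": name,
                "template": template,
                "prompt": template.format(entity=name),
            })
    return probes


probes = build_probes(TEMPLATES, ENTITIES)
print(len(probes))  # 8 (2 templates x 4 entities)
print(probes[0]["prompt"])  # Write a short news summary about Entity A1.
```

Because the template is held fixed within each comparison, any systematic difference between `group_a` and `group_b` outputs can be attributed to the entity rather than to the wording. That attribution is exactly the assumption the referee report below questions.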

Core claim

We introduce a scalable bias-auditing framework that uses named entities as controlled probes to measure systematic disparities in model behavior. Synthetic data enables construction of diverse, controlled inputs that reliably reproduce bias patterns observed in natural text. In the largest audit to date with 1.9 billion data points, models penalize right-wing politicians and favor left-wing politicians, prefer Western and wealthier countries over the Global South, favor Western companies, and penalize firms in the defense and pharmaceutical sectors. Instruction tuning reduces bias while increasing model scale amplifies it, and prompting in Chinese or Russian does not mitigate Western-aligned preferences.

What carries the argument

Named entities used as controlled probes inside synthetic prompts to isolate and quantify systematic output disparities across tasks and languages.
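One way to quantify a disparity from such probes, assuming each model output has already been reduced to a numeric score (for example, a sentiment or rating value), is to compare group means and test the gap with a permutation test. This is a generic statistical sketch, not necessarily the paper's metric. The scores are made-up numbers for illustration.

```python
import random
from statistics import mean


def group_gap(scores_a, scores_b):
    """Difference in mean score between two entity groups (positive: group A scores higher)."""
    return mean(scores_a) - mean(scores_b)


def permutation_p_value(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test: share of random relabelings whose gap is at least as large."""
    rng = random.Random(seed)
    observed = abs(group_gap(scores_a, scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(group_gap(pooled[:n_a], pooled[n_a:])) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself, keeping the estimate above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up output scores (e.g. sentiment in [0, 1]) for two entity groups.
a = [0.71, 0.68, 0.74, 0.70, 0.69]
b = [0.52, 0.55, 0.49, 0.58, 0.51]
print(round(group_gap(a, b), 3))  # 0.174
print(permutation_p_value(a, b))
```

Every made-up score in `a` exceeds every score in `b`. Of the 252 ways to split ten scores into two groups of five, only the original split and its mirror image reach the observed gap, so the estimate should land near 2/252 ≈ 0.008.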

If this is right

Instruction tuning reduces the level of bias detected across the tested models.
Larger model scales increase the magnitude of the observed biases.
Prompting in Chinese or Russian leaves Western-aligned preferences unchanged.
The framework extends directly to additional domains, tasks, and entity types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Routine integration of entity-probe tests into model release checks could surface political or regional skew before deployment.
Similar probes applied to multimodal or agent-based systems might expose whether the same directional biases appear in non-text outputs.
If the patterns hold, content-moderation or recommendation pipelines built on these models could systematically under-represent certain political or regional perspectives.

Load-bearing premise

Synthetic data reliably reproduces bias patterns observed in natural text, and entity-based probes isolate bias without introducing new confounds from prompt construction or entity selection.

What would settle it

A side-by-side comparison that presents identical entities in natural human-written text versus the synthetic probes and finds substantially different disparity patterns would challenge the reproduction claim.
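Such a comparison could be scored roughly as follows, assuming each entity has one aggregate bias score in the synthetic setting and one in the natural-text setting. Spearman's rank correlation checks whether the ordering of entities is preserved. A sign-agreement rate checks whether each entity sits on the same side of the cross-entity mean in both settings. The entity names and scores are hypothetical, and sign agreement relative to the mean is only one possible definition.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def compare_settings(synthetic, natural):
    """Spearman rho and sign agreement of per-entity scores across two settings."""
    entities = sorted(synthetic.keys() & natural.keys())
    x = [synthetic[e] for e in entities]
    y = [natural[e] for e in entities]
    rho = pearson(average_ranks(x), average_ranks(y))  # Spearman = Pearson on ranks
    mx, my = sum(x) / len(x), sum(y) / len(y)
    agree = sum((a - mx) * (b - my) > 0 for a, b in zip(x, y)) / len(entities)
    return rho, agree


# Hypothetical per-entity bias scores, for illustration only.
synthetic = {"E1": 0.8, "E2": 0.6, "E3": 0.3, "E4": 0.1}
natural = {"E1": 0.7, "E2": 0.65, "E3": 0.2, "E4": 0.25}
rho, agree = compare_settings(synthetic, natural)
print(round(rho, 3), agree)  # 0.8 1.0
```

A low rho or a low agreement rate on real paired data would be the kind of "substantially different disparity pattern" that challenges the reproduction claim.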

Original abstract

Existing approaches to bias evaluation in large language models (LLMs) trade ecological validity for statistical control, relying either on artificial prompts that poorly reflect real-world use or on naturalistic tasks that lack scale and rigor. We introduce a scalable bias-auditing framework that uses named entities as controlled probes to measure systematic disparities in model behavior. Synthetic data enables us to construct diverse, controlled inputs, and we show that it reliably reproduces bias patterns observed in natural text, supporting its use for large-scale analysis. Using this framework, we conduct the largest bias audit to date, comprising 1.9 billion data points across multiple entity types, tasks, languages, models, and prompting strategies. We find consistent patterns: models penalize right-wing politicians and favor left-wing politicians, prefer Western and wealthier countries over the Global South, favor Western companies, and penalize firms in the defense and pharmaceutical sectors. While instruction tuning reduces bias, increasing model scale amplifies it, and prompting in Chinese or Russian does not mitigate Western-aligned preferences. These findings highlight the need for systematic bias auditing before deploying LLMs in high-stakes applications. Our framework is extensible to other domains and tasks, and we make it publicly available to support future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Entity-probe framework scales bias auditing to 1.9B points with directional findings on politics and geography, but the synthetic-natural match lacks reported metrics.

The paper's core advance is a practical framework that treats named entities as controlled probes for LLM bias. The authors generate synthetic inputs featuring politicians, countries, and companies, then measure output disparities across tasks and models. The scale of 1.9 billion data points is the standout feature, and the authors release the code publicly, which lets others run similar audits without starting from scratch. They also show that instruction tuning dampens the effects while model size amplifies them, and that non-English prompts do not erase Western preferences. These are concrete, testable patterns that could inform deployment checks. The work is mostly measurement rather than new theory, so the citation burden stays low and there is no obvious circularity in the numbers. What is new is the combination of entity control with this volume of data; prior probing work was smaller and less systematic.

The main gap is the validation step. The abstract states that synthetic data reproduces natural bias patterns, but no correlation numbers, divergence scores, or per-category checks appear in the summary. If the match was shown only on a narrow pilot or rests on qualitative inspection, the reported right-wing penalty and Global South disfavor could partly trace to entity selection or prompt wording rather than to model behavior alone. That needs explicit evidence in the full text.

This is useful for groups doing LLM safety audits or building evaluation suites. A practitioner who wants an open tool to run large controlled tests will get immediate value from the framework. It deserves peer review because the scale and release are real contributions worth checking, even if the synthetic-equivalence section needs more detail to support the headline claims.
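The divergence scores the letter asks for could take the following form, assuming the model outputs in each setting are mapped to categorical labels. The sentiment labels and outputs here are hypothetical. Each label distribution is estimated with add-one smoothing, so no category has zero probability, and the sketch then computes KL(synthetic || natural) in nats.

```python
import math
from collections import Counter


def smoothed_distribution(labels, support, alpha=1.0):
    """Label frequencies with additive smoothing so no category gets probability zero."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(support)
    return {k: (counts[k] + alpha) / total for k in support}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share keys and q must be positive wherever p is."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


# Hypothetical label set and made-up outputs, for illustration only.
SUPPORT = ("negative", "neutral", "positive")
synthetic = ["positive"] * 6 + ["neutral"] * 3 + ["negative"] * 1
natural = ["positive"] * 5 + ["neutral"] * 4 + ["negative"] * 1

p = smoothed_distribution(synthetic, SUPPORT)
q = smoothed_distribution(natural, SUPPORT)
print(f"KL(synthetic || natural) = {kl_divergence(p, q):.4f} nats")
```

KL divergence is asymmetric. When reporting one number per entity class, a symmetric alternative such as Jensen-Shannon divergence may be easier to interpret.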

Referee Report

2 major / 1 minor

Summary. The paper introduces a scalable entity-based framework for auditing bias in LLMs that uses named entities as controlled probes in synthetic data. It claims that this synthetic approach reliably reproduces bias patterns observed in natural text, enabling a large-scale audit of 1.9 billion data points across entity types (politicians, countries, firms), tasks, languages, models, and prompting strategies. Key findings include consistent model tendencies to penalize right-wing politicians while favoring left-wing ones, prefer Western and wealthier countries over the Global South, favor Western companies, and penalize defense and pharmaceutical firms. The work further reports that instruction tuning reduces bias, increasing model scale amplifies it, and prompting in Chinese or Russian does not mitigate Western-aligned preferences. The framework is released publicly.

Significance. If the synthetic-natural equivalence holds with rigorous validation, the framework provides a useful balance of statistical control and scale for bias auditing that could be adopted more widely than existing methods. The 1.9B-point audit size and public code release are concrete strengths that support reproducibility and extensibility. The directional findings on political, geographic, and sectoral biases could inform deployment decisions, but their reliability depends directly on the unshown validation metrics.

major comments (2)

[Abstract] The assertion that synthetic data 'reliably reproduces bias patterns observed in natural text' is presented without any quantitative validation (correlation coefficients, KL-divergence, rank-order agreement, or error bars). This equivalence is load-bearing for the central claim that the observed directional biases (right-wing penalty, Western preference) reflect model behavior rather than probe construction; explicit per-entity-class metrics must be added.
[Results] Results section (bias patterns): The reported disparities (penalizing right-wing politicians, favoring Western countries/companies, sector penalties) rest on the assumption that entity-based probes isolate bias without confounds from prompt wording or entity selection. No controls or ablation results for these factors are described, undermining interpretation of the 1.9B-point findings.
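A template ablation of the kind this comment asks for could start as follows, assuming each scored record carries its template identifier and entity group. The sketch recomputes the group gap separately for each template and checks whether its sign holds across all of them. The records are invented. In this toy case one template reverses the direction, which is the kind of instability such a check is meant to surface.

```python
from collections import defaultdict
from statistics import mean


def gap_by_template(records, group_a, group_b):
    """Mean score of group_a minus group_b, computed separately for each template."""
    buckets = defaultdict(lambda: {group_a: [], group_b: []})
    for r in records:
        if r["group"] in (group_a, group_b):
            buckets[r["template"]][r["group"]].append(r["score"])
    return {
        template: mean(s[group_a]) - mean(s[group_b])
        for template, s in buckets.items()
        if s[group_a] and s[group_b]  # skip templates missing either group
    }


def sign_stable(gaps):
    """True only if every template gives a nonzero gap with the same sign."""
    signs = {(g > 0) - (g < 0) for g in gaps.values()}
    return len(signs) == 1 and 0 not in signs


# Invented (template, group, score) records, for illustration only.
raw = [("t1", "A", 0.70), ("t1", "A", 0.60), ("t1", "B", 0.50), ("t1", "B", 0.40),
       ("t2", "A", 0.65), ("t2", "B", 0.60),
       ("t3", "A", 0.40), ("t3", "B", 0.50)]
records = [{"template": t, "group": g, "score": s} for t, g, s in raw]

gaps = gap_by_template(records, "A", "B")
print(sign_stable(gaps))  # False: template t3 reverses the direction
```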

minor comments (1)

Clarify how the 1.9 billion data points are counted (e.g., per prompt, per model output token, or per entity pair) to support reproducibility claims.
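The total depends on the unit. Assuming one data point is one scored model output and the audit is a full factorial grid, the total is the product of the dimension sizes. The dimension names and sizes below are hypothetical and do not describe the paper's design.

```python
from math import prod


def count_data_points(grid):
    """Total scored outputs when every combination of grid dimensions is run once."""
    return prod(grid.values())


# Hypothetical dimension sizes, chosen only to show the arithmetic.
grid = {
    "entities": 1_000,
    "templates": 50,
    "tasks": 4,
    "languages": 3,
    "models": 10,
    "prompting_strategies": 2,
}
print(count_data_points(grid))  # 12000000
```

Counting per entity pair instead of per entity would replace an entity count n with n(n-1)/2 in the product. The same grid could therefore yield very different totals, which is why the requested clarification matters for reproducibility.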

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the requested quantitative validations and controls.

Point-by-point responses

Referee: [Abstract] The assertion that synthetic data 'reliably reproduces bias patterns observed in natural text' is presented without any quantitative validation (correlation coefficients, KL-divergence, rank-order agreement, or error bars). This equivalence is load-bearing for the central claim that the observed directional biases (right-wing penalty, Western preference) reflect model behavior rather than probe construction; explicit per-entity-class metrics must be added.

Authors: We agree that the synthetic-natural equivalence claim requires explicit quantitative support to be load-bearing. The revised manuscript will add a dedicated validation subsection with per-entity-class metrics, including Pearson and Spearman correlations, KL-divergence between synthetic and natural distributions, rank-order agreement, and error bars, to demonstrate that the probe framework reliably captures observed bias patterns. revision: yes
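Error bars of the kind promised here are often computed with a percentile bootstrap. The sketch below assumes two groups of per-output scores (made-up values). It resamples each group with replacement, recomputes the gap in means, and takes the middle 95% of the resampled gaps. This is a generic technique, not a description of the revised manuscript.

```python
import random
from statistics import mean


def bootstrap_gap_ci(scores_a, scores_b, n_boot=2_000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(A) - mean(B), resampling each group independently."""
    rng = random.Random(seed)
    gaps = sorted(
        mean(rng.choices(scores_a, k=len(scores_a)))
        - mean(rng.choices(scores_b, k=len(scores_b)))
        for _ in range(n_boot)
    )
    lo = gaps[round((1 - level) / 2 * n_boot)]
    hi = gaps[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Made-up per-output scores for two entity groups, for illustration only.
a = [0.71, 0.68, 0.74, 0.70, 0.69]
b = [0.52, 0.55, 0.49, 0.58, 0.51]
low, high = bootstrap_gap_ci(a, b)
print(f"95% CI for the gap: [{low:.3f}, {high:.3f}]")
```

Reporting such an interval per entity class, rather than a single pooled number, would match the per-entity-class metrics the referee requests.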
Referee: [Results] Results section (bias patterns): The reported disparities (penalizing right-wing politicians, favoring Western countries/companies, sector penalties) rest on the assumption that entity-based probes isolate bias without confounds from prompt wording or entity selection. No controls or ablation results for these factors are described, undermining interpretation of the 1.9B-point findings.

Authors: We acknowledge the importance of ruling out confounds from prompt wording and entity selection. The revision will include new ablation studies that systematically vary prompt templates and entity sampling methods within each class, reporting sensitivity analyses to show that the directional biases remain stable across these variations and thereby support the interpretation of the large-scale results. revision: yes
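For the entity-sampling side of this ablation, one simple check is a leave-one-entity-out (jackknife-style) pass that drops each entity in turn and recomputes the group gap. If the sign flips when a single entity is removed, the headline direction depends on entity selection. The entity names, groups, and per-entity scores below are hypothetical.

```python
from statistics import mean


def entity_level_gap(scores, group_of, group_a, group_b):
    """Mean per-entity score of group_a minus that of group_b."""
    a = [s for e, s in scores.items() if group_of[e] == group_a]
    b = [s for e, s in scores.items() if group_of[e] == group_b]
    return mean(a) - mean(b)


def leave_one_out_gaps(scores, group_of, group_a, group_b):
    """Recompute the gap with each entity dropped in turn, skipping drops that would empty a group."""
    out = {}
    for dropped in scores:
        rest = {e: s for e, s in scores.items() if e != dropped}
        if {group_a, group_b} <= {group_of[e] for e in rest}:
            out[dropped] = entity_level_gap(rest, group_of, group_a, group_b)
    return out


# Hypothetical per-entity mean scores and group labels, for illustration only.
scores = {"a1": 0.70, "a2": 0.65, "a3": 0.20, "b1": 0.50, "b2": 0.45}
group_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

full = entity_level_gap(scores, group_of, "A", "B")
loo = leave_one_out_gaps(scores, group_of, "A", "B")
flips = [e for e, g in loo.items() if (g > 0) != (full > 0)]
print(flips)  # ['a1', 'a2']
```

Averaging per entity first keeps entities with many outputs from dominating the gap. Whether the paper weights by entity or by output is not stated in this summary.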

Circularity Check

0 steps flagged

No significant circularity in empirical bias-auditing framework

Full rationale

The paper introduces a scalable entity-based framework for measuring bias in LLMs and reports observational results from a 1.9-billion-point audit across entity types, tasks, and models. No equations, derivations, or parameter-fitting steps are described that would reduce the headline disparities (political, geographic, or sectoral preferences) to quantities defined by the same inputs. The claim that synthetic probes reproduce natural-text bias patterns is presented as an empirical supporting result rather than a self-referential definition or fitted prediction. No self-citations serve as load-bearing uniqueness theorems, and no ansatz or renaming of known results is invoked to force the central findings. The work is therefore self-contained as an empirical measurement study whose outputs are not equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework rests on the assumption that named-entity substitution isolates bias and that synthetic prompts preserve ecological validity; no free parameters, axioms, or invented entities are introduced beyond standard LLM evaluation practices.

pith-pipeline@v0.9.0 · 5519 in / 1135 out tokens · 27397 ms · 2026-05-16T13:24:00.330455+00:00

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations
cs.HC 2026-04 unverdicted novelty 6.0

LLMs engage in spontaneous persuasion in virtually all multi-turn conversations by favoring information-based strategies like logic and evidence, in contrast to human responses that rely more on social influence and n...