Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
Pith reviewed 2026-05-18 15:36 UTC · model grok-4.3
The pith
Large language models used for peer reviews consistently favor authors from highly ranked institutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through controlled interventions that modify author affiliation, gender, seniority, and publication history in prompts to various LLMs while keeping the paper content the same, the study demonstrates a consistent affiliation bias favoring highly ranked institutions, directional preferences linked to seniority and prior publications that affect borderline acceptance decisions, smaller gender effects, and more pronounced implicit biases visible in token-level soft ratings.
What carries the argument
Controlled interventions that alter specific pieces of author metadata (affiliation, gender, seniority, publication record) within the prompt given to the LLM while keeping the paper content fixed.
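A minimal sketch of such an intervention, assuming a simple text prompt with a metadata header. The `AuthorMeta` fields, the prompt wording, and the placeholder institution names are illustrative assumptions, not the paper's actual template (which this page does not show).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AuthorMeta:
    # Hypothetical metadata fields mirroring the four attributes studied.
    affiliation: str
    gender: str
    seniority: str
    publications: str


def build_prompt(meta: AuthorMeta, paper_text: str) -> str:
    """Render a review prompt; only the metadata header depends on `meta`."""
    return (
        "You are reviewing a conference submission.\n"
        f"Author affiliation: {meta.affiliation}\n"
        f"Author gender: {meta.gender}\n"
        f"Author seniority: {meta.seniority}\n"
        f"Prior publications: {meta.publications}\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Give an overall rating from 1 to 10."
    )


def single_attribute_variants(base: AuthorMeta, field: str, values: list[str]) -> list[AuthorMeta]:
    """Vary exactly one metadata field and hold all others at the baseline."""
    return [replace(base, **{field: value}) for value in values]


base = AuthorMeta("Institution A", "unspecified", "senior", "20 prior papers")
variants = single_attribute_variants(base, "affiliation", ["Institution A", "Institution B"])
prompts = [build_prompt(meta, "Some paper text.") for meta in variants]
```

Because each variant differs from the baseline in one field, any systematic difference in the model's output can be attributed to that field only to the extent that the premise stated further down holds.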
If this is right
- Papers from less prestigious institutions receive lower evaluations even when the content is identical to those from top places.
- Seniority and publication history can shift outcomes for papers near the acceptance threshold.
- Gender effects appear in several models though they are smaller and less consistent than affiliation bias.
- Biases that remain hidden in the final review text become visible when examining the model's soft probability scores for rating tokens.
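One plausible reading of a "soft" rating is the expected rating under the model's next-token distribution at the position where the score is emitted. The sketch below follows that reading; it is an assumption, not the paper's confirmed definition. The function name `soft_rating`, the 1 to 10 scale, and the premise that each rating is a single token are all hypothetical, and the probabilities in the example are made up for illustration.

```python
import math


def soft_rating(logprobs: dict[str, float], scale=range(1, 11)) -> float:
    """Expected rating over the rating tokens present in `logprobs`.

    `logprobs` maps candidate next tokens to log-probabilities. Tokens that
    are not valid ratings are ignored and the rest are renormalized.
    """
    weights = {}
    for rating in scale:
        logprob = logprobs.get(str(rating))
        if logprob is not None:
            weights[rating] = math.exp(logprob)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no rating tokens found in logprobs")
    return sum(rating * weight for rating, weight in weights.items()) / total


# Two made-up distributions with the same most likely rating (6).
dist_a = {"6": math.log(0.6), "7": math.log(0.3), "5": math.log(0.1)}
dist_b = {"6": math.log(0.6), "5": math.log(0.3), "7": math.log(0.1)}

soft_a = soft_rating(dist_a)  # about 6.2
soft_b = soft_rating(dist_b)  # about 5.8
```

Both distributions would yield the same written rating if the model emits its most likely token, yet their expected ratings differ. That is the sense in which a preference can be invisible in the review text and still measurable in the probabilities.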
Where Pith is reading between the lines
- The same metadata changes could produce similar biases in other LLM-supported evaluation tasks such as grant reviewing or hiring decisions.
- Explicit instructions to ignore author information might reduce but not fully remove the observed preferences if they stem from training data patterns.
- Systems that rely on probability outputs rather than generated text may need extra safeguards to limit these effects.
- Testing whether the bias strength varies across different LLMs would clarify how much model choice influences fairness.
Load-bearing premise
Altering author metadata in the prompt isolates the causal effect of that attribute on the model's judgment without the model detecting the manipulation or responding to other unmeasured prompt features.
What would settle it
Running the identical paper through the same LLM multiple times with only the affiliation changed and finding no consistent difference in review scores or text across different institution rankings.
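A check of this kind can be sketched as a paired sign-flip permutation test on per-paper score differences. This is an illustrative choice of test, not the procedure the paper reports; the function name and parameters are hypothetical. Under the null hypothesis that affiliation has no effect, the sign of each paired difference is arbitrary, so randomly flipping signs shows how large a mean difference arises by chance.

```python
import random


def paired_sign_flip_test(scores_high, scores_low, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    `scores_high[i]` and `scores_low[i]` are ratings of the same paper under
    the two affiliation conditions. Returns (mean difference, p-value).
    """
    if len(scores_high) != len(scores_low) or not scores_high:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [high - low for high, low in zip(scores_high, scores_low)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each paired difference.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

A mean difference near zero with a large p-value across many papers and institution pairs would count against the affiliation-bias claim. A consistent positive difference with a small p-value would support it.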
Original abstract
The adoption of large language models (LLMs) is transforming the peer review process, from assisting reviewers in writing detailed evaluations to generating entire reviews automatically. While these capabilities offer new opportunities, they also raise concerns about fairness and reliability. In this paper, we investigate bias in LLM-generated peer reviews through controlled interventions on author metadata, including affiliation, gender, seniority, and publication history. Our analysis consistently shows a strong affiliation bias favoring authors from highly ranked institutions. We also identify directional preferences associated with seniority and prior publication record, which can influence acceptance decisions for borderline papers. Gender effects are smaller but present in several models. Notably, implicit biases become more pronounced when examining token-level soft ratings, suggesting that alignment may mask but not fully eliminate underlying preferences.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates bias in LLM-generated peer reviews via controlled metadata interventions on affiliation, gender, seniority, and publication history. It reports a strong affiliation bias favoring highly ranked institutions, directional effects from seniority and prior publications on acceptance decisions, smaller gender effects, and that implicit biases appear more clearly in token-level soft ratings than in final outputs.
Significance. If substantiated with full methodological details, the work would contribute to understanding fairness risks when LLMs assist or automate peer review, a timely topic in AI ethics and scholarly publishing. The empirical intervention design is a reasonable approach for isolating attribute effects, though the current evidence base is thin.
major comments (2)
- [Methods] Methods section: the abstract and visible description supply no sample sizes, exact LLM models, prompt templates, statistical tests, or controls for prompt length/content. These omissions are load-bearing for claims of 'consistent' and 'strong' affiliation bias and directional preferences.
- [Results] Results and intervention description: the central assumption that metadata swaps cleanly isolate causal effects is not tested or discussed. If full paper text (including self-references or institution mentions) is retained in the prompt, LLMs may respond to detectable inconsistencies rather than the target attribute, confounding token-level soft ratings and acceptance outcomes.
minor comments (2)
- [Abstract] Abstract: 'several models' is mentioned without naming them or providing version details.
- [Results] Notation: 'token-level soft ratings' would benefit from a precise definition or example in the main text.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments, which highlight important areas for improving the clarity and robustness of our work on biases in LLM-assisted peer review. We respond to each major comment below and indicate the revisions we plan to make.
Point-by-point responses
- Referee: [Methods] Methods section: the abstract and visible description supply no sample sizes, exact LLM models, prompt templates, statistical tests, or controls for prompt length/content. These omissions are load-bearing for claims of 'consistent' and 'strong' affiliation bias and directional preferences.
Authors: We agree that explicit methodological details are necessary to support our claims of consistent and strong biases. The full manuscript's Methods section does describe the overall experimental design, but we acknowledge that key specifics such as exact sample sizes, LLM model names, full prompt templates, and statistical procedures could be presented more prominently. We will revise the manuscript to include these details directly in the main text or a dedicated methods summary, add the prompt templates to an appendix, specify the statistical tests (e.g., paired comparisons and regression models), and clarify controls for prompt length and content standardization. These changes will be incorporated in the revised version.
Revision: yes
- Referee: [Results] Results and intervention description: the central assumption that metadata swaps cleanly isolate causal effects is not tested or discussed. If full paper text (including self-references or institution mentions) is retained in the prompt, LLMs may respond to detectable inconsistencies rather than the target attribute, confounding token-level soft ratings and acceptance outcomes.
Authors: This is a substantive methodological point that merits explicit treatment. Our design kept the core paper content fixed while swapping only the targeted metadata fields, but we did not include a dedicated test or discussion of whether LLMs might detect inconsistencies (e.g., via self-references or institution mentions). We will add a paragraph in the Methods section and a corresponding limitations discussion that addresses this assumption, describes any steps taken to minimize detectable artifacts (such as using standardized, anonymized paper excerpts), and notes the potential for residual confounding. If space permits, we will also report a brief sensitivity check. This revision will be made.
Revision: yes
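The concern about self-references and institution mentions could be probed with a simple scan of the paper text before it is placed in the prompt. The sketch below is a hypothetical illustration of such a check, not a step the authors describe; `find_metadata_leaks` and the list of protected terms are assumptions.

```python
import re


def find_metadata_leaks(paper_text: str, protected_terms: list[str]) -> list[tuple[str, int]]:
    """Return (term, character offset) for each case-insensitive match.

    A match must not be directly adjacent to another word character, so a
    short term does not fire inside a longer unrelated word.
    """
    hits = []
    for term in protected_terms:
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        hits.extend((term, match.start()) for match in pattern.finditer(paper_text))
    return sorted(hits, key=lambda hit: hit[1])


text = "We thank Institution A for compute. See Doe et al. for details."
leaks = find_metadata_leaks(text, ["institution a", "Doe"])
```

Papers with any hits could be excluded or redacted before the metadata swap, so that the swapped affiliation never contradicts a mention left in the body. Exact string matching will miss paraphrased or abbreviated mentions, so an empty result is weak evidence of a clean text.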
Circularity Check
No circularity in empirical intervention study
The paper reports results from controlled metadata interventions on LLM prompts for peer-review simulation, with claims based on observed differences in generated reviews, token-level ratings, and acceptance decisions. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citation chains appear in the abstract or described methodology. Central findings are presented as direct experimental observations rather than derivations that reduce to inputs by construction, rendering the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM outputs after metadata swaps reflect stable internal biases rather than sensitivity to prompt phrasing or detection of the experimental manipulation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
Passage: We investigate bias in LLM-generated peer reviews through controlled interventions on author metadata, including affiliation, gender, seniority, and publication history.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem)
Passage: Our analysis consistently shows a strong affiliation bias favoring authors from highly ranked institutions.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- PeerPrism: Peer Evaluation Expertise vs Review-writing AI
The PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
- Inspectable AI for Science: A Research Object Approach to Generative AI Governance
Generative AI use in science can be governed through structured documentation and provenance capture by framing AI interactions as inspectable Research Objects rather than debating authorship.