Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable

Rounak Saha; Gurusha Juneja; Dayita Chaudhuri; Naveeja Sajeevan; Nihar B Shah; Danish Pruthi

arXiv: 2603.20450 · v2 · pith:VMFAIFJ4 · submitted 2026-03-20 · cs.CL · cs.AI · cs.CY · cs.LG

keywords: reviews, peer, detectors, policies, enforceable, human-AI, including, misclassify

Abstract

A number of scientific conferences and journals have recently enacted policies that prohibit LLM usage by peer reviewers, except for polishing, paraphrasing, and grammar correction of otherwise human-written reviews. But are these policies enforceable? To answer this question, we assemble a dataset of peer reviews simulating multiple levels of human-AI collaboration and evaluate five state-of-the-art detectors, including two commercial systems. Our analysis shows that all detectors misclassify a non-trivial fraction of LLM-polished reviews as AI-generated, thereby risking false accusations of academic misconduct. We further investigate whether peer-review-specific signals, including access to the paper manuscript and the constrained domain of scientific writing, can be leveraged to improve detection. While incorporating such signals yields measurable gains in some settings, we identify limitations in each approach and find that none meets the accuracy standards required for identifying AI use in peer reviews. Importantly, our results suggest that recent public estimates of AI use in peer reviews that rely on AI-text detectors should be interpreted with caution: current detectors misclassify mixed reviews (collaborative human-AI outputs) as fully AI-generated, potentially overstating the extent of policy violations.
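To make the enforcement concern concrete, the sketch that follows tallies how often a detector flags reviews at each collaboration level, then compares a naive flag-based estimate of policy violations with the share of reviews that are actually prohibited. It is a minimal illustration, not the authors' evaluation code: the level names (`human`, `polished`, `generated`), the 0.5 score threshold, the helper function names, and the toy detector scores are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical collaboration levels that a "polishing only" policy permits.
# Level names are illustrative assumptions, not the paper's labels.
PERMITTED_LEVELS = {"human", "polished"}


def flag_rates(records, threshold=0.5):
    """Per collaboration level, the fraction of reviews whose detector
    score meets the threshold (i.e. flagged as AI-generated)."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for level, score in records:
        total[level] += 1
        if score >= threshold:
            flagged[level] += 1
    return {level: flagged[level] / total[level] for level in total}


def naive_violation_estimate(records, threshold=0.5):
    """Share of all reviews flagged, treating every flag as a violation."""
    flagged = sum(1 for _, score in records if score >= threshold)
    return flagged / len(records)


def true_violation_rate(records):
    """Share of reviews whose level is actually prohibited by the policy."""
    violations = sum(1 for level, _ in records if level not in PERMITTED_LEVELS)
    return violations / len(records)


# Toy (level, detector_score) pairs, invented for illustration only.
records = [
    ("human", 0.1), ("human", 0.2),
    ("polished", 0.7), ("polished", 0.4),
    ("generated", 0.9), ("generated", 0.95),
]

print(flag_rates(records))
print(naive_violation_estimate(records), true_violation_rate(records))
```

On these made-up numbers, one of the two polished reviews crosses the threshold, so the naive estimate (0.5) exceeds the true violation rate (about 0.33). Any flagged polished review is counted as a violation it is not, which is the kind of overstatement the abstract cautions against.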

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AI for Auto-Research: Roadmap & User Guide
cs.AI · 2026-05 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.