
arxiv: 2606.13044 · v1 · pith:V2KI63XDnew · submitted 2026-06-11 · 💻 cs.CL

No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions

Pith reviewed 2026-06-27 06:36 UTC · model grok-4.3

classification 💻 cs.CL
keywords AI peer review · adversarial attacks · presentation manipulation · LLM robustness · peer review · score inflation · adversarial repackaging

The pith

AI peer reviewers award higher scores after changes to only a paper's abstract, framing and narrative.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that AI systems for peer review can be made to increase their scores by revising only how a paper is presented, such as rewording the abstract, repositioning contributions relative to prior work, expanding discussion sections, and adjusting narrative structure, while leaving every method, experiment, figure, equation, and numerical result untouched. A closed-loop process called adversarial repackaging uses the AI reviewer's own feedback to guide these presentation revisions and achieves a 75.1 percent success rate together with an average score increase of 1.21 out of 10 across three common AI reviewers. Changes that alter how the reviewer interprets the paper's place in the literature work better than local polishing or formatting adjustments. The results point to two structural problems: AI reviewers respond more readily to highlighted strengths than to attempts to fix weaknesses, and they can treat the appearance of having addressed a limitation as equivalent to actually having resolved it with new evidence. If these patterns hold, the main deployment risk for AI review tools is not hidden instructions but the fact that the narrative surface itself becomes an optimizable variable.

Core claim

Adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10 by modifying only presentation-level content while keeping scientific evidence fixed. Strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes. AI reviewers are easier to impress than to convince, and they can confuse the appearance of addressing a limitation with actually resolving it.
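The two headline numbers are aggregate statistics over paired reviewer scores (before and after revision). The summary here does not give the paper's exact success criterion, so the sketch below assumes an attack counts as successful when the revised paper scores strictly higher than the original. The function name `attack_metrics` and the toy scores are illustrative, not taken from the paper.

```python
from statistics import mean

def attack_metrics(before, after):
    """Return (attack success rate, mean score gain) for paired reviewer scores."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty score lists of equal length")
    gains = [a - b for b, a in zip(before, after)]
    # Assumption: "success" means the revised paper scores strictly higher.
    success_rate = sum(g > 0 for g in gains) / len(gains)
    return success_rate, mean(gains)

# Toy scores on a 10-point scale (illustrative only, not data from the paper).
before = [5.0, 6.0, 4.0, 7.0]
after = [6.0, 6.0, 5.5, 8.0]
rate, gain = attack_metrics(before, after)
print(f"attack success rate = {rate:.1%}, mean gain = {gain:+.2f}/10")
```

A different success threshold (for example, crossing an acceptance cutoff) would change the rate but not the mean gain, which is why the two figures are reported together.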

What carries the argument

Adversarial repackaging: a closed-loop attack that uses AI-reviewer feedback to search for presentation-level revisions while keeping the scientific evidence fixed.
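A closed loop of this kind can be sketched as a greedy search: review, revise one presentation section in response to the feedback, re-review, and keep the revision only if the score rises. Everything below is a toy stand-in under stated assumptions: `toy_reviewer`, `FRAMING_MOVES`, and the paper dictionary layout are hypothetical, and the actual attack in the paper presumably queries an LLM reviewer and uses a richer revision strategy.

```python
import copy

# Hypothetical revision templates, one per presentation section.
FRAMING_MOVES = {
    "abstract": "We present a novel perspective on the problem.",
    "related_work": "To our knowledge this is the first work to connect these lines.",
    "discussion": "The results admit a principled interpretation.",
}

def toy_reviewer(paper):
    """Stand-in reviewer: returns (score, list of sections it finds weak)."""
    text = " ".join(paper["presentation"].values()).lower()
    hits = sum(word in text for word in ("novel", "first", "principled"))
    weak = [s for s, move in FRAMING_MOVES.items()
            if move not in paper["presentation"][s]]
    return 5.0 + 0.5 * hits, weak

def repackage(paper, reviewer, max_rounds=5):
    """Greedy closed loop: revise the section the reviewer flags, keep score gains."""
    best = copy.deepcopy(paper)
    best_score, feedback = reviewer(best)
    for _ in range(max_rounds):
        if not feedback:
            break
        section = feedback[0]
        candidate = copy.deepcopy(best)
        candidate["presentation"][section] += " " + FRAMING_MOVES[section]
        # Invariant of the attack: the scientific evidence is never edited.
        assert candidate["evidence"] == paper["evidence"]
        score, new_feedback = reviewer(candidate)
        if score > best_score:
            best, best_score, feedback = candidate, score, new_feedback
        else:
            feedback = feedback[1:]  # this move did not help; try the next section
    return best, best_score

paper = {
    "evidence": {"results": "Accuracy is 91.2."},
    "presentation": {
        "abstract": "We study X.",
        "related_work": "Prior work exists.",
        "discussion": "Results are shown.",
    },
}
initial_score, _ = toy_reviewer(paper)
revised, final_score = repackage(paper, toy_reviewer)
print(initial_score, "->", final_score)
```

The design point the sketch illustrates is that the reviewer's own output steers the search, while the `evidence` field acts as a hard constraint rather than an optimization target.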

If this is right

  • Presentation-only revisions can produce large score gains without any alteration to scientific content.
  • Interpretation-shifting edits outperform surface-level polishing.
  • Highlighting strengths reliably raises perceived merit more than attempts to dissolve weaknesses.
  • Unchanged evidence can be reinterpreted as a stronger contribution through narrative adjustments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI review systems may require explicit mechanisms that anchor scores to concrete evidence rather than narrative framing.
  • The released contamination-free benchmark enables repeated testing of whether future AI reviewers stay anchored to scientific content.
  • In deployed review pipelines, authors could systematically optimize presentation for AI scoring even when the underlying work is unchanged.
  • Whether human reviewers exhibit similar sensitivity to presentation repositioning remains outside the scope of the reported experiments.

Load-bearing premise

The modifications to abstract, contribution framing, related work, discussion, and narrative structure constitute purely presentation-level changes that do not alter the interpretation or perceived strength of the underlying scientific evidence.
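The mechanical half of this premise (that protected sections are byte-identical) can be checked automatically; the interpretive half cannot. A minimal sketch, assuming a paper is stored as a dictionary of section texts and that `methods`, `experiments`, and `results` are the protected sections (both assumptions, not details from the paper):

```python
import hashlib
import json
import re

PROTECTED = ("methods", "experiments", "results")  # assumed section names

def evidence_fingerprint(paper):
    """SHA-256 over the protected sections, serialized deterministically."""
    payload = json.dumps({name: paper[name] for name in PROTECTED}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def new_numeric_claims(original, revised):
    """Numbers that appear in the revised paper but nowhere in the original."""
    pattern = re.compile(r"\d+(?:\.\d+)?")
    seen = set(pattern.findall(" ".join(original.values())))
    return set(pattern.findall(" ".join(revised.values()))) - seen

original = {
    "abstract": "We evaluate a parser.",
    "methods": "A transition-based parser.",
    "experiments": "Evaluated on one treebank.",
    "results": "Accuracy is 91.2.",
}
reframed = dict(original, abstract="We introduce a principled parser reaching 91.2 accuracy.")
tampered = dict(original, results="Accuracy is 93.0.")

same_after_reframing = evidence_fingerprint(original) == evidence_fingerprint(reframed)
same_after_tampering = evidence_fingerprint(original) == evidence_fingerprint(tampered)
print(same_after_reframing, same_after_tampering)
print(new_numeric_claims(original, reframed), new_numeric_claims(original, tampered))
```

A matching fingerprint shows only that the evidence text is unchanged. It says nothing about whether the reframed abstract changes how that evidence is read, which is the part of the premise the referee report disputes.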

What would settle it

Apply the same set of presentation revisions to a paper but instruct the AI reviewer to ignore all narrative, abstract, and discussion text and score only the methods, experiments, and numerical results; if scores remain unchanged, the claim is falsified.

Original abstract

As AI-generated reviews move from experimental tools into peer-review infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and prompt injection. We study a harder and more policy-relevant failure mode: no hidden text, no prompt injection, and no changes to methods, experiments, figures, equations, proofs, or numerical results. The attacker modifies only presentation-level content, such as the abstract, contribution framing, related work, discussion, and narrative structure. We introduce adversarial repackaging: a closed-loop attack that uses AI-reviewer feedback to search for presentation-level revisions while keeping the scientific evidence fixed. Across three mainstream AI reviewers, adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10. The effect is not explained by ordinary prose polishing. We also reveal that strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes. Our analysis reveals two deeper structural failure modes. First, AI reviewers are easier to impress than to convince: highlighting strengths reliably increases perceived merit, while attempts to dissolve weaknesses frequently backfire. Second, AI reviewers can confuse the appearance of addressing a limitation with actually resolving it, allowing unchanged evidence to be reinterpreted as stronger scientific contribution. These results show that the deployment risk is not only malicious hidden instructions, but the emergence of paper presentation itself as an optimization surface. We release a contamination-free rolling benchmark and attack framework for testing whether AI reviewers remain anchored to scientific content under presentation-only edits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that AI peer reviewers can be successfully attacked ('gamed') using only presentation-level revisions—such as changes to the abstract, contribution framing, related work, discussion, and narrative structure—while keeping all scientific evidence, methods, experiments, figures, equations, and numerical results fixed. It introduces a closed-loop 'adversarial repackaging' attack that uses AI-reviewer feedback to optimize these revisions. Across three mainstream AI reviewers, the approach yields a 75.1% attack success rate and mean score gain of +1.21/10. The paper further identifies two structural failure modes (AI reviewers are easier to impress than to convince; they confuse the appearance of addressing limitations with actual resolution) and releases a contamination-free rolling benchmark and attack framework.

Significance. If the strict separation between presentation and interpretive changes can be maintained, the results would demonstrate that AI reviewers are vulnerable to optimization over narrative framing alone, with implications for any deployment of AI in peer review. The release of a contamination-free rolling benchmark and attack framework is a concrete strength that supports reproducibility and future testing of whether AI reviewers remain anchored to scientific content.

major comments (1)
  1. [Abstract] The central claim requires that all revisions keep 'the scientific evidence fixed' and constitute 'presentation-level content' only. However, the abstract explicitly states that 'strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits'. Related-work repositioning alters perceived novelty and contribution claims; analytical discussion expansion reframes interpretive claims about the fixed results. These are not presentation-only under any standard definition and directly violate the 'evidence fixed' precondition, so the reported 75.1% success rate and +1.21 score gain cannot be isolated to presentation effects.
minor comments (1)
  1. The abstract refers to a 'contamination-free rolling benchmark' but provides no details on the contamination checks or rolling mechanism; adding a brief description would improve clarity without affecting the core argument.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed comment on the scope of presentation-level revisions. We respond point-by-point below.

  1. Referee: [Abstract] The central claim requires that all revisions keep 'the scientific evidence fixed' and constitute 'presentation-level content' only. However, the abstract explicitly states that 'strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits'. Related-work repositioning alters perceived novelty and contribution claims; analytical discussion expansion reframes interpretive claims about the fixed results. These are not presentation-only under any standard definition and directly violate the 'evidence fixed' precondition, so the reported 75.1% success rate and +1.21 score gain cannot be isolated to presentation effects.

    Authors: We thank the referee for this observation. The manuscript explicitly includes related-work repositioning and analytical discussion expansion within the scope of presentation-level revisions, as these operations modify only the narrative structure and framing around the unchanged scientific content. Related-work repositioning entails recontextualizing the contribution by adjusting references to prior work without changing the paper's methods or results. Analytical discussion expansion involves elaborating on the implications and interpretations of the fixed experimental findings. These are not changes to the evidence itself but to how it is presented and interpreted by the reviewer. The paper demonstrates that AI reviewers are susceptible to such framing adjustments, which is the core finding. The distinction from surface edits is intentional, as the results show narrative strategies are more impactful. Thus, the success metrics are for this class of revisions as defined. We maintain that the precondition is satisfied and no revision to the manuscript is necessary.

    revision: no

Circularity Check

0 steps flagged

No significant circularity; empirical measurement study


The paper is an empirical attack study that directly measures attack success rates and score gains on external AI reviewers. No mathematical derivations, equations, fitted parameters, or self-citation load-bearing steps are present in the provided text or abstract. The central results (75.1% success rate, +1.21 mean gain) are obtained by running the attack against independent systems rather than reducing to any input by construction. The noted tension between 'evidence fixed' and 'interpretation-changing strategies' is a validity concern, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the domain assumption that presentation-level edits can be isolated from scientific content and that the three tested AI reviewers are representative of deployed systems.

axioms (1)
  • domain assumption: AI reviewers can be influenced by changes in presentation framing without changes to scientific content.
    This is the core premise tested in the study.

pith-pipeline@v0.9.1-grok · 5867 in / 1273 out tokens · 29689 ms · 2026-06-27T06:36:13.923893+00:00 · methodology

