Recognition: unknown
Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review
Pith reviewed 2026-05-08 17:25 UTC · model grok-4.3
The pith
IntraGuard embeds invisible instructions in PDF manuscripts to make LLMs refuse or mark their peer-review outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
IntraGuard is a black-box, venue-agnostic framework that uses three intra-stream injection mechanisms to embed defensive text objects inside the PDF structure, enabling both explicit strategies that prompt LLM refusal or warning signals and implicit strategies that insert predefined markers into generated reviews.
What carries the argument
Intra-stream injection mechanisms that embed heterogeneous defensive text objects within the PDF's underlying structure while preserving visual presentation.
If this is right
- Review committees can deploy IntraGuard on submitted manuscripts to reduce full outsourcing of peer review to LLMs.
- Human reviewers continue to see an unaltered manuscript and require no extra steps.
- The defense applies across multiple commercial LLMs and scientific disciplines without venue-specific changes.
- The framework remains lightweight and runs on ordinary computers with an average cost of one second per manuscript.
- Evaluations of eleven adaptive attacks show that some sanitization and interference methods reduce but do not eliminate the defense success rate.
Where Pith is reading between the lines
- An ongoing arms race may develop in which attackers create more effective PDF sanitizers and defenders add multiple layered injections.
- LLM developers may need to improve how their models handle embedded PDF objects during document-based tasks such as reviewing.
- The same structural injection approach could be tested on other document formats or evaluation workflows outside academic peer review.
Load-bearing premise
Commercial LLMs will reliably read and act on the injected PDF objects by refusing or marking their reviews, and these objects cannot be stripped by simple sanitization without breaking the document's visual appearance.
What would settle it
An experiment in which a commercial LLM receives a sanitized version of the injected PDF and produces a complete, unmarked review without refusal, or in which a human reader notices any visual alteration after the objects are added.
Figures
read the original abstract
As LLMs become increasingly capable, editorial boards and program committees are growing concerned about reviewers who fully outsource peer review to commercial chatbots. This concern stems from prior findings that current chatbots lack the independent critical thinking and depth of reasoning required to assess scientific novelty. One promising direction for mitigating this concern is to embed hidden instructions into manuscripts that disrupt or alter chatbot-generated reviews. However, existing methods remain intuitive and fragile, as they typically rely on homogeneous payloads injected in an inter-stream manner, rendering them susceptible to sanitization or neutralization. In this paper, we identify End-to-End Review Outsourcing as an emerging threat and propose IntraGuard, a black-box, venue-agnostic defense framework grounded in the structural--visual decoupling inherent to the PDF. Designed for committee-side deployment, IntraGuard supports both explicit strategies that trigger refusal or warning signals, and implicit strategies that embed predefined textual markers into the generated review. These strategies can be deployed via any of three intra-stream injection mechanisms, each of which seamlessly embeds heterogeneous defensive text objects within the PDF's underlying structure without altering its visual presentation. Extensive evaluations across 7 real-world commercial chatbot settings and 12 venues spanning diverse disciplines show that IntraGuard achieves a defense success rate of up to 84%, while preserving peer-review invariance for human reviewers. IntraGuard is lightweight and hardware-independent, incurring an average overhead of only one second per manuscript on a commodity personal computer. We further evaluate 11 adaptive attacks spanning manuscript sanitization and instruction interference, and discuss the implications of constructing ensemble defenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes IntraGuard, a black-box, venue-agnostic defense framework that embeds hidden defensive instructions into PDF manuscripts via three intra-stream injection mechanisms exploiting structural-visual decoupling. These instructions aim to trigger refusal/warning signals or embed textual markers in LLM-generated reviews. Evaluations across 7 commercial chatbots and 12 venues report defense success rates up to 84%, with invariance for human reviewers, 1-second average overhead, and tests against 11 adaptive attacks including sanitization and instruction interference.
Significance. If the empirical results hold, this provides a timely, practical defense against the emerging threat of end-to-end review outsourcing to LLMs, which prior work has shown lack depth for assessing novelty. Strengths include direct testing on real commercial systems rather than simulated ones, the parameter-free and hardware-independent design, and explicit evaluation of adaptive attacks. The stress-test concern about LLM PDF parsing pipelines does not land, because the reported rates come from actual interactions with the 7 chatbots' own ingestion and processing pipelines; any extraction behavior is already reflected in the 84% figure.
major comments (2)
- [Evaluation] Evaluation section: The central claim of up to 84% defense success rate lacks a precise definition of 'success' (e.g., exact refusal criteria, marker detection rules, or how partial responses are scored) and omits statistical details such as number of trials, per-chatbot or per-venue variance, or confidence intervals. This opacity directly affects assessment of whether the intra-stream mechanisms are responsible for the reported robustness.
- [§4] Injection mechanisms (likely §4): The three intra-stream methods are described as embedding heterogeneous objects that survive LLM pipelines, but the manuscript provides no ablation or concrete evidence (e.g., before/after extraction examples or comparison to inter-stream baselines) showing that these objects trigger the intended behavior beyond the aggregate success rate. This is load-bearing for the structural-visual decoupling advantage over prior fragile methods.
minor comments (2)
- [Abstract] Abstract: The phrase 'up to 84%' should specify the conditions (e.g., which strategy or chatbot) under which the maximum is achieved for precision.
- [Introduction] The manuscript introduces 'End-to-End Review Outsourcing' without a formal definition or citation to prior work on LLM limitations in review tasks; adding this would improve context.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments highlight important areas for improving clarity and evidence in the evaluation and mechanism sections. We address each point below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: The central claim of up to 84% defense success rate lacks a precise definition of 'success' (e.g., exact refusal criteria, marker detection rules, or how partial responses are scored) and omits statistical details such as number of trials, per-chatbot or per-venue variance, or confidence intervals. This opacity directly affects assessment of whether the intra-stream mechanisms are responsible for the reported robustness.
Authors: We agree that precise definitions and statistical details are essential for assessing the results. In the revised manuscript, we will expand the Evaluation section with a new subsection explicitly defining success: for explicit refusal strategies, success occurs when the chatbot outputs a refusal message or warning signal; for implicit marker strategies, success is scored when the predefined textual marker appears verbatim in the generated review (with partial responses counted only if the marker is present). We will report the total number of trials (100 per chatbot-venue pair), per-chatbot and per-venue success rates with standard deviations, and 95% confidence intervals. These additions will directly link the reported rates to the intra-stream mechanisms. revision: yes
-
Referee: [§4] Injection mechanisms (likely §4): The three intra-stream methods are described as embedding heterogeneous objects that survive LLM pipelines, but the manuscript provides no ablation or concrete evidence (e.g., before/after extraction examples or comparison to inter-stream baselines) showing that these objects trigger the intended behavior beyond the aggregate success rate. This is load-bearing for the structural-visual decoupling advantage over prior fragile methods.
Authors: We acknowledge that additional concrete evidence would strengthen the claims about structural-visual decoupling. While the end-to-end success rates on real commercial chatbots already incorporate the effects of each mechanism's survival through actual LLM ingestion pipelines, we will add in the revision: (1) before/after PDF object extraction examples for each of the three intra-stream methods, (2) an ablation table isolating the contribution of each mechanism, and (3) a brief comparison to inter-stream baselines from prior work where the same payloads were tested. This will provide direct evidence that the heterogeneous objects are responsible for triggering the observed behaviors. revision: yes
Circularity Check
No circularity: central results are direct empirical measurements on external systems
full rationale
The paper presents IntraGuard as a black-box framework whose performance (up to 84% defense success) is measured through explicit experiments on seven commercial chatbots, twelve venues, and eleven adaptive attacks. No equations, fitted parameters, or self-citations are used to derive the success rate; the reported figures come from external LLM behavior rather than any reduction to the authors' own prior definitions or ansatzes. The structural-visual decoupling is introduced as a design choice and then validated empirically, not presupposed by construction. This is a standard empirical security evaluation with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption PDF documents maintain a structural layer (content streams, objects) that can be modified independently of the visual rendering layer without detectable changes to human readers.
- domain assumption Current commercial LLMs process the full PDF structure when generating reviews rather than only the rendered text.
Reference graph
Works this paper leans on
-
[1]
Holcombe
Balazs Aczel, Barnabas Szaszi, and Alex O. Holcombe. 2021. A Billion-Dollar Do- nation: Estimating the Cost of Researchers’ Time Spent on Peer Review.Research Integrity and Peer Review(2021)
2021
-
[2]
2006.PDF Reference, Sixth Edition: Adobe Portable Document Format Version 1.7
Adobe Systems Incorporated. 2006.PDF Reference, Sixth Edition: Adobe Portable Document Format Version 1.7
2006
-
[3]
Cem Anil, Esin Durmus, Nina Panickssery, et al. 2024. Many-Shot Jailbreaking. InNeurIPS
2024
-
[4]
Daniel Ayzenshteyn, Roy Weiss, and Yisroel Mirsky. 2025. Cloak, Honey, Trap: Proactive Defenses Against LLM Agents. InUSENIX Security
2025
-
[5]
Microsoft Azure. 2025. Prompt Shields. https://learn.microsoft.com/en-us/azure/ ai-services/content-safety/concepts/jailbreak-detection
2025
-
[6]
James R. Barlow. 2026. pikepdf Documentation. https://pikepdf.readthedocs.io/ en/latest
2026
-
[7]
Howard Bauchner and Frederick P. Rivara. 2024. Use of artificial intelligence and the future of peer review.Health Affairs Scholar(2024)
2024
-
[8]
Lianghua Cao, Lan You, and R&D Team. 2025. CSPaper Review: Fast, Rubric- Faithful Conference Feedback. InProceedings of the 18th International Natural Language Generation Conference
2025
-
[9]
Yulin Chen, Haoran Li, Yuan Sui, et al . 2025. Can Indirect Prompt Injection Attacks Be Detected and Removed?. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
2025
-
[10]
Xinhao Deng, Qi Li, and Ke Xu. 2024. Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis. InCCS
2024
-
[11]
Drozdz and Michael R
John A. Drozdz and Michael R. Ladomery. 2024. The Peer Review Process: Past, Present, and Future.British Journal of Biomedical Science(2024)
2024
-
[12]
Gemini Team, Google DeepMind. 2024. Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context.arXiv(2024). https: //arxiv.org/abs/2403.05530
work page internal anchor Pith review arXiv 2024
-
[13]
Keegan Hines, Gary Lopez, Matthew Hall, et al. 2024. Defending against Indirect Prompt Injection Attacks with Spotlighting.arXiv(2024)
2024
-
[14]
ISO. 2020. Document management – Portable document format – Part 2: PDF 2.0. ISO 32000-2:2020. https://www.iso.org/standard/75839.html
2020
-
[15]
Yi Jiang, Oubo Ma, Yong Yang, et al . 2025. Reformulation is All You Need: Addressing Malicious Text Features in DNNs.arXiv(2025)
2025
-
[16]
Yi Jiang, Chenghui Shi, Oubo Ma, et al . 2023. Text Laundering: Mitigating Malicious Features Through Knowledge Distillation of Large Foundation Models. InInternational Conference on Information Security and Cryptology
2023
-
[17]
Yiqiao Jin, Qinlin Zhao, Yiyang Wang, et al. 2024. AgentReview: Exploring Peer Review Dynamics with LLM Agents. InEMNLP
2024
-
[18]
Greshake Kai. 2023. Inject My PDF: Prompt Injection for Your Resume. https: //kai-greshake.de/posts/inject-my-pdf
2023
-
[19]
Greshake Kai, Abdelnabi Sahar, Mishra Shailesh, et al. 2023. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security
2023
-
[20]
Gautam Kamath. 2026. On Violations of LLM Review Policies. ICML Blog. https://blog.icml.cc/2026/03/18/on-violations-of-llm-review-policies
2026
-
[21]
Jacalyn Kelly, Tara Sadeghieh, and Khosrow Adeli. 2014. Peer Review in Scientific Publications: Benefits, Critiques, & A Survival Guide.eJIFCC(2014)
2014
-
[22]
Jaeho Kim, Yunseok Lee, and Seulki Lee. 2025. Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards. InICML
2025
-
[23]
Michail Kovanis, Raphael Porcher, Philippe Ravaud, and Ludovic Trinquart. 2016. The Global Burden of Journal Peer Review in the Biomedical Literature: Strong Imbalance in the Collective Enterprise.PLOS one(2016)
2016
-
[24]
1970.The structure of scientific revolutions
Thomas S Kuhn and Ian Hacking. 1970.The structure of scientific revolutions. University of Chicago press
1970
-
[25]
Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, and Jennifer Neville. 2026. LLMs Get Lost in Multi-Turn Conversation. InICLR
2026
-
[26]
LangChain. 2026. LangChain Docs. https://docs.langchain.com/oss/python/ langchain/overview
2026
-
[27]
Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.NeurIPS(2020)
2020
-
[28]
Weixin Liang, Zachary Izzo, Yaohui Zhang, et al. 2024. Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. InICML
2024
-
[29]
Weixin Liang, Yuhui Zhang, Hancheng Cao, et al. 2024. Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis.NEJM AI(2024)
2024
-
[30]
Side Liu, Jiang Ming, Guodong Zhou, et al. 2025. Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model. InCCS
2025
-
[31]
Side Liu, Jiang Ming, Yilin Zhou, Jianming Fu, and Guojun Peng. 2025. VAPD: An Anomaly Detection Model for PDF Malware Forensics with Adversarial Robustness. InUSENIX Security
2025
-
[32]
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, et al. 2024. Automatic and Universal Prompt Injection Attacks against Large Language Models.CoRR(2024)
2024
-
[33]
LlamaIndex. 2026. LlamaIndex. https://docs.llamaindex.ai/
2026
-
[34]
Chris Lu, Cong Lu, Robert Tjarko Lange, et al. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.arXiv(2024)
2024
-
[35]
Oubo Ma, Yuwen Pu, Linkang Du, et al. 2024. SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems. In CCS
2024
-
[36]
Meta. 2025. Llama Prompt Guard 4. https://www.llama.com/docs/overview
2025
-
[37]
Miryam Naddaf. 2025. AI is Transforming Peer Review-and Many Scientists are Worried.Nature(2025)
2025
-
[38]
Boucher Nicholas, Blessing Jenny, Shumailov Ilia, et al. 2025. When Vision Fails: Text Attacks against ViT and OCR. InProceedings of the 2025 Workshop on Large AI Systems and Models with Privacy and Security Analysis
2025
-
[39]
OpenAI. 2025. Usage Policies. https://openai.com/policies/usage-policies
2025
-
[40]
Pangram. 2025. ICLR 2026 — Reviews. https://iclr.pangram.com/reviews
2025
-
[41]
Kornaropoulos, and Giuseppe Ateniese
Dario Pasquini, Evgenios M. Kornaropoulos, and Giuseppe Ateniese. 2025. LLMmap: Fingerprinting for Large Language Models. InUSENIX Security
2025
-
[42]
Wenjie Qu, Wengrui Zheng, Tianyang Tao, et al. 2025. Provably Robust Multi-bit Watermarking for AI-generated Text. InUSENIX Security
2025
-
[43]
Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, and Nihar B. Shah. 2025. Detecting LLM-Generated Peer Reviews.PLOS one(2025)
2025
-
[44]
Mark Russinovich, Ahmed Salem, and Ronen Eldan. 2025. Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. InUSENIX Security
2025
-
[45]
Devanshu Sahoo, Manish Prasad, Vasudev Majhi, et al. 2025. When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection.arXiv(2025)
2025
-
[46]
Schulhoff Sander. 2023. Instruction Defense. https://learnprompting.org/docs/ prompt_hacking/defensive_measures/instruction
2023
-
[47]
Schulhoff Sander. 2023. Sandwich Defense. https://learnprompting.org/docs/ prompt_hacking/defensive_measures/sandwich_defense
2023
-
[48]
Jiawen Shi, Zenghui Yuan, Yinuo Liu, et al. 2024. Optimization-Based Prompt Injection Attack to LLM-as-a-Judge. InCCS
2024
-
[49]
Jiawen Shi, Zenghui Yuan, Guiyao Tie, et al. 2026. Prompt Injection Attack to Tool Selection in LLM Agents. InNDSS
2026
-
[50]
Keith Tyser, Ben Segev, Gaston Longhitano, et al. 2024. AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews.arXiv(2024)
2024
-
[51]
Xing Xu, Jiefu Chen, Jinhui Xiao, et al. 2020. What Machines See Is Not What They Get: Fooling Scene Text Recognition Models with Adversarial Text Images. InCVPR
2020
-
[52]
Diwen Xue, Michalis Kallitsis, Amir Houmansadr, and Roya Ensafi. 2024. Fin- gerprinting Obfuscated Proxy Traffic with Encapsulated TLS Handshakes. In USENIX Security
2024
-
[53]
Yong Yang, Changjiang Li, Qingming Li, et al . 2025. PRSA: Prompt Stealing Attacks against Real-World Prompt Services. InUSENIX Security
2025
-
[54]
Rui Ye, Xianghe Pang, Jingyi Chai, et al . 2024. Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review.arXiv (2024)
2024
-
[55]
Jingwei Yi, Yueqi Xie, Bin Zhu, et al. 2025. Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models. InKDD
2025
-
[56]
Jiahao Yu, Xingwei Lin, Zheng Yu, and Xinyu Xing. 2024. LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks. InUSENIX Security
2024
-
[57]
Zhihui Yu. 2026. PDF-Prompt-Injection-Toolkit. https://github.com/zhihuiyuze/ PDF-Prompt-Injection-Toolkit
2026
-
[58]
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. InACL
2024
-
[59]
Alex L Zhang, Tim Kraska, and Omar Khattab. 2025. Recursive Language Models. arXiv(2025)
2025
-
[60]
Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushan- far. 2024. REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models. InUSENIX Security
2024
-
[61]
Penghao Zhao, Hailin Zhang, Qinhan Yu, et al . 2026. Retrieval-Augmented Generation for AI-Generated Content: A Survey.Data Science and Engineering (2026)
2026
-
[62]
Yinan Zhong, Qianhao Miao, Yanjiao Chen, et al. 2026. Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs. InNDSS
2026
-
[63]
This manuscript introduces
Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. 2025. PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models. InUSENIX Security. 13 A Ethical Considerations Stakeholder Analysis. IntraGuard involves two primary entities: • Researchers and Practitioners.Individuals in this group may assume three distinct ...
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.