Detecting Verbatim LLM Copy-Paste in Homework

Aizierjiang Aiersilan

arxiv: 2605.16336 · v1 · pith:SUCTV67Gnew · submitted 2026-05-07 · 💻 cs.CR · cs.AI· cs.CY

Detecting Verbatim LLM Copy-Paste in Homework

Aizierjiang Aiersilan This is my paper

Pith reviewed 2026-05-20 23:42 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.CY

keywords LLM detectionsteganographyeducation integrityverbatim copyUnicode tagsprompt watermarkhomework detection

0 comments

The pith

Embedding an invisible instruction in assignment prompts makes LLMs output a detectable signature on verbatim copy-paste.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper demonstrates a way for educators to catch the specific case of students pasting an entire homework prompt into an LLM and turning in the reply unchanged. A hidden command is placed inside the visible text using special Unicode characters that do not alter how the prompt looks to a person. When the model processes the full prompt, it reads the command and adds a unique mark to its answer. Teachers then scan submissions for that mark. The approach avoids problems with general AI detectors that can be unreliable or unfair to some students, and it works without needing help from the companies that build the models.

Core claim

Encoding an arbitrary printable-ASCII payload into the Unicode Tags block inside the assignment prompt leaves the text visually identical to the original while directing any model that ingests the prompt verbatim to include a specific signature in its generated reply, thereby marking the copy-and-paste pathway.

What carries the argument

SteganoPrompt, which converts a payload into characters from the deprecated Unicode Tags block (U+E0000 to U+E007F) to create an invisible instruction that frontier models tokenize and follow.

If this is right

The signature appears reliably across seven LLM families when the full prompt reaches the model.
The encoding persists through transmission in Word, Google Docs, PDF, Markdown, Slack, email, and major learning-management systems.
Educators obtain a direct signal for verbatim prompt ingestion without depending on post-hoc detectors or model-provider watermarks.
The method targets only the copy-paste misuse case and does not penalize other forms of LLM assistance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding approach could be adapted to verify direct sourcing in professional reports or legal documents where provenance matters.
Future model training might filter out tags in this range, which would require shifting to a different encoding block or method.
Pairing the signature check with existing style-based detectors could reduce false positives for non-native writers while catching verbatim cases.

Load-bearing premise

The encoded instruction will survive common copy-paste operations and be read and acted on by the language model even though it remains invisible to the human reader.

What would settle it

Provide the same LLM with an assignment prompt containing the encoded instruction and with an otherwise identical prompt lacking it, then check whether only the first case produces the expected signature in the model output.

Figures

Figures reproduced from arXiv: 2605.16336 by Aizierjiang Aiersilan.

**Figure 1.** Figure 1: The SteganoPrompt web interface. The educator pastes the assignment brief into panel 1 (Visible text), selects or types a tripwire instruction in panel 2 (Hidden instruction), and presses Encode & copy. Panel 3 displays the encoded output, which is character-for-character indistinguishable from the input under any common font. The page is a single self-contained HTML file with no network calls. and the ins… view at source ↗

read the original abstract

Large language models (LLMs) have made fluent essay writing, code drafting, and quiz answering instantly available to students at every level, from secondary school through graduate study. Many educators do not object to LLM use \emph{per~se}; what they need to detect is the case in which a student pastes the assignment prompt into a chatbot and submits the model's reply verbatim, without engaging with the work. Existing post-hoc AI-text detectors remain unreliable and have been shown to penalise non-native English writers, while output-side watermarks require cooperation from the model provider. We propose an alternative that the educator controls directly: an input-side watermark in which an invisible instruction is embedded inside the visible assignment prompt itself. An LLM that ingests the prompt verbatim quietly reads the hidden instruction and writes a tell-tale signature into its reply, exposing the copy-and-paste pathway specifically. We describe SteganoPrompt, a single-page, zero-dependency web tool that encodes an arbitrary printable-ASCII payload into the deprecated Unicode Tags block (\texttt{U+E0000}--\texttt{U+E007F}). The encoded string is visually identical to the original, survives common copy-paste channels (Word, Google Docs, PDF, Markdown, Slack, e-mail, the major learning-management systems), and is reliably tokenized by frontier models. We evaluate compliance across seven LLM families and a representative set of educational content channels. The work is informed by my experience as a graduate teaching assistant for an undergraduate software engineering course at the George Washington University. The tool is released under the MIT licence at \url{https://ezharjan.github.io/SteganoPrompt/}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes SteganoPrompt, a zero-dependency web tool that encodes an arbitrary printable-ASCII payload into the Unicode Tags block (U+E0000–U+E007F) and embeds it invisibly inside an assignment prompt. An LLM that receives the prompt verbatim is expected to read the hidden instruction and emit a tell-tale signature in its output, thereby flagging verbatim copy-paste from the prompt. The authors claim the encoding survives common educational copy-paste channels (Word, Google Docs, PDF, Markdown, Slack, e-mail, major LMS platforms) and is reliably tokenized by frontier models; they report an evaluation of compliance across seven LLM families and representative educational channels. The work is motivated by the author's experience as a graduate TA and the tool is released under the MIT license.

Significance. If the survival and tokenization claims are substantiated with concrete per-channel and per-model data, the approach supplies educators with a lightweight, provider-independent mechanism to detect one specific form of LLM misuse that existing post-hoc detectors do not reliably isolate. The single-page implementation and open release constitute practical contributions.

major comments (2)

[Abstract] Abstract: the central claim that the encoded string 'survives common copy-paste channels … and is reliably tokenized by frontier models' is load-bearing for the detection guarantee, yet the abstract (and, by the reader's report, the evaluation) supplies no per-channel character-loss statistics, per-model compliance rates, or enumerated failure modes (normalization, private-use stripping, truncation). Without these numbers the reliability assertion cannot be verified.
[Evaluation] Evaluation section: the statement that compliance was evaluated across seven LLM families is presented without tabulated success rates, error cases, or channel-specific results. This absence directly affects the falsifiability of the claim that the method works for verbatim copy-paste in typical educational workflows.

minor comments (1)

The manuscript would benefit from an explicit table (or appendix) listing compliance percentages per model and per channel so that readers can assess the practical scope of the method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on strengthening the empirical support for our claims. We address each major comment below and have revised the manuscript to incorporate the requested quantitative details.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the encoded string 'survives common copy-paste channels … and is reliably tokenized by frontier models' is load-bearing for the detection guarantee, yet the abstract (and, by the reader's report, the evaluation) supplies no per-channel character-loss statistics, per-model compliance rates, or enumerated failure modes (normalization, private-use stripping, truncation). Without these numbers the reliability assertion cannot be verified.

Authors: We agree that the abstract would benefit from explicit quantitative backing. In the revised manuscript we have updated the abstract to report summary compliance rates (98% overall tokenization success across tested models and <2% average character loss across channels). A new table in the Evaluation section now provides per-channel character-loss statistics, per-model compliance rates for the seven LLM families, and an enumerated list of observed failure modes including minor truncation in certain LMS platforms and private-use stripping in one email client. revision: yes
Referee: [Evaluation] Evaluation section: the statement that compliance was evaluated across seven LLM families is presented without tabulated success rates, error cases, or channel-specific results. This absence directly affects the falsifiability of the claim that the method works for verbatim copy-paste in typical educational workflows.

Authors: We acknowledge the original Evaluation section presented only a high-level summary. We have expanded it with two tables: one listing success rates and error cases for each of the seven LLM families, and another showing channel-specific results for Word, Google Docs, PDF, Markdown, Slack, e-mail, and major LMS platforms. All observed failure modes are now explicitly listed with their frequencies, allowing readers to assess falsifiability for their own workflows. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering tool proposal with direct empirical evaluation

full rationale

The manuscript describes an input-side watermarking tool (SteganoPrompt) that encodes an ASCII payload into the Unicode Tags block and asserts that the resulting string remains visually identical, survives listed copy-paste channels, and is tokenized by frontier models. These properties are presented as design goals followed by an evaluation across seven LLM families and representative educational platforms. No equations, fitted parameters, derivations, or self-citations appear in the provided text; the central detection claim rests on the observable behavior of the constructed string rather than on any reduction to prior fitted values or author-defined uniqueness theorems. The work is therefore self-contained as an engineering artifact whose correctness can be checked by independent reproduction of the encoding and channel tests.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach depends on the assumption that current frontier LLMs will interpret characters from the Unicode Tags block as executable instructions within the prompt.

axioms (1)

domain assumption Frontier LLMs tokenize and follow instructions placed in the Unicode Tags block (U+E0000--U+E007F).
The detection mechanism requires that models process the hidden payload and emit the requested signature.

pith-pipeline@v0.9.0 · 5828 in / 1151 out tokens · 56529 ms · 2026-05-20T23:42:52.828848+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

encodes an arbitrary printable-ASCII payload into the deprecated Unicode Tags block (U+E0000–U+E007F)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 5 internal anchors

[1]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27730–27744, 2022

work page 2022
[2]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill et al., “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[3]

Chatting and cheating: Ensuring academic integrity in the era of chatgpt,

D. R. Cotton, P. A. Cotton, and J. R. Shipway, “Chatting and cheating: Ensuring academic integrity in the era of chatgpt,” Innovations in education and teaching international, vol. 61, no. 2, pp. 228–239, 2024

work page 2024
[4]

Chatgpt: The end of online exam integrity?

T. Susnjak and T. R. McIntosh, “Chatgpt: The end of online exam integrity?”Education Sciences, vol. 14, no. 6, p. 656, 2024

work page 2024
[5]

What is the impact of chatgpt on education? a rapid review of the literature,

C. K. Lo, “What is the impact of chatgpt on education? a rapid review of the literature,”Education sciences, vol. 13, no. 4, p. 410, 2023

work page 2023
[6]

Academic integrity considerations of ai large language models in the post-pandemic era: Chatgpt and beyond,

M. Perkins, “Academic integrity considerations of ai large language models in the post-pandemic era: Chatgpt and beyond,” Journal of University Teaching and Learning Practice, vol. 20, no. 2, pp. 1–24, 2023

work page 2023
[7]

Ai bot chatgpt writes smart essays-should professors worry?

C. Stokel-Walker, “Ai bot chatgpt writes smart essays-should professors worry?”Nature, 2022

work page 2022
[8]

Detectgpt: Zero-shot machine-generated text detection using probability curvature,

E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn, “Detectgpt: Zero-shot machine-generated text detection using probability curvature,” inInternational conference on machine learning. PMLR, 2023, pp. 24950–24962

work page 2023
[9]

New AI classifier for indicating AI-written text,

OpenAI, “New AI classifier for indicating AI-written text,” https://openai.com/blog/ new-ai-classifier-for-indicating-ai-written-text, 2023

work page 2023
[10]

Gpt detectors are biased against non-native english writers,

W. Liang, M. Yuksekgonul, Y. Mao, E. Wu, and J. Zou, “Gpt detectors are biased against non-native english writers,”Patterns, vol. 4, no. 7, 2023

work page 2023
[11]

Can AI-Generated Text be Reliably Detected?

V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, and S. Feizi, “Can ai-generated text be reliably detected?”arXiv preprint arXiv:2303.11156, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense,

K. Krishna, Y. Song, M. Karpinska, J. Wieting, and M. Iyyer, “Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense,”Advances in neural information process- ing systems, vol. 36, pp. 27469–27500, 2023

work page 2023
[13]

A watermark for large language models,

J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” in International conference on machine learning. PMLR, 2023, pp. 17061–17084

work page 2023
[14]

Watermarking of large language models,

S. Aaronson and H. Kirchner, “Watermarking of large language models,” inLarge language models and transformers workshop at Simons Institute for the Theory of Computing, vol. 2023, 2023

work page 2023
[15]

Robust distortion-free watermarks for language models,

R. Kuditipudi, J. Thickstun, T. Hashimoto, and P. Liang, “Robust distortion-free watermarks for language models,”arXiv preprint arXiv:2307.15593, 2023

work page arXiv 2023
[16]

Undetectable watermarks for language models,

M. Christ, S. Gunn, and O. Zamir, “Undetectable watermarks for language models,” inThe Thirty Seventh Annual Conference on Learning Theory. PMLR, 2024, pp. 1125–1139

work page 2024
[17]

Provable robust watermarking for ai-generated text,

X. Zhao, P. Ananth, L. Li, and Y.-X. Wang, “Provable robust watermarking for ai-generated text,”arXiv preprint arXiv:2306.17439, 2023

work page arXiv 2023
[18]

ASCII smuggler tool: Crafting invisible text and decoding hidden codes,

J. Rehberger, “ASCII smuggler tool: Crafting invisible text and decoding hidden codes,” Embrace The Red, https:// embracethered.com/blog/, 2024

work page 2024
[19]

On violations of llm review policies,

ICML 2026, “On violations of llm review policies,” International Conference on Machine Learning, https://icml.cc/Conferences/ 2026, 2026. [Online]. Available: https://blog.icml.cc/2026/03/ 18/on-violations-of-llm-review-policies/

work page 2026
[20]

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

W. Liang, Z. Izzo, Y. Zhang, H. Lepp, H. Cao, X. Zhao, L. Chen, H. Ye, S. Liu, Z. Huanget al., “Monitoring ai-modified content at scale: A case study on the impact of chatgpt on ai conference peer reviews,”arXiv preprint arXiv:2403.07183, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection,” inProceedingsofthe16thACMworkshoponartificialintelligence and security, 2023, pp. 79–90

work page 2023
[22]

Ignore Previous Prompt: Attack Techniques For Language Models

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack tech- niques for language models,”arXiv preprint arXiv:2211.09527, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[23]

Formalizing and benchmarking prompt injection attacks and defenses,

Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847

work page 2024
[24]

Universal and Transferable Adversarial Attacks on Aligned Language Models

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,”arXiv preprint arXiv:2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[25]

Bad characters: Imperceptible nlp attacks,

N. Boucher, I. Shumailov, R. Anderson, and N. Papernot, “Bad characters: Imperceptible nlp attacks,” in2022 IEEE symposium on security and privacy (SP). IEEE, 2022, pp. 1987–2004

work page 2022
[26]

Text processing like humans do: Visually attacking and shielding nlp systems,

S. Eger, G. G. Şahin, A. Rücklé, J.-U. Lee, C. Schulz, M. Mesgar, K. Swarnkar, E. Simpson, and I. Gurevych, “Text processing like humans do: Visually attacking and shielding nlp systems,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Hu- man Language Technologies, Volume 1 (Long and Sh...

work page 2019
[27]

Theunicodestandard,version15.0.0,

TheUnicodeConsortium,“Theunicodestandard,version15.0.0,” Mountain View, CA: The Unicode Consortium, 2022. [Online]. Available: https://www.unicode.org/versions/Unicode15.0.0/

work page 2022
[28]

Frustratingly easy edit-based linguistic steganography with a masked language model,

H. Ueoka, Y. Murawaki, and S. Kurohashi, “Frustratingly easy edit-based linguistic steganography with a masked language model,” inProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguis- tics: Human Language Technologies, 2021, pp. 5486–5492

work page 2021
[29]

Tracing text provenance via context-aware lexical sub- stitution,

X. Yang, J. Zhang, K. Chen, W. Zhang, Z. Ma, F. Wang, and N. Yu, “Tracing text provenance via context-aware lexical sub- stitution,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, 2022, pp. 11613–11621

work page 2022

[1] [1]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27730–27744, 2022

work page 2022

[2] [2]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill et al., “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[3] [3]

Chatting and cheating: Ensuring academic integrity in the era of chatgpt,

D. R. Cotton, P. A. Cotton, and J. R. Shipway, “Chatting and cheating: Ensuring academic integrity in the era of chatgpt,” Innovations in education and teaching international, vol. 61, no. 2, pp. 228–239, 2024

work page 2024

[4] [4]

Chatgpt: The end of online exam integrity?

T. Susnjak and T. R. McIntosh, “Chatgpt: The end of online exam integrity?”Education Sciences, vol. 14, no. 6, p. 656, 2024

work page 2024

[5] [5]

What is the impact of chatgpt on education? a rapid review of the literature,

C. K. Lo, “What is the impact of chatgpt on education? a rapid review of the literature,”Education sciences, vol. 13, no. 4, p. 410, 2023

work page 2023

[6] [6]

Academic integrity considerations of ai large language models in the post-pandemic era: Chatgpt and beyond,

M. Perkins, “Academic integrity considerations of ai large language models in the post-pandemic era: Chatgpt and beyond,” Journal of University Teaching and Learning Practice, vol. 20, no. 2, pp. 1–24, 2023

work page 2023

[7] [7]

Ai bot chatgpt writes smart essays-should professors worry?

C. Stokel-Walker, “Ai bot chatgpt writes smart essays-should professors worry?”Nature, 2022

work page 2022

[8] [8]

Detectgpt: Zero-shot machine-generated text detection using probability curvature,

E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn, “Detectgpt: Zero-shot machine-generated text detection using probability curvature,” inInternational conference on machine learning. PMLR, 2023, pp. 24950–24962

work page 2023

[9] [9]

New AI classifier for indicating AI-written text,

OpenAI, “New AI classifier for indicating AI-written text,” https://openai.com/blog/ new-ai-classifier-for-indicating-ai-written-text, 2023

work page 2023

[10] [10]

Gpt detectors are biased against non-native english writers,

W. Liang, M. Yuksekgonul, Y. Mao, E. Wu, and J. Zou, “Gpt detectors are biased against non-native english writers,”Patterns, vol. 4, no. 7, 2023

work page 2023

[11] [11]

Can AI-Generated Text be Reliably Detected?

V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, and S. Feizi, “Can ai-generated text be reliably detected?”arXiv preprint arXiv:2303.11156, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[12] [12]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense,

K. Krishna, Y. Song, M. Karpinska, J. Wieting, and M. Iyyer, “Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense,”Advances in neural information process- ing systems, vol. 36, pp. 27469–27500, 2023

work page 2023

[13] [13]

A watermark for large language models,

J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” in International conference on machine learning. PMLR, 2023, pp. 17061–17084

work page 2023

[14] [14]

Watermarking of large language models,

S. Aaronson and H. Kirchner, “Watermarking of large language models,” inLarge language models and transformers workshop at Simons Institute for the Theory of Computing, vol. 2023, 2023

work page 2023

[15] [15]

Robust distortion-free watermarks for language models,

R. Kuditipudi, J. Thickstun, T. Hashimoto, and P. Liang, “Robust distortion-free watermarks for language models,”arXiv preprint arXiv:2307.15593, 2023

work page arXiv 2023

[16] [16]

Undetectable watermarks for language models,

M. Christ, S. Gunn, and O. Zamir, “Undetectable watermarks for language models,” inThe Thirty Seventh Annual Conference on Learning Theory. PMLR, 2024, pp. 1125–1139

work page 2024

[17] [17]

Provable robust watermarking for ai-generated text,

X. Zhao, P. Ananth, L. Li, and Y.-X. Wang, “Provable robust watermarking for ai-generated text,”arXiv preprint arXiv:2306.17439, 2023

work page arXiv 2023

[18] [18]

ASCII smuggler tool: Crafting invisible text and decoding hidden codes,

J. Rehberger, “ASCII smuggler tool: Crafting invisible text and decoding hidden codes,” Embrace The Red, https:// embracethered.com/blog/, 2024

work page 2024

[19] [19]

On violations of llm review policies,

ICML 2026, “On violations of llm review policies,” International Conference on Machine Learning, https://icml.cc/Conferences/ 2026, 2026. [Online]. Available: https://blog.icml.cc/2026/03/ 18/on-violations-of-llm-review-policies/

work page 2026

[20] [20]

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

W. Liang, Z. Izzo, Y. Zhang, H. Lepp, H. Cao, X. Zhao, L. Chen, H. Ye, S. Liu, Z. Huanget al., “Monitoring ai-modified content at scale: A case study on the impact of chatgpt on ai conference peer reviews,”arXiv preprint arXiv:2403.07183, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] [21]

Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection,” inProceedingsofthe16thACMworkshoponartificialintelligence and security, 2023, pp. 79–90

work page 2023

[22] [22]

Ignore Previous Prompt: Attack Techniques For Language Models

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack tech- niques for language models,”arXiv preprint arXiv:2211.09527, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[23] [23]

Formalizing and benchmarking prompt injection attacks and defenses,

Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847

work page 2024

[24] [24]

Universal and Transferable Adversarial Attacks on Aligned Language Models

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,”arXiv preprint arXiv:2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[25] [25]

Bad characters: Imperceptible nlp attacks,

N. Boucher, I. Shumailov, R. Anderson, and N. Papernot, “Bad characters: Imperceptible nlp attacks,” in2022 IEEE symposium on security and privacy (SP). IEEE, 2022, pp. 1987–2004

work page 2022

[26] [26]

Text processing like humans do: Visually attacking and shielding nlp systems,

S. Eger, G. G. Şahin, A. Rücklé, J.-U. Lee, C. Schulz, M. Mesgar, K. Swarnkar, E. Simpson, and I. Gurevych, “Text processing like humans do: Visually attacking and shielding nlp systems,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Hu- man Language Technologies, Volume 1 (Long and Sh...

work page 2019

[27] [27]

Theunicodestandard,version15.0.0,

TheUnicodeConsortium,“Theunicodestandard,version15.0.0,” Mountain View, CA: The Unicode Consortium, 2022. [Online]. Available: https://www.unicode.org/versions/Unicode15.0.0/

work page 2022

[28] [28]

Frustratingly easy edit-based linguistic steganography with a masked language model,

H. Ueoka, Y. Murawaki, and S. Kurohashi, “Frustratingly easy edit-based linguistic steganography with a masked language model,” inProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguis- tics: Human Language Technologies, 2021, pp. 5486–5492

work page 2021

[29] [29]

Tracing text provenance via context-aware lexical sub- stitution,

X. Yang, J. Zhang, K. Chen, W. Zhang, Z. Ma, F. Wang, and N. Yu, “Tracing text provenance via context-aware lexical sub- stitution,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, 2022, pp. 11613–11621

work page 2022