Homoglyph-based Adversarial Perturbation of Introductory Computer Science Theory Problems
Pith reviewed 2026-05-21 01:31 UTC · model grok-4.3
The pith
Replacing a few characters with their visual look-alikes perturbs introductory CS theory problems so that current AI models fail while humans understand them unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Homoglyph-based adversarial perturbation modifies the question by substituting a small number of characters with their homoglyph equivalents. This leaves the semantic meaning intact for human readers but causes current AI models to produce incorrect answers on the perturbed versions of introductory computer science theory problems. The experimental results confirm that such problems can be effectively altered this way, and the authors supply an interactive tool for convenient application of the method.
What carries the argument
Homoglyph-based adversarial perturbation: the targeted replacement of characters in a problem statement with visually similar but distinct Unicode symbols that preserve readability and meaning for humans yet break the pattern recognition of current large language models.
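The substitution step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the `HOMOGLYPHS` table, the `perturb` function, and the random choice of positions are all assumptions made for the example, and the paper may select characters differently.

```python
import random

# Hypothetical, illustrative subset: Latin letters mapped to Cyrillic look-alikes.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def perturb(text, k, seed=0):
    """Replace up to k eligible characters of text with their homoglyphs.

    Positions are chosen at random (an assumption of this sketch); the
    seed makes the choice reproducible.
    """
    rng = random.Random(seed)
    # Indices of characters that have a look-alike in the table.
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    chosen = rng.sample(positions, min(k, len(positions)))
    chars = list(text)
    for i in chosen:
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


if __name__ == "__main__":
    question = "Prove that every regular language is context-free."
    perturbed = perturb(question, k=3)
    print(perturbed)              # renders almost identically to the original
    print(perturbed == question)  # the underlying code points differ
```

The perturbed string has the same length as the original and differs in at most `k` positions, which matches the paper's description of changing only "a few characters".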
If this is right
- Instructors can generate multiple distinct versions of the same homework set with minimal effort.
- Students who paste perturbed problems into an AI tool and submit its output will tend to hand in incorrect solutions.
- Graders continue to evaluate the original intended question without needing to decode the substitutions.
- The approach targets theoretical problems, rather than programming or calculation problems, in introductory courses.
Where Pith is reading between the lines
- The same substitution technique could be tested on non-CS subjects that rely on written problem statements.
- Future AI models trained to normalize homoglyphs might reduce the method's effectiveness over time.
- Combining homoglyph perturbation with other defenses, such as requiring process explanations, could create layered protections.
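The normalization concern in the second point can be made concrete. The sketch below is an assumption-laden illustration of how a preprocessing step might undo the perturbation: `TO_LATIN`, `normalize_homoglyphs`, and `suspicious_chars` are hypothetical names, and the table covers only a handful of Cyrillic look-alikes. Note that standard Unicode NFKC normalization does not fold these cross-script look-alikes, so an explicit table is needed.

```python
import unicodedata

# Hypothetical reverse table: Cyrillic look-alikes mapped back to Latin letters.
TO_LATIN = {
    "\u0430": "a",
    "\u0441": "c",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0445": "x",
}


def normalize_homoglyphs(text):
    """Map each known look-alike back to its Latin counterpart."""
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


def suspicious_chars(text):
    """List (index, character, Unicode name) for every non-ASCII letter."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch.isalpha() and not ch.isascii()
    ]


if __name__ == "__main__":
    perturbed = "regul\u0430r l\u0430ngu\u0430ge"
    for index, ch, name in suspicious_chars(perturbed):
        print(index, name)
    print(normalize_homoglyphs(perturbed))
```

If a model or a student's tooling applied a step like this before answering, the perturbation would be neutralized, which is why the effectiveness of the method may erode over time.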
Load-bearing premise
Homoglyph substitutions keep the intended meaning and readability intact for human readers and course graders while making the problems unsolvable by current AI models.
What would settle it
Give the same set of original and homoglyph-perturbed CS theory problems to both AI models and to human students or graders, then measure whether AI solution accuracy drops sharply while human accuracy and comprehension remain essentially unchanged.
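The comparison described above reduces to four accuracy figures: AI and human, each on original and perturbed problems. A minimal sketch of the bookkeeping follows; the function names are hypothetical and the boolean lists are made-up placeholder outcomes, not results from the paper.

```python
def accuracy(outcomes):
    """Fraction of correct answers in a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def accuracy_drop(original, perturbed):
    """Accuracy on original problems minus accuracy on perturbed problems."""
    return accuracy(original) - accuracy(perturbed)


if __name__ == "__main__":
    # Placeholder data for illustration only: True means a correct answer.
    ai_original = [True, True, True, True, False]
    ai_perturbed = [True, False, False, False, False]
    human_original = [True, True, True, False, True]
    human_perturbed = [True, True, True, False, True]

    print("AI drop:", accuracy_drop(ai_original, ai_perturbed))
    print("Human drop:", accuracy_drop(human_original, human_perturbed))
```

The claim would be supported if the AI drop is large while the human drop stays near zero; a real study would also need enough problems and participants for the difference to be statistically meaningful.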
Original abstract
Different AI tools such as ChatGPT, Gemini, and Claude are becoming very popular. Although they are helpful for many day-to-day tasks, they can be used in unexpected ways. For example, the learning objectives of a course may not be achieved if students use these tools to solve their homework problems. This paper proposes a simple method to address this issue in the lazy student model. The method uses homoglyph-based adversarial perturbation to first modify the question without changing the semantic meaning of the question. Then a few characters are perturbed by their homoglyphs. Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed. We also propose an interactive tool to conveniently use our method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a homoglyph-based adversarial perturbation method to modify introductory computer science theory problems so that AI tools (ChatGPT, Gemini, Claude) fail to solve them correctly while preserving semantic meaning for human readers and graders. It claims that experimental results demonstrate the effectiveness of this approach on theoretical problems and introduces an interactive tool for applying the perturbations.
Significance. If the central empirical claim holds under proper validation, the work could offer a lightweight, practical defense for educators against AI-assisted cheating in CS theory homework (e.g., automata, regex, proofs). The method is simple and does not require retraining models, which is a potential strength if reproducibility and human validation are added.
major comments (2)
- [Abstract] The claim that 'Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed' is load-bearing for the paper's contribution, yet it is unsupported by any reported details on the number of problems tested, the specific AI models and versions evaluated, quantitative success/failure rates, or controls confirming that perturbations preserve meaning for human graders while breaking AI performance.
- [Method / Experimental Results] The weakest assumption (that homoglyph substitutions leave semantic content intact for course staff while destroying it for current LLMs) is not tested. In formal theory problems even visually similar glyphs can change interpretation (e.g., in automata diagrams or regex syntax), and the paper provides no inter-rater agreement scores, comprehension tests with actual graders, or comparison to Unicode-normalized baselines.
minor comments (2)
- Add a table or section summarizing per-problem or per-model results to make the experimental claims verifiable.
- Clarify the 'lazy student model' referenced in the abstract; the term is not standard and should be defined or referenced.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments identify important areas for strengthening the presentation of our experimental claims and validation of the core assumptions. We address each point below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Abstract] The claim that 'Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed' is load-bearing for the paper's contribution, yet it is unsupported by any reported details on the number of problems tested, the specific AI models and versions evaluated, quantitative success/failure rates, or controls confirming that perturbations preserve meaning for human graders while breaking AI performance.
  Authors: We agree that the abstract would be strengthened by including these specifics rather than a high-level claim. In the revised manuscript we will update the abstract to report the number of problems evaluated, the exact models and versions tested, the observed quantitative success rates for both original and perturbed problems, and the human-grader controls used to confirm semantic preservation. These details appear in the Experimental Results section; we will ensure the abstract accurately summarizes them.
  Revision: yes
- Referee: [Method / Experimental Results] The weakest assumption (that homoglyph substitutions leave semantic content intact for course staff while destroying it for current LLMs) is not tested; in formal theory problems even visually similar glyphs can change interpretation (e.g., in automata diagrams or regex syntax), and no inter-rater agreement scores, comprehension tests with actual graders, or comparison to Unicode-normalized baselines are provided.
  Authors: We acknowledge that direct empirical validation of semantic preservation for humans versus LLMs is currently limited in the manuscript. We will add a dedicated subsection describing a human evaluation with course staff, including inter-rater agreement statistics and comprehension scores. We will also include concrete examples showing that the chosen homoglyph substitutions avoid altering syntactic elements in automata diagrams and regex, together with a Unicode-normalization baseline comparison demonstrating that normalization restores LLM performance. A full-scale study with adequate statistical power will be reported in the revision; initial pilot results will be added now.
  Revision: partial
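The inter-rater agreement statistic requested by the referee is commonly reported as Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The sketch below assumes two graders who each label every perturbed problem; the function name and the labels "same" and "changed" are hypothetical, and the rebuttal does not say which statistic the authors will use.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Placeholder labels for illustration only: did the meaning stay the same?
    grader_1 = ["same", "same", "changed", "same"]
    grader_2 = ["same", "changed", "changed", "same"]
    print(cohens_kappa(grader_1, grader_2))
```

A kappa near 1 would indicate that graders consistently agree on whether a perturbed problem still means what the original did, which is the property the load-bearing premise depends on.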
Circularity Check
No circularity: empirical method with no derivations or self-referential reductions
The paper proposes a homoglyph perturbation method for CS theory problems and asserts that experimental results demonstrate effectiveness. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. The central claim is an empirical assertion about perturbation success rather than a mathematical result that reduces to its own inputs by construction. No self-citations are invoked as load-bearing premises, and the method is presented directly without renaming known results or smuggling ansatzes. This is a standard non-circular empirical proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Homoglyph substitutions preserve semantic meaning for human readers and graders.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Our method uses homoglyph-based adversarial perturbation to first modify the question without changing the semantic meaning... Then a few characters are perturbed by their homoglyphs.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: We have taken a set of homework problems from a well-known discrete mathematics book... all the questions can be perturbed easily by only applying a few perturbations.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.