
arxiv: 2605.16286 · v1 · pith:NZHNOFJCnew · submitted 2026-04-13 · 💻 cs.CY · cs.AI

Homoglyph-based Adversarial Perturbation of Introductory Computer Science Theory Problems

Pith reviewed 2026-05-21 01:31 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords homoglyphs · adversarial perturbation · AI cheating · computer science education · theoretical problems · homework protection · Unicode substitutions

The pith

Replacing a few characters with their visual look-alikes perturbs introductory CS theory problems so that current AI models fail while humans understand them unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes and tests a simple technique that swaps selected characters in computer science theory homework questions with homoglyphs, characters that look almost identical but have different Unicode values. The goal is to create versions that AI tools like ChatGPT, Gemini, and Claude cannot solve correctly, while the original meaning stays obvious to students and graders. Experiments on theoretical problems from introductory courses show the perturbations succeed at this separation. The authors also release an interactive tool to apply the changes quickly. If the method works as described, instructors gain a low-effort way to protect homework from AI-assisted copying without altering the actual questions or their difficulty.

Core claim

Homoglyph-based adversarial perturbation modifies the question by substituting a small number of characters with their homoglyph equivalents. This leaves the semantic meaning intact for human readers but causes current AI models to produce incorrect answers on the perturbed versions of introductory computer science theory problems. The experimental results confirm that such problems can be effectively altered this way, and the authors supply an interactive tool for convenient application of the method.

What carries the argument

Homoglyph-based adversarial perturbation: the targeted replacement of characters in a problem statement with visually similar but distinct Unicode symbols that preserve readability and meaning for humans yet break the pattern recognition of current large language models.
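The substitution step can be illustrated with a minimal Python sketch. It is not the authors' tool: the `HOMOGLYPHS` table, the `perturb` function, and its `k` and `seed` parameters are all hypothetical names introduced here, and the mapping covers only a handful of Latin letters with well-known Cyrillic look-alikes.

```python
import random

# Hypothetical, hand-picked mapping (an assumption for illustration only):
# Latin letters and Cyrillic letters that render almost identically.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def perturb(text, k, seed=0):
    """Replace up to k randomly chosen characters with look-alikes."""
    rng = random.Random(seed)
    # Only positions whose character has an entry in the table are eligible.
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    chosen = rng.sample(positions, min(k, len(positions)))
    chars = list(text)
    for i in chosen:
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


question = "Prove that every tree has one more vertex than edges."
perturbed = perturb(question, k=3)
print(perturbed)               # renders like the original in most fonts
print(perturbed == question)   # False: the code points differ
```

The string length is unchanged and the text looks the same on screen, but three code points now differ. Whether a given model fails on such input is an empirical question that this sketch does not answer.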

If this is right

  • Instructors can generate multiple distinct versions of the same homework set with minimal effort.
  • Students who submit AI-generated answers on perturbed problems will receive incorrect solutions.
  • Graders continue to evaluate the original intended question without needing to decode the substitutions.
  • The approach works on theoretical rather than programming or calculation problems in introductory courses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same substitution technique could be tested on non-CS subjects that rely on written problem statements.
  • Future AI models trained to normalize homoglyphs might reduce the method's effectiveness over time.
  • Combining homoglyph perturbation with other defenses, such as requiring process explanations, could create layered protections.
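The normalization concern can be made concrete with Python's standard `unicodedata` module. The sketch below is an illustration under stated assumptions, not a reconstruction of any model's preprocessing: `normalize` and `flag_non_ascii` are hypothetical helper names. It shows that NFKC normalization folds compatibility variants such as a fullwidth digit back to ASCII, while a cross-script look-alike such as Cyrillic "а" passes through unchanged and would need a separate confusables mapping.

```python
import unicodedata


def normalize(text):
    """Apply Unicode NFKC normalization (compatibility composition)."""
    return unicodedata.normalize("NFKC", text)


def flag_non_ascii(text):
    """List (index, Unicode name) for every non-ASCII character."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


fullwidth = "\uff17x + 3"   # FULLWIDTH DIGIT SEVEN followed by ASCII text
cyrillic = "c\u0430t"       # CYRILLIC SMALL LETTER A between Latin letters

print(normalize(fullwidth) == "7x + 3")  # True: NFKC folds the digit
print(normalize(cyrillic) == "cat")      # False: NFKC leaves Cyrillic alone
print(flag_non_ascii(cyrillic))          # [(1, 'CYRILLIC SMALL LETTER A')]
```

A defense that relies only on compatibility variants would therefore be undone by a single normalization call, whereas cross-script substitutions require an explicit look-alike table to reverse.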

Load-bearing premise

Homoglyph substitutions keep the intended meaning and readability intact for human readers and course graders while making the problems unsolvable by current AI models.

What would settle it

Give the same set of original and homoglyph-perturbed CS theory problems to both AI models and to human students or graders, then measure whether AI solution accuracy drops sharply while human accuracy and comprehension remain essentially unchanged.
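One way to summarize such a comparison is sketched below in Python. The function names and the four-way split of graded results are assumptions made for illustration; the paper's own metrics are not described in this review, and no real data are shown.

```python
def accuracy(correct):
    """Fraction of True values in a list of per-problem correctness flags."""
    if not correct:
        raise ValueError("need at least one graded problem")
    return sum(correct) / len(correct)


def separation(ai_original, ai_perturbed, human_original, human_perturbed):
    """Accuracy drop caused by perturbation, for AI and for human solvers.

    Each argument is a list of booleans, one per graded problem.
    The method succeeds when ai_drop is large and human_drop is near zero.
    """
    return {
        "ai_drop": accuracy(ai_original) - accuracy(ai_perturbed),
        "human_drop": accuracy(human_original) - accuracy(human_perturbed),
    }
```

Calling `separation` with graded results for the same problem set in both conditions yields the two drops side by side. A significance test on paired outcomes would be a natural next step but is outside this sketch.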

Figures

Figures reproduced from arXiv: 2605.16286 by Aidan Alexander, Chitrangada Juneja, Miro Vanek, Napaluck Tontrasathien, Reyan Ahmed, Saumya Debray, Sazzadur Rahaman.

Figure 1: Despite the fact that several parts of the question are perturbed, as shown in this figure, most …
Figure 2: A common type of response from popular LLM models, including ChatGPT and Gemini, indicating …
Figure 3: Illustrating possible modifications of a question without changing the semantic meaning. The …
Figure 1: Some perturbation might be effective, but the model just replaced those perturbations using some …
Figure 4: Illustrating different homoglyphs of the numeric character “7”. Some homoglyphs are readable by …
Figure 5: Illustrating the unrecognizability of a homoglyph of numeric character “7”. This particular example …
Figure 6: Illustrating an example where we added a perturbation of “7” as the coefficient of “x”. The model …
Figure 7: Illustrating the distribution of the number of characters across the 164 questions considered.
Figure 8: Illustrating the distribution of the number of attempts to fool the models across the 164 questions …
Figure 9: Illustrating the distribution of the number of perturbed characters to fool the models across the …
Figure 10: Illustrating the unrecognizability of a homoglyph of numeric character “6”. This particular …
Figure 11: Illustrating the observation that the more perturbations can become less effective. Here we …
Figure 12: Illustrating the observation that less perturbations can become more effective. Here we perturb …
Figure 13: Illustrating the observation that not always perturbing numeric characters are effective; in this …
Figure 14: Illustrating the observation that perturbing numeric characters are effective. In this question we …
Figure 15: Illustrating the interactive tool.
Original abstract

Different AI tools such as ChatGPT, Gemini, and Claude are becoming very popular. Although they are helpful for many day-to-day tasks, they can be used in unexpected ways. For example, the learning objectives of a course may not be achieved if students use these tools to solve their homework problems. This paper proposes a simple method to address this issue in the lazy student model. The method uses homoglyph-based adversarial perturbation to first modify the question without changing the semantic meaning of the question. Then a few characters are perturbed by their homoglyphs. Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed. We also propose an interactive tool to conveniently use our method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a homoglyph-based adversarial perturbation method to modify introductory computer science theory problems so that AI tools (ChatGPT, Gemini, Claude) fail to solve them correctly while preserving semantic meaning for human readers and graders. It claims that experimental results demonstrate the effectiveness of this approach on theoretical problems and introduces an interactive tool for applying the perturbations.

Significance. If the central empirical claim holds under proper validation, the work could offer a lightweight, practical defense for educators against AI-assisted cheating in CS theory homework (e.g., automata, regex, proofs). The method is simple and does not require retraining models, which is a potential strength if reproducibility and human validation are added.

major comments (2)
  1. [Abstract] The claim that 'Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed' is load-bearing for the paper's contribution yet is unsupported by any reported details on the number of problems tested, the specific AI models and versions evaluated, quantitative success/failure rates, or controls confirming that perturbations preserve meaning for human graders while breaking AI performance.
  2. [Method / Experimental Results] The weakest assumption (that homoglyph substitutions leave semantic content intact for course staff while destroying it for current LLMs) is not tested; in formal theory problems even visually similar glyphs can change interpretation (e.g., in automata diagrams or regex syntax), and no inter-rater agreement scores, comprehension tests with actual graders, or comparison to unicode-normalized baselines are provided.
minor comments (2)
  1. Add a table or section summarizing per-problem or per-model results to make the experimental claims verifiable.
  2. Clarify the 'lazy student model' referenced in the abstract; the term is not standard and should be defined or referenced.
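The referee's point that a look-alike glyph can change the formal meaning of a problem is easy to demonstrate for regular expressions. The Python sketch below is an illustration only; `accepts` is a hypothetical helper, and the patterns are not taken from the paper. Swapping the Latin "a" in a pattern for Cyrillic "а" yields a pattern that looks the same but describes a different language.

```python
import re


def accepts(pattern, string):
    """Return True if the whole string matches the regular expression."""
    return re.fullmatch(pattern, string) is not None


latin_pattern = "ab*"            # Latin a followed by zero or more b
perturbed_pattern = "\u0430b*"   # CYRILLIC SMALL LETTER A, then b*

print(accepts(latin_pattern, "abbb"))      # True
print(accepts(perturbed_pattern, "abbb"))  # False: alphabet symbol changed
```

For this reason, substitutions inside formal objects (regex literals, alphabet symbols, state labels) are riskier than substitutions in the surrounding English prose.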

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify important areas for strengthening the presentation of our experimental claims and validation of the core assumptions. We address each point below and indicate the revisions we will make.

point-by-point responses
  1. Referee: [Abstract] The claim that 'Our experimental result shows the theoretical problems of introductory computer science courses can be effectively perturbed' is load-bearing for the paper's contribution yet is unsupported by any reported details on the number of problems tested, the specific AI models and versions evaluated, quantitative success/failure rates, or controls confirming that perturbations preserve meaning for human graders while breaking AI performance.

    Authors: We agree that the abstract would be strengthened by including these specifics rather than a high-level claim. In the revised manuscript we will update the abstract to report the number of problems evaluated, the exact models and versions tested, the observed quantitative success rates for both original and perturbed problems, and the human-grader controls used to confirm semantic preservation. These details appear in the Experimental Results section; we will ensure the abstract accurately summarizes them.

    revision: yes

  2. Referee: [Method / Experimental Results] The weakest assumption (that homoglyph substitutions leave semantic content intact for course staff while destroying it for current LLMs) is not tested; in formal theory problems even visually similar glyphs can change interpretation (e.g., in automata diagrams or regex syntax), and no inter-rater agreement scores, comprehension tests with actual graders, or comparison to unicode-normalized baselines are provided.

    Authors: We acknowledge that direct empirical validation of semantic preservation for humans versus LLMs is currently limited in the manuscript. We will add a dedicated subsection describing a human evaluation with course staff, including inter-rater agreement statistics and comprehension scores. We will also include concrete examples showing that the chosen homoglyph substitutions avoid altering syntactic elements in automata diagrams and regex, together with a Unicode-normalization baseline comparison demonstrating that normalization restores LLM performance. An adequately powered full-scale study will be reported in the revision; initial pilot results will be added now.

    revision: partial

Circularity Check

0 steps flagged

No circularity: empirical method with no derivations or self-referential reductions

full rationale

The paper proposes a homoglyph perturbation method for CS theory problems and asserts that experimental results demonstrate effectiveness. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. The central claim is an empirical assertion about perturbation success rather than a mathematical result that reduces to its own inputs by construction. No self-citations are invoked as load-bearing premises, and the method is presented directly without renaming known results or smuggling ansatzes. This is a standard non-circular empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach depends on the untested premise that homoglyph changes fool AI parsers without altering human interpretation of the problem statement.

axioms (1)
  • domain assumption: Homoglyph substitutions preserve semantic meaning for human readers and graders.
    Invoked when the paper states that the question is modified without changing its semantic meaning.

pith-pipeline@v0.9.0 · 5669 in / 1161 out tokens · 40370 ms · 2026-05-21T01:31:55.104447+00:00 · methodology


