Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution

Robert Dilworth

arxiv: 2604.10271 · v3 · submitted 2026-04-11 · 💻 cs.CR · cs.CL · cs.IR


Pith reviewed 2026-05-10 15:20 UTC · model grok-4.3

classification 💻 cs.CR · cs.CL · cs.IR

keywords homoglyph substitution · stylometry · adversarial stylometry · privacy · Unicode · authorship attribution · forensic linguistics


The pith

Homoglyph substitution degrades stylometric systems by replacing characters with visually similar alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that swapping letters in text for look-alikes from other Unicode scripts can weaken stylometric tools that infer author traits such as age range or country from writing patterns. This matters because social-media posts already allow statistical recovery of personal details comparable to what leaks from ID documents, and simply avoiding text disclosure is impractical. The proposed method keeps the text readable while targeting the character features that stylometry depends on. A sympathetic reader would see this as a lightweight privacy tool for everyday online writing.

Core claim

Performing homoglyph substitution on text degrades stylometric systems, allowing authors to reduce the leakage of personal information such as estimated age and geographic location that these systems can otherwise extract from voluntary text disclosures.

What carries the argument

Homoglyph substitution: the replacement of characters with visually similar alternatives drawn from different Unicode code points (for example, Latin 'h', U+0068, with Cyrillic 'һ', U+04BB). The substitution targets and disrupts the character-level patterns that stylometric classifiers use.
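As a rough illustration of the mechanism, the sketch below swaps a handful of Latin letters for look-alike Cyrillic code points. The mapping table, the `substitute` function, and its `rate` and `seed` parameters are hypothetical names chosen for this example; they are not taken from the paper or from TraceTarnish.

```python
import random

# Hypothetical, deliberately small mapping: Latin letter -> look-alike code point.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "h": "\u04bb",  # CYRILLIC SMALL LETTER SHHA
}


def substitute(text: str, rate: float = 1.0, seed: int = 0) -> str:
    """Replace each mappable character with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


original = "hello there"
altered = substitute(original)

print(altered)                                   # looks like the original on screen
print(original == altered)                       # the strings differ at the code-point level
print([f"U+{ord(c):04X}" for c in altered[:2]])  # code points of the first two characters
```

The text keeps its length and visual shape, while any comparison done on code points no longer matches the original.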

If this is right

Stylometric authorship attribution and trait inference become measurably less reliable on the altered text.
Individuals can reduce the personal information extractable from their online writing while preserving visual readability.
Adversarial stylometry provides a practical defense against forensic analysis of voluntary text disclosures.
Text can be altered to hinder stylometric recovery of demographic signals such as age group or location.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Stylometric tools may require explicit Unicode normalization steps to remain effective against this class of obfuscation.
An iterative arms race could develop between substitution techniques and improved detection or normalization methods.
The approach might generalize to other character-based privacy protections in digital communication.

Load-bearing premise

Stylometric systems depend on character-level or Unicode-sensitive features that homoglyph substitution will reliably disrupt without being normalized away by standard preprocessing or creating new detectable signals.
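To make the premise concrete, the sketch below assumes a feature extractor that counts raw character trigrams with no normalization step, and measures how much of the original trigram profile survives after every 'h' is swapped for U+04BB. The helper names `char_ngrams` and `overlap` are illustrative; real stylometric pipelines use richer features and may preprocess differently.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams on raw code points."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def overlap(a: Counter, b: Counter) -> float:
    """Fraction of the n-grams in `a` that also occur in `b` (multiset intersection)."""
    total = sum(a.values())
    return sum((a & b).values()) / total if total else 0.0


original = "the weather is rather hot"
altered = original.replace("h", "\u04bb")  # swap Latin h for Cyrillic shha

print(overlap(char_ngrams(original), char_ngrams(original)))  # identical profiles
print(overlap(char_ngrams(original), char_ngrams(altered)))   # reduced overlap
```

Every trigram that contained an 'h' disappears from the altered profile and is replaced by a trigram the extractor has never seen. Whether that translates into lower attribution accuracy is the empirical question the premise rests on.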

What would settle it

An experiment in which stylometric accuracy on the modified text remains statistically unchanged from the original, or in which routine Unicode normalization restores full performance.
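One part of that experiment can be checked directly. The standard Unicode normalization forms fold canonical and compatibility equivalents, and cross-script look-alikes are generally not such equivalents. The sketch below uses Python's standard `unicodedata` module to compare the paper's example character, U+04BB, with a fullwidth 'h' (U+FF48), which does carry a compatibility mapping. It covers only these two characters and says nothing about other homoglyphs.

```python
import unicodedata

latin_h = "\u0068"        # LATIN SMALL LETTER H
cyrillic_shha = "\u04bb"  # CYRILLIC SMALL LETTER SHHA, the paper's example
fullwidth_h = "\uff48"    # FULLWIDTH LATIN SMALL LETTER H, included for contrast

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    shha_folds = unicodedata.normalize(form, cyrillic_shha) == latin_h
    fullwidth_folds = unicodedata.normalize(form, fullwidth_h) == latin_h
    print(form, "shha -> h:", shha_folds, "| fullwidth h -> h:", fullwidth_folds)
```

NFKC and NFKD fold the fullwidth letter back to 'h' and leave the Cyrillic letter untouched. A pipeline would need a separate confusable-folding step, for instance one built on the Unicode confusables data, to map such characters back to Latin.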

Figures

Figures reproduced from arXiv: 2604.10271 by Robert Dilworth.

**Figure 2.** Kagi Translate acts as a conduit for adversarial stylometry, rending au…

**Figure 3.** A Taxonomic Overview of the Adversarial Attacks: …

**Figure 4.** TraceTarnish: Our stylometric attack script, a gestalt modular framework where each component contributes to a whole that is greater than the sum of its parts; incorporating homoglyph functionality resulted in the following processing pipeline for razing authorship: Translation → Obfuscation → Imitation → Injection

**Figure 5.** An enumeration of the adversarial attacks examined by …

**Figure 6.** The sentences to be evaluated, representing 100% Injection for each ex…

**Figure 7.** A plot capturing the results of the homoglyph-based Injection-optimality …

**Figure 8.** TraceTarnish, in its current state, implements an Injection amalgam, interspersing both homoglyphs and zero-width characters into text to shroud authorship. To demonstrate the efficiency of the Injection component, we rerun Experiment #1, incrementally introducing both homoglyphs and zero-width characters in a stepwise fashion. The following string represents 100% Injection, with the “bad characters” h…

**Figure 9.** Distance measures in stylometry are mathematical methods used to quan…

Original abstract

In what way could a data breach involving government-issued IDs such as passports, driver's licenses, etc., rival a random voluntary disclosure on a nondescript social-media platform? At first glance, the former appears more significant, and that is a valid assessment. The disclosed data could contain an individual's date of birth and address; for all intents and purposes, a leak of that data would be disastrous. Given the threat, the latter scenario involving an innocuous online post seems comparatively harmless--or does it? From that post and others like it, a forensic linguist could stylometrically uncover equivalent pieces of information, estimating an age range for the author (adolescent or adult) and narrowing down their geographical location (specific country). While not an exact science--the determinations are statistical--stylometry can reveal comparable, though noticeably diluted, information about an individual. To prevent an ID from being breached, simply sharing it as little as possible suffices. Preventing the leakage of personal information from written text requires a more complex solution: adversarial stylometry. In this paper, we explore how performing homoglyph substitution--the replacement of characters with visually similar alternatives (e.g., "h" $\texttt{[U+0068]}$ $\rightarrow$ "һ" $\texttt{[U+04BB]}$)--on text can degrade stylometric systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Homoglyph substitution for stylometry defense is an obvious idea that needs testing to be useful.


The main takeaway is that this paper suggests homoglyph substitution as a simple way to degrade stylometric analysis of text for privacy, but it offers no experiments or data to show whether the idea actually works in practice. The core proposal is to replace characters with visually similar ones from other scripts, like swapping a Latin 'h' for a Cyrillic equivalent, to break author-style signals in writing. This is framed around real privacy risks, such as stylometry pulling age or location estimates from social media posts, which the paper contrasts with more obvious ID leaks. That motivation section is clear and direct.

The new angle is applying the known homoglyph trick specifically to adversarial stylometry, rather than just security contexts like phishing. It does a reasonable job explaining the substitution process without overcomplicating it.

The big gap is the lack of any evaluation. No datasets, no stylometric baselines, no accuracy drops measured before and after substitution. The claim depends on stylometric systems relying on raw Unicode codepoints that don't get normalized or script-detected in preprocessing, which is a shaky assumption. Mixed-script text could easily create its own detectable patterns instead. Without code or results, it's impossible to tell if this holds up or just adds noise that gets filtered.

This is aimed at researchers working on text privacy or authorship obfuscation techniques. Someone building quick prototypes might pick up the concept as a starting point, but it won't give them validated methods to use. I would send it for peer review. Referees can flag the missing tests and point to standard stylometry pipelines that might neutralize this, which would help turn the idea into something more solid.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes homoglyphic substitution, that is, replacing Latin characters with visually similar glyphs from other Unicode blocks (e.g., U+0068 'h' to U+04BB 'һ'), as a technique for adversarial stylometry to degrade author attribution and profiling performance of stylometric systems.

Significance. If empirically validated, the approach could supply a lightweight, accessible method for textual privacy protection against stylometric inference of attributes such as age or location. It extends prior adversarial stylometry work but currently offers only a descriptive claim without demonstrated effectiveness or robustness.

major comments (2)

Abstract: the central claim that homoglyph substitution degrades stylometric performance is unsupported by any experimental results, datasets, evaluation metrics, or implementation details; the manuscript provides no evidence that the substitution reliably disrupts feature extractors or avoids introducing new detectable signals such as elevated non-Latin script frequencies.
Abstract: the argument assumes stylometric systems operate on raw Unicode codepoints without normalization (NFKC/NFD), script detection, or tokenization that collapses visually identical glyphs; no analysis or test is presented to show the substitution survives these standard preprocessing steps.
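The concern about new detectable signals can be illustrated with a crude script check. Python's standard library does not expose the Unicode Script property, so the sketch below uses the first word of each character's Unicode name (for example `LATIN` or `CYRILLIC`) as a stand-in. That heuristic, and the helper names `script_counts` and `mixed_script_words`, are assumptions made for this example only.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, using the first word of the Unicode name as a proxy."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts


def mixed_script_words(text: str) -> list[str]:
    """Return whitespace-separated words whose letters come from more than one script."""
    return [word for word in text.split() if len(script_counts(word)) > 1]


sample = "t\u04bbe weat\u04bber is fine"  # two Latin h letters replaced by U+04BB

print(script_counts(sample))
print(mixed_script_words(sample))
```

Ordinary English text rarely mixes scripts inside a single word, so a check of this kind could flag substituted text even when the substitution survives normalization.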

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important gaps in empirical support and robustness analysis. We agree that the current manuscript is primarily conceptual and will incorporate experiments, implementation details, and preprocessing evaluations in the revised version to strengthen the claims.


Referee: Abstract: the central claim that homoglyph substitution degrades stylometric performance is unsupported by any experimental results, datasets, evaluation metrics, or implementation details; the manuscript provides no evidence that the substitution reliably disrupts feature extractors or avoids introducing new detectable signals such as elevated non-Latin script frequencies.

Authors: We acknowledge that the present version offers a descriptive proposal without quantitative validation. The manuscript introduces homoglyphic substitution as an adversarial stylometry technique but does not report experiments, datasets, or metrics. In revision, we will add empirical evaluations on standard stylometric corpora (e.g., using author attribution accuracy and attribute inference F1 scores), detail the substitution algorithm and parameters, and explicitly test for introduced signals such as non-Latin character frequency distributions to demonstrate that the method does not create easily detectable artifacts. revision: yes
Referee: Abstract: the argument assumes stylometric systems operate on raw Unicode codepoints without normalization (NFKC/NFD), script detection, or tokenization that collapses visually identical glyphs; no analysis or test is presented to show the substitution survives these standard preprocessing steps.

Authors: This is a fair and substantive critique. The current text does not examine how homoglyph substitution interacts with common text normalization pipelines. We will revise the manuscript to include a dedicated analysis section that evaluates survival rates under NFKC/NFD normalization, script detection heuristics, and various tokenizers (e.g., word-level, subword, and Unicode-aware). Where the substitution is neutralized, we will discuss mitigation strategies or clearly delineate the threat model under which the technique remains effective. revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive claim with no derivations or fitted elements


The paper presents an exploratory idea that homoglyph substitution can degrade stylometric systems. No equations, parameters, predictions, or derivation chains appear in the provided text. The abstract and description frame the work as an investigation rather than a mathematical result derived from prior self-referential steps. None of the enumerated circularity patterns (self-definitional, fitted-input-as-prediction, self-citation load-bearing, etc.) apply, as there are no load-bearing logical reductions to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are specified or required for the high-level claim.


