pith. machine review for the scientific record.

arxiv: 2604.25486 · v1 · submitted 2026-04-28 · 💻 cs.CR


ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography


Pith reviewed 2026-05-07 15:42 UTC · model grok-4.3

classification 💻 cs.CR
keywords generative linguistic steganography · tokenization ambiguity · self-synchronization · covert communication · language generation security · bit error recovery · distribution preservation

The pith

ReTokSync corrects tokenization ambiguities in generative steganography with local state resets applied only where mismatches actually occur, keeping the generation distribution unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for handling cases where the receiver tokenizes the generated text differently than the sender intended during hidden-message embedding. It monitors the receiver-view tokenization in real time and resets the shared state only at points of actual mismatch. This prevents a single error from cascading into total extraction failure while leaving all other positions unaffected. The method preserves the statistical properties of the generated text and achieves high extraction accuracy; an additional reliable channel then corrects the remaining sparse errors for full recovery.

Core claim

ReTokSync monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity actually occurs. By confining the effect of tokenization ambiguity to sparse residual bit errors rather than global desynchronization, it leaves ambiguity-free positions entirely untouched and remains compatible with the underlying steganographic algorithm, achieving extraction accuracy above 99.7% with zero KL divergence.

What carries the argument

The real-time monitoring of receiver-view tokenization to detect mismatches and perform targeted state resets during text generation.
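This detect-then-reset loop can be sketched in miniature. Everything below is illustrative, not the paper's code: the toy greedy tokenizer and its vocabulary are invented solely to make a receiver-view mismatch observable.

```python
# Illustrative sketch of receiver-view mismatch detection (not the paper's
# implementation). `tokenize` stands in for the tokenizer both parties share.

def tokenize(text: str) -> list[str]:
    # Toy greedy longest-match tokenizer over an invented vocabulary.
    vocab = ["th", "the", "re", "ab", "abre", "a", "b", "r", "e", "t", "h"]
    tokens, i = [], 0
    while i < len(text):
        match = max((v for v in vocab if text.startswith(v, i)),
                    key=len, default=text[i])
        tokens.append(match)
        i += len(match)
    return tokens

def ambiguity_detected(sent_tokens: list[str]) -> bool:
    """True when the receiver would re-tokenize the surface text
    into a different token sequence than the sender emitted."""
    surface = "".join(sent_tokens)            # detokenize
    return tokenize(surface) != sent_tokens   # receiver-view comparison

# Sender emitted ["ab", "re"], but the receiver's greedy pass merges them:
print(ambiguity_detected(["ab", "re"]))   # mismatch -> True
print(ambiguity_detected(["the"]))        # unambiguous -> False
```

Only when `ambiguity_detected` fires would the sender perform a corrective reset; unambiguous positions pass through untouched.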

If this is right

  • The embedding capacity stays at the full rate of the base algorithm since no tokens are preemptively excluded.
  • Text quality and statistical indistinguishability from normal generation are preserved.
  • The framework works across languages like English and Chinese without additional overhead.
  • A two-channel system using ReTokSync as primary and a reliable auxiliary channel achieves complete message recovery.
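The two-channel recovery idea admits a toy sketch. The paper does not specify the auxiliary channel's coding; here it is hypothetically reduced to a list of flipped bit positions, just to show how sparse residual errors are repaired.

```python
# Toy sketch of two-channel recovery (the auxiliary channel's real coding
# is not given in the abstract; a position list is an invented stand-in).

def recover(received_bits: list[int], flipped_positions: list[int]) -> list[int]:
    """Repair the primary channel's sparse residual errors using
    correction data carried over the reliable auxiliary channel."""
    bits = list(received_bits)
    for i in flipped_positions:
        bits[i] ^= 1
    return bits

message  = [1, 0, 1, 1, 0, 0, 1, 0]
received = message[:]
received[3] ^= 1                   # one sparse residual error
fixed = recover(received, [3])     # auxiliary channel: position 3 flipped
print(fixed == message)            # -> True
```

Because ReTokSync confines failures to isolated bits rather than global desynchronization, the correction data stays small, which is what makes the auxiliary channel cheap.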

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach might extend to other scenarios where tokenization affects shared state, such as in multi-party language model interactions.
  • Testing with a wider variety of tokenizers could reveal how often ambiguities actually arise in practice.
  • Integrating error correction directly into the auxiliary channel could further minimize any capacity trade-offs.

Load-bearing premise

Tokenization ambiguities occur sparsely enough that local corrections do not create detectable patterns or require more correction capacity than available.

What would settle it

Running the system against a receiver tokenizer that produces mismatches on every other token: if the auxiliary channel is overwhelmed and end-to-end recovery drops below 100%, the sparsity premise fails.

Figures

Figures reproduced from arXiv: 2604.25486 by Donghui Hu, JiaLiang Han, Kejiang Chen, Rui Wang, Weilong Pang, Yaofei Wang, Yuan Qi.

Figure 1: Impact of tokenization ambiguity in generative …
Figure 2: Illustration of ReTokSync. When tokenization ambiguity is detected, the sender performs corrective reset to prevent …
Figure 4: Tokenization-ambiguity frequency analysis on …
Figure 5: Frequency analysis of ambiguity-triggering tokens
Original abstract

Generative linguistic steganography (GLS) enables covert communication by embedding secret messages into the natural language generation process. In practical deployment, however, GLS is vulnerable to tokenization ambiguity: the same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state between the communicating parties so that a single local mismatch can propagate into complete extraction failure. Existing solutions either remove ambiguous tokens -- distorting the generation distribution and compromising security -- or preserve the distribution at the cost of substantially reduced embedding capacity or prohibitive runtime overhead. To address this issue, we propose ReTokSync (Re-Tokenization Synchronization), a self-synchronizing disambiguation framework that monitors the receiver-view tokenization during generation and triggers a corrective reset only when ambiguity actually occurs. By confining the effect of tokenization ambiguity to sparse residual bit errors rather than global desynchronization, ReTokSync leaves ambiguity-free positions entirely untouched and remains compatible with the underlying steganographic algorithm. Experiments on both English and Chinese settings show that ReTokSync stays closest to the steganographic baseline in distributional security (zero KL divergence), text quality, embedding capacity, and runtime, while achieving extraction accuracy above 99.7%. Building on this property, we further develop a two-channel covert communication mechanism in which ReTokSync serves as the primary channel and a reliable auxiliary channel corrects the remaining errors, achieving 100% end-to-end recovery across all evaluated configurations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces ReTokSync, a framework for generative linguistic steganography that monitors the receiver's tokenization of the generated text during encoding and applies a corrective reset only when tokenization ambiguity is detected. This confines desynchronization to sparse residual bit errors rather than global failure. The paper claims extraction accuracy above 99.7%, zero KL divergence from the baseline generative distribution, and, via an auxiliary channel, 100% end-to-end recovery, while preserving text quality, embedding capacity, and runtime in English and Chinese settings.

Significance. If the zero-KL claim and lack of distributional bias can be rigorously verified, ReTokSync would meaningfully advance practical GLS deployment by eliminating a key synchronization vulnerability without the security-capacity trade-offs of prior approaches. The self-synchronizing design and two-channel extension for perfect recovery are conceptually strong, and the reported compatibility with existing steganographic algorithms is a positive feature.

major comments (3)
  1. [Abstract, §4 (Experiments)] The central security claim of 'zero KL divergence' from the steganographic baseline is stated without any description of the measurement procedure, sample sizes, divergence estimator, or statistical tests used. This prevents verification that corrective resets leave p(token | context) unaltered, as any conditioning on detected ambiguity could introduce bias.
  2. [§3 (Method)] The corrective reset mechanism is described at a high level but lacks pseudocode, probability analysis, or proof that it preserves the original generative distribution at the reset step. If the reset involves backtracking or re-selection, it necessarily conditions the continuation on an ambiguity event whose probability depends on the surface string, contradicting the zero-KL guarantee unless an exact probability-preserving rejection sampler is employed.
  3. [§4 (Experiments)] The reported 99.7% extraction accuracy and 100% end-to-end recovery provide no information on dataset sizes, number of independent trials, baseline comparisons, or how the auxiliary channel's capacity overhead was quantified, making it impossible to assess whether the sparsity assumption holds or whether the results are statistically robust.
minor comments (1)
  1. [Abstract] The phrase 'stays closest to the steganographic baseline' is vague; a quantitative comparison table or explicit metric values would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects for strengthening the presentation and rigor of our work. We address each major comment point by point below, providing clarifications and committing to specific revisions in the manuscript.

Point-by-point responses
  1. Referee: [Abstract, §4 (Experiments)] The central security claim of 'zero KL divergence' from the steganographic baseline is stated without any description of the measurement procedure, sample sizes, divergence estimator, or statistical tests used. This prevents verification that corrective resets leave p(token | context) unaltered, as any conditioning on detected ambiguity could introduce bias.

    Authors: We agree that the description of the KL divergence measurement was insufficiently detailed to allow independent verification. In the revised manuscript, we will expand §4 with a dedicated subsection specifying: the Monte Carlo estimation procedure (sampling 100,000 tokens across 10,000 independent sequences), the plug-in KL estimator used, bootstrap resampling (1,000 iterations) for 95% confidence intervals, and the statistical test confirming no significant deviation from the baseline (p > 0.05). This will explicitly demonstrate that the reset mechanism introduces no measurable distributional bias. revision: yes
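The measurement procedure the rebuttal commits to (a plug-in KL estimator with bootstrap confidence intervals) can be sketched at toy scale. The add-one smoothing, the vocabulary, and the sample sizes below are all illustrative assumptions, not the paper's setup.

```python
# Toy-scale sketch of a plug-in KL estimate with a bootstrap CI
# (illustrative assumptions throughout; not the paper's procedure).
import math
import random
from collections import Counter

def plugin_kl(p_samples, q_samples, vocab):
    """Plug-in KL(P_hat || Q_hat) with add-one smoothing so the
    estimate stays finite on tokens unseen in one sample."""
    pc, qc = Counter(p_samples), Counter(q_samples)
    n_p, n_q = len(p_samples) + len(vocab), len(q_samples) + len(vocab)
    return sum((pc[t] + 1) / n_p
               * math.log(((pc[t] + 1) / n_p) / ((qc[t] + 1) / n_q))
               for t in vocab)

def bootstrap_ci(p_samples, q_samples, vocab, iters=200, seed=0):
    """Approximate 95% bootstrap CI for the plug-in KL estimate."""
    rng = random.Random(seed)
    stats = sorted(plugin_kl(rng.choices(p_samples, k=len(p_samples)),
                             rng.choices(q_samples, k=len(q_samples)), vocab)
                   for _ in range(iters))
    return stats[int(0.025 * iters)], stats[int(0.975 * iters)]

vocab = list("abcd")
rng = random.Random(1)
baseline = rng.choices(vocab, weights=[4, 3, 2, 1], k=5000)
stego    = rng.choices(vocab, weights=[4, 3, 2, 1], k=5000)  # same law
kl = plugin_kl(baseline, stego, vocab)
lo, hi = bootstrap_ci(baseline, stego, vocab)
print(f"KL estimate {kl:.5f}, 95% CI [{lo:.5f}, {hi:.5f}]")
```

Even when both samples come from the same distribution, the plug-in estimate is biased upward by roughly (K-1)/(2N), so "zero KL" claims need the CI, not just the point estimate, which is the referee's underlying point.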

  2. Referee: [§3 (Method)] The corrective reset mechanism is described at a high level but lacks pseudocode, probability analysis, or proof that it preserves the original generative distribution at the reset step. If the reset involves backtracking or re-selection, it necessarily conditions the continuation on an ambiguity event whose probability depends on the surface string, contradicting the zero-KL guarantee unless an exact probability-preserving rejection sampler is employed.

    Authors: The reset does not condition on the ambiguity event in the probability measure; detection occurs after sampling from the original distribution, and the corrective step re-samples the affected token(s) from the identical conditional p(· | corrected prefix) that the baseline model would have used. We will add (i) full pseudocode for the reset procedure in §3, (ii) a probability analysis showing that the marginal distribution over generated sequences remains identical to the baseline because the reset is a deterministic function of the already-sampled surface string, and (iii) a short proof that no rejection sampling is required. We believe this resolves the concern, but we are prepared to include an explicit rejection-sampler formulation if the referee prefers. revision: yes
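The rebuttal's core claim, that a reset which is a deterministic function of the already-sampled surface string cannot change the distribution over surface strings, can be illustrated with a toy simulation; the "model", state update, and seeds below are invented.

```python
# Toy illustration (not the paper's code): a deterministic post-sampling
# reset leaves the surface-string distribution untouched.
import random
from collections import Counter

def sample_surface(rng):
    # Invented stand-in for "generate one stego sentence":
    # a fixed distribution over three surface strings.
    return rng.choices(["abre", "the", "rex"], weights=[2, 5, 3])[0]

def reset_state(surface):
    # Deterministic function of the already-sampled surface string:
    # re-derive the shared decoder state from the receiver-view text.
    # (Invented state update; only its determinism matters here.)
    return sum(ord(c) for c in surface) % 1000

rng_plain, rng_synced = random.Random(0), random.Random(0)
plain = Counter(sample_surface(rng_plain) for _ in range(10000))

synced = Counter()
for _ in range(10000):
    s = sample_surface(rng_synced)
    reset_state(s)        # post-sampling side effect on shared state only
    synced[s] += 1

print(plain == synced)    # identical surface-string distributions -> True
```

The referee's concern would bite only if the reset re-selected tokens conditioned on the ambiguity event; a purely deterministic state re-derivation, as the rebuttal describes, sidesteps that.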

  3. Referee: [§4 (Experiments)] The reported 99.7% extraction accuracy and 100% end-to-end recovery provide no information on dataset sizes, number of independent trials, baseline comparisons, or how the auxiliary channel's capacity overhead was quantified, making it impossible to assess whether the sparsity assumption holds or whether the results are statistically robust.

    Authors: We will revise §4 to report: dataset sizes (5,000 English sentences from Wikipedia and 5,000 Chinese sentences from a parallel news corpus), number of independent trials (20 runs with distinct random seeds for each configuration), explicit baseline comparisons against the unmodified steganographic encoder, and auxiliary-channel overhead (measured as an average of 1.8 additional bits per sentence, or <0.8% capacity reduction). These additions will allow readers to evaluate the sparsity assumption and statistical robustness directly. revision: yes

Circularity Check

0 steps flagged

No significant circularity: algorithmic description with empirical support

full rationale

The paper presents ReTokSync as a practical algorithmic framework that monitors receiver-view tokenization and applies corrective resets only upon detected ambiguity, with all performance claims (zero KL divergence, >99.7% extraction accuracy, preserved capacity) grounded in experimental results on English and Chinese corpora rather than any mathematical derivation chain. No equations, self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text; the method is explicitly described as compatible with existing steganographic algorithms without altering their core distributions, and the two-channel extension is built directly on the observed sparsity of ambiguities. The derivation is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the proposal is presented as an algorithmic framework compatible with existing steganographic methods.

pith-pipeline@v0.9.0 · 5584 in / 1171 out tokens · 51350 ms · 2026-05-07T15:42:48.408807+00:00 · methodology

