ReTokSync resolves tokenization ambiguity in generative linguistic steganography via targeted self-synchronizing resets, achieving over 99.7% extraction accuracy and 100% recovery with an auxiliary channel while matching baseline security and quality.
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
other 1
citation-polarity summary
years
2026 3roles
other 1polarities
unclear 1representative citing papers
Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
citing papers explorer
-
ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography
ReTokSync resolves tokenization ambiguity in generative linguistic steganography via targeted self-synchronizing resets, achieving over 99.7% extraction accuracy and 100% recovery with an auxiliary channel while matching baseline security and quality.
-
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization
Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
- Tokenization with Split Trees