Text Steganography with Dynamic Codebook and Multimodal Large Language Model
Pith reviewed 2026-05-10 00:41 UTC · model grok-4.3
The pith
Dynamic codebooks from multimodal LLMs enable flexible black-box text steganography
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that their method, which constructs a dynamic codebook from a shared session configuration and a multimodal LLM and pairs it with an encrypted steganographic mapping and reject-sampling optimization, outperforms existing white-box text steganography in embedding capacity and text quality while achieving better practicality and flexibility than prior black-box paradigms on popular online social networks.
What carries the argument
The dynamic codebook constructed via a shared session configuration and a multimodal large language model, together with the encrypted steganographic mapping and the reject-sampling feedback optimization used for message embedding and extraction.
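A minimal, runnable sketch can make these moving parts concrete. Everything below is an assumption layered on the abstract rather than the authors' construction: the MLLM is stubbed out with a fixed candidate list, and `get_candidates`, the 3-bit chunk size, and the HMAC-keyed ordering are illustrative stand-ins.

```python
# Illustrative pipeline sketch (not the authors' code): shared config ->
# dynamic codebook -> keyed ("encrypted") mapping -> reject-sampled embedding.
import hashlib
import hmac

SHARED_KEY = b"session-secret"   # exchanged out of band (assumption)
SESSION_SEED = 20260510          # part of the shared session config

def get_candidates(image_id: str, n: int = 8) -> list[str]:
    """Stand-in for MLLM caption sampling: the real scheme would draw n
    diverse captions for the cover image from a multimodal LLM."""
    return [f"caption variant {i} for {image_id}" for i in range(n)]

def dynamic_codebook(image_id: str, key: bytes, seed: int) -> dict[int, str]:
    """Order the candidates by a keyed PRF so each 3-bit chunk maps to one
    caption; the key-dependent order plays the role of the encrypted
    steganographic mapping."""
    def prf(caption: str) -> bytes:
        return hmac.new(key, f"{seed}:{caption}".encode(), hashlib.sha256).digest()
    ordered = sorted(get_candidates(image_id), key=prf)
    return dict(enumerate(ordered))

def extract_chunk(caption: str, image_id: str) -> str:
    """Bob rebuilds the same codebook from the shared config and inverts it."""
    book = dynamic_codebook(image_id, SHARED_KEY, SESSION_SEED)
    inverse = {cap: idx for idx, cap in book.items()}
    return format(inverse[caption], "03b")

def embed(bits: str, image_id: str) -> list[str]:
    """Alice maps each 3-bit chunk to a caption, retrying (reject sampling)
    until the receiver-side extraction check round-trips."""
    book = dynamic_codebook(image_id, SHARED_KEY, SESSION_SEED)
    captions = []
    for i in range(0, len(bits), 3):
        chunk = bits[i:i + 3].ljust(3, "0")
        for _ in range(100):  # reject-sampling budget
            caption = book[int(chunk, 2)]
            if extract_chunk(caption, image_id) == chunk:
                captions.append(caption)
                break
        else:
            raise RuntimeError("reject sampling did not converge")
    return captions

message = "101100111"
caps = embed(message, "img-042")
assert "".join(extract_chunk(c, "img-042") for c in caps) == message
```

With a deterministic stub the feedback check always passes on the first draw; in the real scheme the retry loop earns its keep because a fresh MLLM sample would be drawn whenever extraction fails.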
If this is right
- Outperforms white-box methods in embedding capacity and text quality.
- Achieves better practicality and flexibility than existing black-box methods in social networks.
- Secret messages are accurately extracted using the feedback optimization mechanism based on reject sampling.
Where Pith is reading between the lines
- If multimodal LLMs advance in generating more diverse captions, the steganographic embedding capacity would naturally increase.
- The method could potentially apply to other forms of text generation beyond captions, such as dialogue or article summaries.
- Social network platforms might develop new detection techniques if such dynamic steganography becomes common.
Load-bearing premise
The multimodal LLM reliably produces captions where the hidden mapping survives real social network transmission, and the reject-sampling loop converges quickly for practical use without revealing the steganography.
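The "converges quickly" half of this premise can be given a simple shape under an independence assumption that is ours, not the paper's: if each regenerated caption passes Bob's extraction check with probability p, the number of draws per embedded chunk is geometric, with Pr[N > k] = (1 − p)^k and E[N] = 1/p. At p = 0.5 that means two expected MLLM queries per chunk and under a one-in-a-million chance of needing more than 20 draws; but if platform rewriting drives p toward zero, both the query volume and the detectable retry pattern grow without bound.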
What would settle it
Running the method to generate and post captions on actual social networks, then attempting extraction from the retrieved posts to check if the secret messages are recovered at high success rates.
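A concrete shape for that experiment is sketched below: post, retrieve, extract, and score the fraction of messages recovered intact. All hooks are hypothetical parameters; a real run would plug an actual platform client into `post_caption`/`fetch_caption` and the paper's own routines into `embed_fn`/`extract_fn`.

```python
# Sketch of the settling experiment: post, retrieve, extract, score.
# Every hook is a parameter; the stand-ins below are smoke-test stubs,
# not any real social-network API.

def recovery_rate(messages, image_ids, embed_fn, extract_fn,
                  post_caption, fetch_caption) -> float:
    """Fraction of messages recovered intact after a platform round trip
    (platforms may rewrite whitespace, strip characters, or truncate)."""
    recovered = 0
    for bits, image_id in zip(messages, image_ids):
        captions = embed_fn(bits, image_id)
        post_ids = [post_caption(image_id, c) for c in captions]
        fetched = [fetch_caption(pid) for pid in post_ids]
        recovered += (extract_fn(fetched, image_id) == bits)
    return recovered / len(messages)

if __name__ == "__main__":
    store = {}
    def post_caption(image_id, caption):   # identity "platform" stub
        store[len(store)] = caption
        return len(store) - 1
    def fetch_caption(pid):
        return store[pid]
    # Toy codec: one caption carries the whole bitstring verbatim.
    embed_fn = lambda bits, img: [f"{img}:{bits}"]
    extract_fn = lambda caps, img: caps[0].split(":", 1)[1]
    print(recovery_rate(["101", "0110"], ["a", "b"],
                        embed_fn, extract_fn, post_caption, fetch_caption))
```

Dividing recovered messages by attempts directly operationalizes the "high success rates" criterion, and the same per-message logs would show whether failures come from platform rewriting or from extraction error.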
Original abstract
With the popularity of the large language models (LLMs), text steganography has achieved remarkable performance. However, existing methods still have some issues: (1) For the white-box paradigm, this steganography behavior is prone to exposure due to sharing the off-the-shelf language model between Alice and Bob. (2) For the black-box paradigm, these methods lack flexibility and practicality since Alice and Bob should share the fixed codebook while sharing a specific extracting prompt for each steganographic sentence. In order to improve the security and practicality, we introduce a black-box text steganography with a dynamic codebook and multimodal large language model. Specifically, we first construct a dynamic codebook via some shared session configuration and a multimodal large language model. Then an encrypted steganographic mapping is designed to embed secret messages during the steganographic caption generation. Furthermore, we introduce a feedback optimization mechanism based on reject sampling to ensure accurate extraction of secret messages. Experimental results show that the proposed method outperforms existing white-box text steganography methods in terms of embedding capacity and text quality. Meanwhile, the proposed method has achieved better practicality and flexibility than the existing black-box paradigm in some popular online social networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a black-box text steganography scheme that constructs a dynamic codebook from a shared session configuration and a multimodal LLM, embeds secret messages via an encrypted steganographic mapping during caption generation, and uses a reject-sampling feedback loop to guarantee extraction. It claims superior embedding capacity and text quality relative to white-box baselines together with improved practicality and flexibility over fixed-codebook black-box methods when deployed on real social-network platforms.
Significance. If the experimental claims are substantiated, the dynamic-codebook plus MLLM approach would address two long-standing limitations in text steganography—model-sharing exposure in white-box settings and per-message prompt/codebook rigidity in black-box settings—potentially enabling more secure and usable covert communication on platforms that apply compression and re-encoding. The reject-sampling mechanism is a concrete engineering contribution that could be reused in other LLM-based steganographic pipelines.
major comments (2)
- [Abstract] The central claim that the method 'outperforms existing white-box text steganography methods in terms of embedding capacity and text quality' is unsupported by any quantitative metrics, baseline comparisons, error bars, or description of the experimental protocol; without these data the performance advantage cannot be evaluated.
- [Abstract] The practicality claim, that the scheme achieves 'better practicality and flexibility than the existing black-box paradigm in some popular online social networks', rests on two unverified load-bearing assumptions: (1) that MLLM-generated captions carrying the encrypted dynamic-codebook mapping survive platform transmission artifacts (compression, resizing, re-encoding), and (2) that the reject-sampling loop converges in a small number of iterations without producing detectable query patterns. No iteration counts, transmission-success rates, or platform-specific results are supplied.
minor comments (1)
- [Abstract] The abstract introduces the terms 'dynamic codebook' and 'encrypted steganographic mapping' without a concise one-sentence definition; a brief gloss in the abstract would improve readability for readers outside the immediate subfield.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires strengthening to better support our claims with concrete evidence. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Abstract] The central claim that the method 'outperforms existing white-box text steganography methods in terms of embedding capacity and text quality' is unsupported by any quantitative metrics, baseline comparisons, error bars, or description of the experimental protocol; without these data the performance advantage cannot be evaluated.
Authors: We agree that the abstract would benefit from including specific quantitative support to make the performance claim self-contained. The manuscript body reports experimental comparisons against white-box baselines, including embedding capacity in bits per character and text quality via perplexity and human evaluation scores. We will revise the abstract to summarize these key metrics, name the baselines, and briefly note the evaluation protocol, thereby allowing direct assessment of the claimed advantages. revision: yes
-
Referee: [Abstract] The practicality claim, that the scheme achieves 'better practicality and flexibility than the existing black-box paradigm in some popular online social networks', rests on two unverified load-bearing assumptions: (1) that MLLM-generated captions carrying the encrypted dynamic-codebook mapping survive platform transmission artifacts (compression, resizing, re-encoding), and (2) that the reject-sampling loop converges in a small number of iterations without producing detectable query patterns. No iteration counts, transmission-success rates, or platform-specific results are supplied.
Authors: We acknowledge that the current abstract does not supply the requested supporting data on transmission robustness or reject-sampling convergence. The manuscript describes the reject-sampling feedback mechanism but does not report platform-specific success rates or iteration statistics. For the revision we will add experimental results quantifying transmission success under compression and re-encoding on representative social networks together with average iteration counts for the feedback loop, thereby verifying the practicality assumptions. revision: yes
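Both responses promise concrete numbers. As an illustration of what those revision metrics could look like operationally (these definitions and the log layout are our assumptions, not the paper's evaluation code; perplexity is omitted since it requires an external language model), embedding capacity and reject-sampling iteration statistics reduce to simple aggregates over per-caption logs:

```python
# Illustrative metric definitions for the promised revision numbers.
# Assumes a log of (bits_embedded, caption_text, n_reject_iterations)
# tuples per generated caption; the field layout is hypothetical.
from statistics import mean, quantiles

def embedding_capacity(log):
    """Bits per character: total embedded bits over total caption length."""
    total_bits = sum(bits for bits, _, _ in log)
    total_chars = sum(len(text) for _, text, _ in log)
    return total_bits / total_chars

def iteration_stats(log):
    """Mean and approximate 95th-percentile reject-sampling iterations."""
    iters = [n for _, _, n in log]
    return mean(iters), quantiles(iters, n=20)[-1]

log = [(3, "a dog rests on warm sand near the tide", 1),
       (3, "two gulls trace circles over the pier", 2),
       (3, "evening light settles across the harbor", 1)]
print(embedding_capacity(log), iteration_stats(log))
```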
Circularity Check
No significant circularity; the method relies on external MLLM behavior and empirical results
full rationale
The paper describes a black-box text steganography approach that constructs a dynamic codebook from a shared session configuration plus multimodal LLM output, applies an encrypted mapping during caption generation, and uses reject sampling to ensure extraction accuracy. Performance claims (higher embedding capacity, better text quality, improved practicality/flexibility on social networks) are asserted via experimental results rather than any derivation that reduces by construction to fitted parameters, self-defined quantities, or a self-citation chain. No equations or steps in the abstract or method outline exhibit self-definitional loops, fitted-input-as-prediction, uniqueness theorems imported from the authors' prior work, or ansatz smuggling. The approach treats the MLLM and shared config as external inputs, making the central claims independent of internal circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multimodal LLMs can generate coherent, natural captions whose internal structure can be steered to carry hidden bits without obvious artifacts.
invented entities (2)
- dynamic codebook: no independent evidence
- encrypted steganographic mapping: no independent evidence
Reference graph
Works this paper leans on
- [1] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, ... arXiv, 2022.
- [2] Ross J. Anderson and Fabien A. P. Petitcolas. 1998. On the limits of steganography. IEEE Journal on Selected Areas in Communications 16, 4 (1998), 474–481.
- [3] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. 2025. Qwen2.5-VL Technical Report. arXiv, 2025.
- [4] Ching-Yun Chang and Stephen Clark. 2014. Practical Linguistic Steganography using Contextual Synonym Substitution and a Novel Vertex Coding Method. Computational Linguistics 40, 2 (2014), 403–448. doi:10.1162/COLI_a_00176
- [5] Mark Chapman and George I. Davida. 1997. Hiding the Hidden: A Software System for Concealing Ciphertext as Innocuous Text. In Information and Communications Security (ICICS 1997) (Lecture Notes in Computer Science, Vol. 1334). Springer, Berlin, Heidelberg, 335–345. doi:10.1007/BFb0028489
- [6] Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. 2007. Digital Watermarking and Steganography. Morgan Kaufmann.
- [7] Falcon Dai and Zheng Cai. 2019. Towards Near-imperceptible Steganographic Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4303–4308. doi:10.18653/v1/P19-1422
- [8] Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven C. H. Hoi. 2023. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. In Advances in Neural Information Processing Systems, Vol. 36. 49250–49267. https://papers.nips.cc/paper_files/paper/2023/hash/9a6a435e75419a836fe47ab6793623e6-Abstract-Conference.html
- [10] Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob N. Foerster, and Martin Strohmeier. 2023. Perfectly Secure Steganography Using Minimum Entropy Coupling. In The Eleventh International Conference on Learning Representations (ICLR 2023). https://openreview.net/forum?id=HQ67mj5rJdR arXiv:2210.14889
- [11] Jinyang Ding, Kejiang Chen, Yaofei Wang, Na Zhao, Weiming Zhang, and Nenghai Yu. 2023. Discop: Provably Secure Steganography in Practice Based on "Distribution Copies". In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, USA, 2238–2255. doi:10.1109/SP46215.2023.10179287
- [12] Tina Fang, Martin Jaggi, and Katerina Argyraki. 2017. Generating Steganographic Text with LSTMs. In Proceedings of ACL 2017, Student Research Workshop. Association for Computational Linguistics, Vancouver, Canada, 100–106. https://aclanthology.org/P17-3017/
- [13] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. 2021. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 7514–7528. doi:10.18653/v1/2021.emnlp-main.595
- [15] Yu-Shin Huang, Peter Just, Hanyun Yin, Krishna Narayanan, Ruihong Huang, and Chao Tian. 2026. OD-Stega: LLM-Based Relatively Secure Steganography via Optimized Distributions. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Vera Demberg, Kentaro Inui, and Lluís Marquez (Eds.). ...
- [16] Gabriel Kaptchuk, Tushar M. Jois, Matthew Green, and Aviel D. Rubin. 2021. METEOR: Cryptographically Secure Steganography for Realistic Distributions. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21). ACM, Virtual Event, Republic of Korea, 1529–1548. doi:10.1145/3460120.3484550
- [17] Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. In Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202). PMLR, 19730–19742. https://proceedings.mlr.press/v202/li23q.html
- [18] Ke Lin, Yiyang Luo, Zijian Zhang, and Luo Ping. 2024. Zero-shot Generative Linguistic Steganography. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, ...
- [19] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual Instruction Tuning. arXiv preprint arXiv:2304.08485 (2023). https://arxiv.org/abs/2304.08485
- [20] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). https://arxiv.org/abs/1907.11692
- [21] Wanli Peng, Jinyu Zhang, Yiming Xue, and Zhenghong Yang. 2021. Real-Time Text Steganalysis Based on Multi-Stage Transfer Learning. IEEE Signal Processing Letters 28 (2021), 1510–1514. doi:10.1109/LSP.2021.3097241
- [22] Niels Provos and Peter Honeyman. 2003. Hide and seek: An introduction to steganography. IEEE Security & Privacy 1, 3 (2003), 32–44.
- [23] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, ...
- [24] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. Technical Report, OpenAI. https://cdn.openai.com/better-language-models/language-models.pdf
- [25] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084 (2019). https://arxiv.org/abs/1908.10084
- [26] Honai Ueoka, Yugo Murawaki, and Sadao Kurohashi. 2021. Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Online, 5486–5492. doi:1...
- [27] Yaofei Wang, Gang Pei, Kejiang Chen, Jinyang Ding, Chao Pan, Weilong Pang, Donghui Hu, and Weiming Zhang. 2025. SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling. In 34th USENIX Security Symposium (USENIX Security 25). USENIX Association. https://www.usenix.org/conference/usenixsecurity25/presentation/wang-yaofei
- [28] Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, and Wanli Peng. 2024. Generative Text Steganography with Large Language Model. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24). ACM, Melbourne, VIC, Australia, 10345–10353. doi:10.1145/3664647.3680562
- [29] Ruiyi Yan and Yugo Murawaki. 2025. Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, ... doi:10.18653/v1/2025.emnlp-main.361
- [31] An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tianyi T... 2024. arXiv. doi:10.48550/arXiv.2412.15115
- [32] Zhongliang Yang, Xiaoqing Guo, Zi-Ming Chen, Yongfeng Huang, and Yu-Jin Zhang. 2019. RNN-Stega: Linguistic Steganography Based on Recurrent Neural Networks. IEEE Transactions on Information Forensics and Security 14, 5 (2019), 1280–1295. doi:10.1109/TIFS.2018.2871746
- [33] Siyu Zhang, Zhongliang Yang, Jinshuai Yang, and Yongfeng Huang. 2021. Provably Secure Generative Linguistic Steganography. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online, 3046–3055. doi:10.18653/v1/2021.findings-acl.268
- [34] Zachary Ziegler, Yuntian Deng, and Alexander Rush. 2019. Neural Linguistic Steganography. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1210–1215. doi:10.18653/v1/D19-1115