FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

Jialin Yan; Shilin Wang; Tanfeng Sun; Xinghao Jiang; Xinpeng Zhang; Yu Cheng; Zhaoxia Yin

arXiv: 2505.22266 · v3 · submitted 2025-05-28 · 💻 cs.SD · cs.MM · eess.AS

Jialin Yan, Yu Cheng, Zhaoxia Yin, Xinpeng Zhang, Shilin Wang, Tanfeng Sun, Xinghao Jiang

Pith reviewed 2026-05-19 13:32 UTC · model grok-4.3

classification 💻 cs.SD · cs.MM · eess.AS

keywords audio steganography · adversarial perturbations · fixed decoder network · anti-steganalysis · robustness · PSNR gain · secret message extraction

The pith

FGAS embeds secret messages as adversarial perturbations in audio using a shared fixed decoder network to improve quality and security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FGAS, a method for audio steganography that generates adversarial perturbations to carry secret messages and embeds them directly into cover audio. The receiver uses only the structure and key of a fixed decoder network to extract the message without any per-instance retraining. This setup keeps the resulting stego audio close to the original in both sound quality and statistical properties. A reader would care because the approach lowers computational demands, makes detection harder, and survives routine audio edits.

Core claim

The paper claims that adversarial perturbations optimized to carry the secret message can be added to cover audio so that a lightweight fixed decoder, whose parameters are shared in advance, extracts the message reliably from the stego audio while the perturbations preserve perceptual fidelity and statistical similarity to the cover.
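
To make the core claim concrete, the following is a minimal sketch of the general idea, not the authors' A2PG algorithm. It assumes a toy decoder (one key-seeded random projection per message bit, each reading its own frame of audio), a logistic loss, and a signed-gradient update clipped to a hypothetical bound `eps`. The names `make_decoder`, `embed`, `FRAME`, `eps`, `step`, and `margin` are illustrative and do not come from the paper.

```python
import math
import random

FRAME = 4096  # samples read per message bit (illustrative choice)


def make_decoder(key, n_bits, frame=FRAME):
    """Derive the fixed decoder weights from the shared key alone."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(frame)] for _ in range(n_bits)]


def logits(weights, audio):
    """Row i projects frame i of the audio onto its random weight vector."""
    frame = len(weights[0])
    return [sum(w * a for w, a in zip(row, audio[i * frame:(i + 1) * frame]))
            for i, row in enumerate(weights)]


def decode(weights, audio):
    """A positive logit decodes as bit 1, anything else as bit 0."""
    return [1 if z > 0 else 0 for z in logits(weights, audio)]


def embed(weights, cover, bits, eps=0.01, step=0.001, iters=50, margin=1.0):
    """Signed-gradient descent on a logistic loss; only the perturbation moves."""
    frame = len(weights[0])
    delta = [0.0] * len(cover)
    for _ in range(iters):
        zs = logits(weights, [c + d for c, d in zip(cover, delta)])
        pending = [i for i, (z, b) in enumerate(zip(zs, bits))
                   if (z if b else -z) < margin]
        if not pending:
            break  # every bit already decodes with the requested margin
        for i in pending:
            # d(loss)/d(logit) for the logistic loss is sigmoid(z) - bit
            g = 0.5 * (1.0 + math.tanh(0.5 * zs[i])) - bits[i]
            for j, w in enumerate(weights[i]):
                k = i * frame + j
                moved = delta[k] - step * math.copysign(1.0, g * w)
                delta[k] = max(-eps, min(eps, moved))  # stay inside the eps bound
    return [c + d for c, d in zip(cover, delta)]


# Hypothetical demo: a 440 Hz tone stands in for real cover audio.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
cover = [0.1 * math.sin(2 * math.pi * 440 * t / 16000)
         for t in range(FRAME * len(bits))]
sender_decoder = make_decoder("shared-key", len(bits))
stego = embed(sender_decoder, cover, bits)

# The receiver rebuilds the decoder from the key; no weights are transmitted.
receiver_decoder = make_decoder("shared-key", len(bits))
recovered = decode(receiver_decoder, stego)
max_change = max(abs(s - c) for s, c in zip(stego, cover))
```

In this sketch the decoder weights are never trained; only the perturbation is updated, and the receiver regenerates the same weights from the key. A realistic decoder would be a nonlinear network and the cover would be recorded audio rather than a sine tone.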

What carries the argument

Audio Adversarial Perturbation Generation (A2PG) strategy together with an optional robust extension and a lightweight fixed decoder network.

If this is right

Stego audio shows an average PSNR gain exceeding 10 dB over existing state-of-the-art methods.
The scheme remains robust under common audio processing operations including compression and added noise.
Anti-steganalysis resistance improves, producing classification error rates roughly 2 percent higher than prior methods at high embedding capacity.
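
For readers who want to sanity-check the quality figure, a minimal PSNR sketch follows. It assumes samples normalized to [-1, 1] with a peak value of 1.0; the paper's exact PSNR convention is not stated on this page, so `psnr_db` and `peak` are illustrative names and choices.

```python
import math


def psnr_db(cover, stego, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(cover) != len(stego):
        raise ValueError("signals must have the same length")
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical samples: the second perturbation is sqrt(10) times larger.
cover = [0.0, 0.5, -0.5, 0.25]
stego_small = [c + 0.001 for c in cover]
stego_large = [c + 0.001 * math.sqrt(10) for c in cover]
gain = psnr_db(cover, stego_small) - psnr_db(cover, stego_large)
```

Because PSNR is logarithmic in the mean squared error, a 10 dB gain at a fixed peak value corresponds to a tenfold reduction in perturbation energy, which is what `gain` evaluates to here up to rounding.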

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real-time hiding applications become more practical because the decoder never needs to be retrained for each new cover.
The fixed-decoder pattern could be tested for hiding data inside other media such as video or sensor streams.
Performance against newer machine-learning steganalysis tools would be a natural next measurement.

Load-bearing premise

Adversarial perturbations can be generated to embed the secret message while remaining close enough to the cover audio in perception and statistics for the fixed decoder to extract it accurately without retraining or noticeable changes.

What would settle it

Demonstrating that extraction accuracy falls below reliable levels on new audio samples without decoder retraining, or that steganalysis classifiers reach error rates on FGAS output no higher than on the output of current methods, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2505.22266 by Jialin Yan, Shilin Wang, Tanfeng Sun, Xinghao Jiang, Xinpeng Zhang, Yu Cheng, Zhaoxia Yin.

**Figure 2.** FGAS scheme: (a) Alice (the sender) uses the proposed Audio Adversarial Perturbation Generation (A…

**Figure 3.** Detailed structure of fixed decoder network and the process of perturbation decoding into secret message.

**Figure 4.** The impact of ϵ on detection accuracy and recovery accuracy. Note that for ϵ ≤ 8 × 10⁻³, the PSNR exceeds 100 dB and PEAQ reaches 4.54, ensuring superior imperceptibility.

IV. EXPERIMENT, A. Experiment Setups, 1) Implementation Details: The experiments are conducted on four diverse datasets: TIMIT, LJSpeech, GTZAN, and Audioset. We collect 3,000 mono, 16-bit audio clips from each dataset, all of whi…

**Figure 3.** After determining the design of the decoder network De(·) using this method and employing the shared key k_r, both the sender and receiver can independently construct an identical decoder network. This method significantly reduces the amount of information exchange required, thus enhancing both the imperceptibility and the anti-steganalysis performance of the steganographic system. TABLE II PERFORMANCE DIM…

**Figure 5.** Time domain waveforms of audio clips and perturbations on the…

**Figure 6.** Frequency domain waveforms of audio clips and perturbations on the…

**Figure 7.** Energy Maps of audio clips and perturbations on the TIMIT dataset:…

Original abstract

The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, driving the advancement of audio steganography. Benefiting from advances in deep learning, current audio steganography schemes are mainly based on encoder-decoder network architectures. While these methods guarantee a certain level of perceptual quality for stego audio, they typically face high computational cost and long implementation time, as well as poor anti-steganalysis performance. To address the aforementioned issues, we pioneer a Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation (FGAS). Adversarial perturbations carrying a secret message are embedded into the cover audio to generate stego audio. The receiver only needs to share the structure and key of the fixed decoder network to accurately extract the secret message from the stego audio. In FGAS, we propose an Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension and design a lightweight fixed decoder. The fixed decoder guarantees reliable extraction of the hidden message, while adversarial perturbations are optimized to keep the stego audio perceptually and statistically close to the cover audio, thereby improving anti-steganalysis performance. The experimental results show that FGAS significantly improves stego audio quality, achieving an average PSNR gain of over 10 dB compared to SOTA methods. Furthermore, FGAS demonstrates strong robustness against common audio processing attacks. Moreover, FGAS exhibits superior anti-steganalysis performance across different relative payloads; under high-capacity embedding, it achieves a classification error rate about 2% higher, indicating stronger anti-steganalysis performance than current SOTA methods.
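
The "classification error rate" in the abstract can be read as the error of a steganalysis detector that labels clips as cover or stego. The sketch below assumes the convention common in the steganalysis literature, the average of the false-alarm and missed-detection rates; whether the paper uses exactly this definition is an assumption, and `detection_error` is an illustrative name.

```python
def detection_error(labels, predictions):
    """Average of false-alarm and missed-detection rates (0 = cover, 1 = stego)."""
    cover_preds = [p for l, p in zip(labels, predictions) if l == 0]
    stego_preds = [p for l, p in zip(labels, predictions) if l == 1]
    p_fa = sum(1 for p in cover_preds if p == 1) / len(cover_preds)  # covers flagged as stego
    p_md = sum(1 for p in stego_preds if p == 0) / len(stego_preds)  # stego clips missed
    return 0.5 * (p_fa + p_md)


# Hypothetical detector output on four cover and four stego clips.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 0]
error = detection_error(labels, predictions)  # (1/4 + 2/4) / 2 = 0.375
```

Under this convention an error of 0.5 means the detector does no better than guessing, so a higher value is better for the steganographer.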

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FGAS switches to a fixed decoder plus adversarial perturbations for audio steganography and reports clear numerical gains in quality and detection resistance.

The one or two things to know about this paper: it proposes a fixed decoder network in place of the typical encoder-decoder pair for audio steganography, and it generates adversarial perturbations to embed the secret message. This is positioned as a way to reduce computational costs and improve performance.

The paper does a good job outlining the Audio Adversarial Perturbation Generation strategy and designing a lightweight fixed decoder that allows reliable message extraction by sharing only the network structure and key. The optional robust extension for handling common audio processing attacks shows some thought toward real-world use. The reported results, including an average PSNR gain of over 10 dB compared to state-of-the-art methods and superior anti-steganalysis performance with about 2% higher classification error rates under high-capacity embedding, indicate potential practical benefits if the experiments are robust.

On the soft spots, the abstract provides limited information on the specific datasets used, the exact baselines compared against, and the optimization procedures for the perturbations. This makes it difficult to fully assess the generalizability of the improvements. The central challenge is ensuring that the adversarial perturbations can be optimized to be small enough for perceptual and statistical closeness while allowing accurate extraction by the fixed decoder across various cover audios and messages. If the full paper includes detailed experiments demonstrating consistent success without excessive perturbation magnitudes, that would strengthen the soundness. The concern about possible circularity in tuning is minor here since the claims are about end performance.

This paper is primarily for the community working on audio steganography and information hiding techniques using deep learning. Readers looking for alternative architectures to traditional methods would find value in the fixed decoder concept and the perturbation-based embedding.

I think it deserves a serious referee. The work engages honestly with the literature on encoder-decoder limitations and presents testable improvements, so peer review would help refine the presentation and verify the experimental rigor.

Referee Report

3 major / 2 minor

Summary. The paper proposes FGAS, a fixed-decoder audio steganography scheme that generates adversarial perturbations via an A2PG strategy to embed secret messages into cover audio. A single lightweight decoder (shared by structure and key) extracts the message at the receiver. The method includes an optional robust extension against post-embedding attacks. Experiments are reported to show >10 dB average PSNR improvement over SOTA, strong robustness to common audio processing, and superior anti-steganalysis (approximately 2% higher classification error under high-capacity embedding).

Significance. If the central claims hold, the fixed-decoder plus adversarial-perturbation design would represent a meaningful departure from conventional encoder-decoder audio steganography by eliminating per-instance retraining while simultaneously improving perceptual quality and statistical undetectability. The approach could lower computational cost for high-fidelity AIGC audio hiding and strengthen practical security against steganalysis.

major comments (3)

[§3.2] §3.2 (A2PG optimization): The manuscript provides no quantitative evidence (success rate, histogram of achieved perturbation norms, or failure cases) that the perturbation search consistently locates a solution inside the imperceptibility ball for arbitrary message-cover pairs. Because the fixed decoder is never retrained per instance, any non-negligible fraction of failures would force larger perturbations, directly contradicting the reported >10 dB average PSNR gain and the assumption of reliable extraction.
[§4] §4 (Experimental protocol): No description is given of the audio corpora (sampling rate, duration, number of files, train/test split), the precise SOTA baselines, the optimization hyperparameters of A2PG, or any statistical tests (confidence intervals, p-values) supporting the PSNR, robustness, and classification-error claims. These omissions make it impossible to assess whether the headline numerical improvements are robust or conditioned on post-hoc selection.
[§3.3] §3.3 (fixed decoder): The claim that the lightweight fixed decoder “guarantees reliable extraction” is not supported by capacity analysis, bit-error-rate curves versus payload, or ablation on decoder depth/width. Without such data it is unclear whether the reported anti-steganalysis advantage stems from the perturbation design or simply from a decoder that tolerates only low-capacity messages.
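
The bit-error-rate curves requested in the comment on §3.3 rest on a simple metric, sketched here under the assumption that the secret message and the decoder output are equal-length bit lists. `bit_error_rate` is an illustrative helper, not something taken from the manuscript.

```python
def bit_error_rate(sent, received):
    """Fraction of message bits that the decoder recovered incorrectly."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have the same length")
    if not sent:
        raise ValueError("bit sequences must not be empty")
    return sum(1 for s, r in zip(sent, received) if s != r) / len(sent)


# Hypothetical message and decoder output: two of four bits differ.
ber = bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
```

Evaluating this at several payload sizes, and again after each processing attack, would give the curves the referee asks for.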

minor comments (2)

[Abstract] The abstract and §1 use “relative payloads” without an explicit definition or formula; a short clarifying sentence would improve readability.
[Figures] Several figure captions omit units on the y-axis or fail to state the number of averaged trials; this affects interpretability of the PSNR and steganalysis plots.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and completeness.

Referee: [§3.2] §3.2 (A2PG optimization): The manuscript provides no quantitative evidence (success rate, histogram of achieved perturbation norms, or failure cases) that the perturbation search consistently locates a solution inside the imperceptibility ball for arbitrary message-cover pairs. Because the fixed decoder is never retrained per instance, any non-negligible fraction of failures would force larger perturbations, directly contradicting the reported >10 dB average PSNR gain and the assumption of reliable extraction.

Authors: We thank the referee for this important observation. The >10 dB average PSNR improvement reported in our experiments is the result of successful A2PG optimizations that kept perturbations within the imperceptibility constraints for the message-cover pairs evaluated. To provide explicit quantitative support for the consistency of the search process, we will add success rates, histograms of perturbation norms, and discussion of any edge cases in the revised manuscript.
revision: yes

Referee: [§4] §4 (Experimental protocol): No description is given of the audio corpora (sampling rate, duration, number of files, train/test split), the precise SOTA baselines, the optimization hyperparameters of A2PG, or any statistical tests (confidence intervals, p-values) supporting the PSNR, robustness, and classification-error claims. These omissions make it impossible to assess whether the headline numerical improvements are robust or conditioned on post-hoc selection.

Authors: We agree that the experimental section requires substantially more detail for reproducibility and to allow proper evaluation of the results. In the revised manuscript we will expand §4 with complete descriptions of the audio corpora (sampling rates, durations, file counts, and splits), the exact SOTA baselines implemented, the A2PG optimization hyperparameters, and appropriate statistical tests including confidence intervals for the key metrics.
revision: yes

Referee: [§3.3] §3.3 (fixed decoder): The claim that the lightweight fixed decoder “guarantees reliable extraction” is not supported by capacity analysis, bit-error-rate curves versus payload, or ablation on decoder depth/width. Without such data it is unclear whether the reported anti-steganalysis advantage stems from the perturbation design or simply from a decoder that tolerates only low-capacity messages.

Authors: The lightweight fixed decoder is designed to achieve reliable extraction at the payload capacities used throughout our experiments, and the anti-steganalysis gains arise from the joint effect of the adversarial perturbations and this decoder. To strengthen the supporting evidence, we will add capacity analysis, bit-error-rate curves versus payload, and ablation studies varying decoder depth and width in the revised version.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper proposes FGAS as a new architecture combining a fixed lightweight decoder with an A2PG optimization strategy for embedding messages via adversarial perturbations. The headline performance claims (PSNR gain >10 dB, robustness to attacks, improved anti-steganalysis error rates) are presented as empirical outcomes of experiments on the implemented method. No equations, parameter-fitting steps, or self-citations in the abstract or method description reduce these results to quantities defined by the inputs or by construction. The perturbation optimization is a core algorithmic component whose success is evaluated externally rather than presupposed by the reported metrics. The derivation chain therefore remains self-contained against external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard deep-learning optimization assumptions and the premise that a fixed decoder can serve as a reliable extractor once its structure is shared; no new physical entities are introduced.

free parameters (1)

perturbation strength and optimization hyperparameters
Chosen to balance message embedding capacity against perceptual and statistical similarity in the A2PG strategy.

axioms (1)

domain assumption: Adversarial perturbations can be optimized to carry secret messages while preserving audio statistics and perceptual quality sufficiently for reliable extraction by a fixed decoder.
Invoked in the design of the A2PG strategy and the claim of improved anti-steganalysis performance.



work page 2023

[11] [11]

Ctnet: A convolutional transformer network for color image steganalysis,

K. Wei, W. Luo, S. Tan, and J.-W. Huang, “Ctnet: A convolutional transformer network for color image steganalysis,”J. Comput. Sci. Technol., vol. 40, no. 2, pp. 413–427, 2025

work page 2025

[12] [12]

Color image steganalysis based on pixel difference convolution and enhanced transformer with selective pooling,

K. Wei, W. Luo, and J. Huang, “Color image steganalysis based on pixel difference convolution and enhanced transformer with selective pooling,”IEEE Trans. Inf. Forensics Security, vol. 19, pp. 9970–9983, 2024

work page 2024

[13] [13]

Residual guided coordinate attention for selection channel aware image steganalysis,

K. Wei, W. Luo, M. Liu, and M. Ye, “Residual guided coordinate attention for selection channel aware image steganalysis,”Multimedia Syst., vol. 29, no. 4, pp. 2125– 2135, 2023

work page 2023

[14] [14]

Color im- age steganography using generative adversarial networks with a phased training strategy,

S. Zhou, M. Ye, W. Luo, X. Liao, and K. Wei, “Color im- age steganography using generative adversarial networks with a phased training strategy,” inProc. IH&MMSec, 2025, pp. 142–152

work page 2025

[15] [15]

Steganography embedding cost learning with generative multi-adversarial network,

D. Huang, W. Luo, M. Liu, W. B. Tang, and J. Huang, “Steganography embedding cost learning with generative multi-adversarial network,”IEEE Trans. Inf. Forensics Security, vol. 19, pp. 15–29, 2024

work page 2024

[16] [16]

Double-layered dual-syndrome trellis codes utilizing channel knowledge for robust steganography,

Q. Guan, P. Liu, W. Zhang, W. Lu, and X. Zhang, “Double-layered dual-syndrome trellis codes utilizing channel knowledge for robust steganography,”IEEE Trans. Inf. Forensics Security, vol. 18, pp. 501–516, 2023

work page 2023

[17] [17]

A novel residual- guided learning method for image steganography,

M. Ye, D. Huang, K. Wei, and W. Luo, “A novel residual- guided learning method for image steganography,” in Proc. ICASSP, 2024, pp. 4565–4569

work page 2024

[18] [18]

Derivative-based steganographic distortion and its non- additive extensions for audio,

K. Chen, H. Zhou, W. Li, K. Yang, W. Zhang, and N. Yu, “Derivative-based steganographic distortion and its non- additive extensions for audio,”IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 2027–2032, 2019

work page 2027

[19] [19]

Adaptive audio steganog- raphy based on advanced audio coding and syndrome- trellis coding,

W. Luo, Y . Zhang, and H. Li, “Adaptive audio steganog- raphy based on advanced audio coding and syndrome- trellis coding,” inProc. IWDW, 2017, pp. 177–186

work page 2017

[20] [20]

Minimizing addi- tive distortion in steganography using syndrome-trellis codes,

T. Filler, J. Judas, and J. Fridrich, “Minimizing addi- tive distortion in steganography using syndrome-trellis codes,”IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 920–935, 2011

work page 2011

[21] [21]

Efficient audio steganography using generalized audio intrinsic energy with micro-amplitude modification suppression,

W. Su, J. Ni, X. Hu, and B. Li, “Efficient audio steganography using generalized audio intrinsic energy with micro-amplitude modification suppression,”IEEE Trans. Inf. Forensics Security, vol. 19, pp. 6559–6572, 2024

work page 2024

[22] [22]

Audio steganography using lsb encoding technique with increased capacity and bit error rate optimization,

S. Roy, J. Parida, A. K. Singh, and A. S. Sairam, “Audio steganography using lsb encoding technique with increased capacity and bit error rate optimization,” in Proc. CSCIT, 2012, pp. 372–376

work page 2012

[23] [23]

Audio steganography using dual randomness lsb method,

J. Vimal and A. M. Alex, “Audio steganography using dual randomness lsb method,” inProc. ICCICCT, 2014, pp. 941–944

work page 2014

[24] [24]

Enhancing lsb using binary message size encoding for high capacity, transparent and secure audio steganography–an innova- tive approach,

M. M. Mahmoud and H. T. Elshoush, “Enhancing lsb using binary message size encoding for high capacity, transparent and secure audio steganography–an innova- tive approach,”IEEE Access, vol. 10, pp. 29 954–29 971, 2022

work page 2022

[25] [25]

Approaching optimal embedding in audio steganography with gan,

J. Yang, H. Zheng, X. Kang, and Y .-Q. Shi, “Approaching optimal embedding in audio steganography with gan,” in Proc. ICASSP, 2020, pp. 2827–2831

work page 2020

[26] [26]

Audio steganog- raphy based on iterative adversarial attacks against con- volutional neural networks,

J. Wu, B. Chen, W. Luo, and Y . Fang, “Audio steganog- raphy based on iterative adversarial attacks against con- volutional neural networks,”IEEE Trans. Inf. Forensics Security, vol. 15, pp. 2282–2294, 2020

work page 2020

[27] [27]

A ro- bust coverless audio steganography based on differential privacy clustering,

Y . Feng, L. Xu, X. Lu, G. Zhang, and W. Rao, “A ro- bust coverless audio steganography based on differential privacy clustering,”IEEE Trans. Multimedia, vol. 27, pp. 5669–5684, 2025

work page 2025

[28] [28]

Cover reproducible steganography via deep genera- tive models,

K. Chen, H. Zhou, Y . Wang, M. Li, W. Zhang, and N. Yu, “Cover reproducible steganography via deep genera- tive models,”IEEE Trans. Dependable Secure Comput., vol. 20, no. 5, pp. 3787–3798, 2022

work page 2022

[29] [29]

Mutual information-optimized steganalysis for generative steganography,

M. Hu and H. Wang, “Mutual information-optimized steganalysis for generative steganography,”IEEE Trans. Inf. Forensics Security, pp. 1852–1865, 2025. 13

work page 2025

[30] [30]

A com- parative study of audio steganography schemes,

F. Hemeida, W. Alexan, and S. Mamdouh, “A com- parative study of audio steganography schemes,”Int. J. Comput. Digit. Syst., vol. 10, pp. 555–562, 2021

work page 2021

[31] [31]

Pixinwav: Residual steganography for hiding pixels in audio,

M. Geleta, C. Punti, K. McGuinness, J. Pons, C. Canton, and X. Giro-i Nieto, “Pixinwav: Residual steganography for hiding pixels in audio,” inProc. ICASSP, 2022, pp. 2485–2489

work page 2022

[32] [32]

Distribution-preserving steganography based on text-to-speech generative models,

K. Chen, H. Zhou, H. Zhao, D. Chen, W. Zhang, and N. Yu, “Distribution-preserving steganography based on text-to-speech generative models,”IEEE Trans. Depend- able Secure Comput., vol. 19, no. 5, pp. 3343–3356, 2021

work page 2021

[33] [33]

Securing fixed neural network steganography,

Z. Luo, S. Li, G. Li, Z. Qian, and X. Zhang, “Securing fixed neural network steganography,” inProc. ACM MM, 2023, pp. 7943–7951

work page 2023

[34] [34]

Cover-separable fixed neural network steganography via deep generative models,

G. Li, S. S. Li, Z. Qian, and X. Zhang, “Cover-separable fixed neural network steganography via deep generative models,” inProc. ACM MM, 2024, pp. 10 238–10 247

work page 2024

[35] [35]

Rfnns: Robust fixed neural network steganography with popular deep generative models,

Y . Cheng, J. Zhou, J. Chen, Z. Yin, and X. Zhang, “Rfnns: Robust fixed neural network steganography with popular deep generative models,”Proc. AAAI, 2026

work page 2026

[36] [36]

Fixed neural network steganography: Train the images, not the network,

V . Kishore, X. Chen, Y . Wang, B. Li, and K. Q. Wein- berger, “Fixed neural network steganography: Train the images, not the network,” inProc. ICLR, 2022

work page 2022

[37] [37]

Evasion attack steganography: Turning vulnerability of machine learning to adversarial attacks into a real-world application,

S. Ghamizi, M. Cordy, M. Papadakis, and Y . Le Traon, “Evasion attack steganography: Turning vulnerability of machine learning to adversarial attacks into a real-world application,” inProc. ICCV, 2021, pp. 31–40

work page 2021

[38] [38]

Audio watermark: Dynamic and harmless watermark for black-box voice dataset copy- right protection,

H. Guo, J. Guo, B. Chen, Y . Wang, X. Chen, H. Huang, Q. Yan, and L. Xiao, “Audio watermark: Dynamic and harmless watermark for black-box voice dataset copy- right protection,” inProc. USENIX Security, 2025, pp. 4601–4620

work page 2025

[39] [39]

Peaq-the itu standard for objective measurement of perceived audio quality,

T. Thiede, W. C. Treurniet, R. Bitto, C. Schmidmer, T. Sporer, J. G. Beerends, and C. Colomes, “Peaq-the itu standard for objective measurement of perceived audio quality,”J. Audio Eng. Soc., vol. 48, no. 1/2, pp. 3–29, 2000

work page 2000

[40] [40]

Audio steganalysis with convolutional neural network,

B. Chen, W. Luo, and H. Li, “Audio steganalysis with convolutional neural network,” inProc. ACM Workshop Inf. Hiding Multimedia Security (IH&MMSec), 2017, pp. 85–90

work page 2017

[41] [41]

Audio steganalysis with improved convolutional neural network,

Y . Lin, R. Wang, D. Yan, L. Dong, and X. Zhang, “Audio steganalysis with improved convolutional neural network,” inProc. ACM Workshop Inf. Hiding Multime- dia Security (IH&MMSec), 2019, pp. 210–215

work page 2019

[42] [42]

Improved audio steganalytic feature and its applications in audio forensics,

W. Luo, H. Li, Q. Yan, R. Yang, and J. Huang, “Improved audio steganalytic feature and its applications in audio forensics,”ACM Trans. Multimedia Comput., Commun., Appl., vol. 14, no. 2, pp. 1–14, 2018

work page 2018

[43] [43]

Ahcm: Adaptive huffman code mapping for audio steganogra- phy based on psychoacoustic model,

X. Yi, K. Yang, X. Zhao, Y . Wang, and H. Yu, “Ahcm: Adaptive huffman code mapping for audio steganogra- phy based on psychoacoustic model,”IEEE Trans. Inf. Forensics Security, vol. 14, no. 8, pp. 2217–2231, 2019

work page 2019

[44] [44]

Hifi- stego: A high-fidelity embedding audio steganography based on audio features decoupling,

S. Zhang, B. Tian, Y . Gao, X. Liu, and W. Yang, “Hifi- stego: A high-fidelity embedding audio steganography based on audio features decoupling,”IEEE Trans. Audio, Speech, Lang. Process., vol. 33, pp. 2032–2044, 2025

work page 2032

[45] [45]

Adam: A method for stochastic optimization,

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”Proc. ICLR, 2015

work page 2015

[46] [46]

Wet paper codes with improved embedding efficiency,

J. Fridrich, M. Goljan, and D. Soukal, “Wet paper codes with improved embedding efficiency,”IEEE Trans. Inf. Forensics Security, vol. 1, no. 1, pp. 102–110, 2006

work page 2006