
arXiv: 2505.22266 · v3 · submitted 2025-05-28 · 💻 cs.SD · cs.MM · eess.AS

FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

Pith reviewed 2026-05-19 13:32 UTC · model grok-4.3

classification 💻 cs.SD · cs.MM · eess.AS
keywords audio steganography · adversarial perturbations · fixed decoder network · anti-steganalysis · robustness · PSNR gain · secret message extraction

The pith

FGAS embeds secret messages as adversarial perturbations in audio using a shared fixed decoder network to improve quality and security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FGAS, a method for audio steganography that generates adversarial perturbations to carry secret messages and embeds them directly into cover audio. The receiver uses only the structure and key of a fixed decoder network to extract the message without any per-instance retraining. This setup keeps the resulting stego audio close to the original in both sound quality and statistical properties. A reader would care because it lowers computational demands while raising the bar for detection and surviving routine audio edits.

Core claim

The paper claims that adversarial perturbations optimized to carry the secret message can be added to cover audio so that a lightweight fixed decoder, whose parameters are shared in advance, extracts the message reliably from the stego audio while the perturbations preserve perceptual fidelity and statistical similarity to the cover.
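
To make the claim concrete, here is a minimal sketch of the fixed-decoder idea. It is not the paper's A2PG algorithm or its decoder architecture. It assumes a toy linear decoder whose weights come from a key-seeded generator, a sign-gradient update on a logistic loss, and an L-infinity bound `eps` on the perturbation. The names `build_decoder`, `embed`, `extract` and all numeric settings are hypothetical and chosen only for illustration.

```python
import math
import random


def build_decoder(key, n_bits, n_samples):
    """Derive fixed decoder weights from a shared key (toy linear decoder)."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_bits)]


def logits(weights, audio):
    return [sum(w * a for w, a in zip(row, audio)) for row in weights]


def extract(weights, audio):
    """Receiver side: one bit per decoder output, thresholded at zero."""
    return [1 if z > 0 else 0 for z in logits(weights, audio)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def embed(weights, cover, message, eps, step=None, max_iters=200):
    """Sender side: search for a perturbation with max |delta| <= eps such
    that the fixed decoder outputs `message`. Returns (stego, success)."""
    step = eps / 10.0 if step is None else step
    delta = [0.0] * len(cover)
    for _ in range(max_iters):
        stego = [c + d for c, d in zip(cover, delta)]
        z = logits(weights, stego)
        if [1 if v > 0 else 0 for v in z] == message:
            return stego, True
        # Gradient of the binary cross-entropy loss with respect to the input.
        errs = [sigmoid(v) - b for v, b in zip(z, message)]
        for k in range(len(delta)):
            g = sum(e * row[k] for e, row in zip(errs, weights))
            moved = delta[k] - step * ((g > 0) - (g < 0))
            delta[k] = max(-eps, min(eps, moved))  # project back into the ball
    stego = [c + d for c, d in zip(cover, delta)]
    return stego, extract(weights, stego) == message


# Hypothetical settings: a 4-bit message hidden in 4096 samples.
KEY, N_BITS, N_SAMPLES, EPS = 2025, 4, 4096, 0.01
cover_rng = random.Random(7)
cover = [cover_rng.uniform(-0.1, 0.1) for _ in range(N_SAMPLES)]
message = [1, 0, 1, 1]

sender_decoder = build_decoder(KEY, N_BITS, N_SAMPLES)
stego, ok = embed(sender_decoder, cover, message, eps=EPS)

# The receiver rebuilds the decoder from the key alone; nothing is retrained.
receiver_decoder = build_decoder(KEY, N_BITS, N_SAMPLES)
recovered = extract(receiver_decoder, stego)
print(ok, recovered, max(abs(s - c) for s, c in zip(stego, cover)))
```

In this toy setting only the cover is modified. The decoder stays fixed across covers and messages, which is the property the core claim depends on.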

What carries the argument

Audio Adversarial Perturbation Generation (A2PG) strategy together with an optional robust extension and a lightweight fixed decoder network.

If this is right

  • Stego audio shows an average PSNR gain exceeding 10 dB over existing state-of-the-art methods.
  • The scheme remains robust under common audio processing operations including compression and added noise.
  • Anti-steganalysis resistance improves: at high embedding capacity, steganalysis classifiers show an error rate roughly 2 percent higher than they do against prior methods.
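
For readers who want to see what a PSNR gain means numerically, the following is a minimal sketch of the standard PSNR formula. It assumes samples normalized so that the peak amplitude is 1.0. The paper's exact normalization is not stated on this page, and the signals below are made-up placeholders, not the paper's data.

```python
import math


def psnr_db(cover, stego, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(cover) != len(stego):
        raise ValueError("signals must have the same length")
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


cover = [0.0, 0.5, -0.5, 0.25]
stego_a = [c + 0.01 for c in cover]   # hypothetical larger distortion
stego_b = [c + 0.001 for c in cover]  # hypothetical distortion, ten times smaller
gain = psnr_db(cover, stego_b) - psnr_db(cover, stego_a)
print(f"PSNR gain: {gain:.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a tenfold reduction in perturbation amplitude corresponds to a gain of about 20 dB. A gain above 10 dB therefore means the root-mean-square embedding error is more than roughly 3.16 times smaller.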

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-time hiding applications become more practical because the decoder never needs to be retrained for each new cover.
  • The fixed-decoder pattern could be tested for hiding data inside other media such as video or sensor streams.
  • Performance against newer machine-learning steganalysis tools would be a natural next measurement.

Load-bearing premise

Adversarial perturbations can be generated to embed the secret message while remaining close enough to the cover audio in perception and statistics for the fixed decoder to extract it accurately without retraining or noticeable changes.

What would settle it

Demonstrating that extraction accuracy falls below reliable levels on new audio samples without decoder retraining, or that steganalysis classifiers reach error rates no higher than current methods, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2505.22266 by Jialin Yan, Shilin Wang, Tanfeng Sun, Xinghao Jiang, Xinpeng Zhang, Yu Cheng, Zhaoxia Yin.

Figure 1: Motivations and the proposed framework of FGAS: (a) The com…
Figure 2: FGAS scheme: (a): Alice (The Sender) uses the proposed audio Adversarial Perturbation Generation (A…
Figure 3: Detailed structure of fixed decoder network and the process of perturbation decoding into secret message.
Figure 4: The impact of ϵ on detection accuracy and recovery accuracy. Note that for ϵ ≤ 8 × 10⁻³, the PSNR exceeds 100 dB and PEAQ reaches 4.54, ensuring superior imperceptibility. [Section IV-A, Implementation Details] The experiments are conducted on four diverse datasets: TIMIT, LJSpeech, GTZAN, and Audioset. We collect 3,000 mono, 16-bit audio clips from each dataset, all of whi…
Figure 3 (in-text reference): After determining the design of the decoder network De(·) using this method and employing the shared key k_r, both the sender and receiver can independently construct an identical decoder network. This method significantly reduces the amount of information exchange required, thus enhancing both the imperceptibility and the anti-steganalysis performance of the steganographic system. TABLE II PERFORMANCE DIM…
Figure 5: Time domain waveforms of audio clips and perturbations on the…
Figure 6: Frequency domain waveforms of audio clips and perturbations on the…
Figure 7: Energy Maps of audio clips and perturbations on the TIMIT dataset:…
Original abstract

The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, driving the advancement of audio steganography. Benefiting from advances in deep learning, current audio steganography schemes are mainly based on encoder-decoder network architectures. While these methods guarantee a certain level of perceptual quality for stego audio, they typically face high computational cost and long implementation time, as well as poor anti-steganalysis performance. To address the aforementioned issues, we pioneer a Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation (FGAS). Adversarial perturbations carrying a secret message are embedded into the cover audio to generate stego audio. The receiver only needs to share the structure and key of the fixed decoder network to accurately extract the secret message from the stego audio. In FGAS, we propose an Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension and design a lightweight fixed decoder. The fixed decoder guarantees reliable extraction of the hidden message, while adversarial perturbations are optimized to keep the stego audio perceptually and statistically close to the cover audio, thereby improving anti-steganalysis performance. The experimental results show that FGAS significantly improves stego audio quality, achieving an average PSNR gain of over 10 dB compared to SOTA methods. Furthermore, FGAS demonstrates strong robustness against common audio processing attacks. Moreover, FGAS exhibits superior anti-steganalysis performance across different relative payloads; under high-capacity embedding, it achieves a classification error rate about 2% higher, indicating stronger anti-steganalysis performance than current SOTA methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes FGAS, a fixed-decoder audio steganography scheme that generates adversarial perturbations via an A2PG strategy to embed secret messages into cover audio. A single lightweight decoder (shared by structure and key) extracts the message at the receiver. The method includes an optional robust extension against post-embedding attacks. Experiments are reported to show >10 dB average PSNR improvement over SOTA, strong robustness to common audio processing, and superior anti-steganalysis (approximately 2% higher classification error under high-capacity embedding).

Significance. If the central claims hold, the fixed-decoder plus adversarial-perturbation design would represent a meaningful departure from conventional encoder-decoder audio steganography by eliminating per-instance retraining while simultaneously improving perceptual quality and statistical undetectability. The approach could lower computational cost for high-fidelity AIGC audio hiding and strengthen practical security against steganalysis.

major comments (3)
  1. [§3.2] §3.2 (A2PG optimization): The manuscript provides no quantitative evidence (success rate, histogram of achieved perturbation norms, or failure cases) that the perturbation search consistently locates a solution inside the imperceptibility ball for arbitrary message-cover pairs. Because the fixed decoder is never retrained per instance, any non-negligible fraction of failures would force larger perturbations, directly contradicting the reported >10 dB average PSNR gain and the assumption of reliable extraction.
  2. [§4] §4 (Experimental protocol): No description is given of the audio corpora (sampling rate, duration, number of files, train/test split), the precise SOTA baselines, the optimization hyperparameters of A2PG, or any statistical tests (confidence intervals, p-values) supporting the PSNR, robustness, and classification-error claims. These omissions make it impossible to assess whether the headline numerical improvements are robust or conditioned on post-hoc selection.
  3. [§3.3] §3.3 (fixed decoder): The claim that the lightweight fixed decoder “guarantees reliable extraction” is not supported by capacity analysis, bit-error-rate curves versus payload, or ablation on decoder depth/width. Without such data it is unclear whether the reported anti-steganalysis advantage stems from the perturbation design or simply from a decoder that tolerates only low-capacity messages.
minor comments (2)
  1. [Abstract] The abstract and §1 use “relative payloads” without an explicit definition or formula; a short clarifying sentence would improve readability.
  2. [Figures] Several figure captions omit units on the y-axis or fail to state the number of averaged trials; this affects interpretability of the PSNR and steganalysis plots.
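
The evidence requested in major comments 1 and 3 can be tallied with very little code. The sketch below is one possible way to do it. It assumes each trial records the sent bits, the decoded bits, and the L-infinity norm of the perturbation. The function names, the bound `eps=0.008`, and the trial values are placeholders for illustration only and are not results from the paper.

```python
def bit_error_rate(sent, received):
    """Fraction of message bits that the decoder got wrong."""
    if len(sent) != len(received) or not sent:
        raise ValueError("bit sequences must be non-empty and equally long")
    return sum(a != b for a, b in zip(sent, received)) / len(sent)


def summarize_trials(trials, eps):
    """Each trial is (sent_bits, received_bits, linf_norm_of_perturbation).

    A trial counts as a success only if every bit is recovered and the
    perturbation stays inside the imperceptibility ball of radius eps.
    """
    bers = [bit_error_rate(sent, recv) for sent, recv, _ in trials]
    norms = [norm for _, _, norm in trials]
    successes = sum(1 for ber, norm in zip(bers, norms) if ber == 0.0 and norm <= eps)
    return {
        "success_rate": successes / len(trials),
        "mean_ber": sum(bers) / len(bers),
        "max_norm": max(norms),
    }


# Placeholder trials for illustration only; not data from the paper.
trials = [
    ([1, 0, 1, 1], [1, 0, 1, 1], 0.004),  # recovered, inside the ball
    ([0, 1, 1, 0], [0, 1, 0, 0], 0.008),  # one bit wrong
    ([1, 1, 0, 0], [1, 1, 0, 0], 0.012),  # recovered, but outside the ball
]
report = summarize_trials(trials, eps=0.008)
print(report)
```

Grouping the same summary by payload size would give the bit-error-rate-versus-payload curve that comment 3 asks for.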

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and completeness.

point-by-point responses
  1. Referee: [§3.2] §3.2 (A2PG optimization): The manuscript provides no quantitative evidence (success rate, histogram of achieved perturbation norms, or failure cases) that the perturbation search consistently locates a solution inside the imperceptibility ball for arbitrary message-cover pairs. Because the fixed decoder is never retrained per instance, any non-negligible fraction of failures would force larger perturbations, directly contradicting the reported >10 dB average PSNR gain and the assumption of reliable extraction.

    Authors: We thank the referee for this important observation. The >10 dB average PSNR improvement reported in our experiments is the result of successful A2PG optimizations that kept perturbations within the imperceptibility constraints for the message-cover pairs evaluated. To provide explicit quantitative support for the consistency of the search process, we will add success rates, histograms of perturbation norms, and discussion of any edge cases in the revised manuscript. revision: yes

  2. Referee: [§4] §4 (Experimental protocol): No description is given of the audio corpora (sampling rate, duration, number of files, train/test split), the precise SOTA baselines, the optimization hyperparameters of A2PG, or any statistical tests (confidence intervals, p-values) supporting the PSNR, robustness, and classification-error claims. These omissions make it impossible to assess whether the headline numerical improvements are robust or conditioned on post-hoc selection.

    Authors: We agree that the experimental section requires substantially more detail for reproducibility and to allow proper evaluation of the results. In the revised manuscript we will expand §4 with complete descriptions of the audio corpora (sampling rates, durations, file counts, and splits), the exact SOTA baselines implemented, the A2PG optimization hyperparameters, and appropriate statistical tests including confidence intervals for the key metrics. revision: yes

  3. Referee: [§3.3] §3.3 (fixed decoder): The claim that the lightweight fixed decoder “guarantees reliable extraction” is not supported by capacity analysis, bit-error-rate curves versus payload, or ablation on decoder depth/width. Without such data it is unclear whether the reported anti-steganalysis advantage stems from the perturbation design or simply from a decoder that tolerates only low-capacity messages.

    Authors: The lightweight fixed decoder is designed to achieve reliable extraction at the payload capacities used throughout our experiments, and the anti-steganalysis gains arise from the joint effect of the adversarial perturbations and this decoder. To strengthen the supporting evidence, we will add capacity analysis, bit-error-rate curves versus payload, and ablation studies varying decoder depth and width in the revised version. revision: yes
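
The confidence intervals promised in the second response could be produced in several ways. The sketch below shows one of them, a percentile bootstrap for the mean. It is an assumption about how such an interval might be computed, not the authors' procedure, and the per-clip PSNR gains are invented placeholder numbers.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2.0) * n_resamples)]
    upper = means[int((1.0 - alpha / 2.0) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Placeholder per-clip PSNR gains in dB; not results from the paper.
gains_db = [9.5, 10.2, 11.1, 12.4, 10.8, 9.9, 11.6, 10.4]
mean_gain, ci_low, ci_high = bootstrap_mean_ci(gains_db)
print(f"mean gain {mean_gain:.2f} dB, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Reporting an interval like this next to each headline number would show whether an average gain above 10 dB is stable across clips or driven by a few favourable ones.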

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper proposes FGAS as a new architecture combining a fixed lightweight decoder with an A2PG optimization strategy for embedding messages via adversarial perturbations. The headline performance claims (PSNR gain >10 dB, robustness to attacks, improved anti-steganalysis error rates) are presented as empirical outcomes of experiments on the implemented method. No equations, parameter-fitting steps, or self-citations in the abstract or method description reduce these results to quantities defined by the inputs or by construction. The perturbation optimization is a core algorithmic component whose success is evaluated externally rather than presupposed by the reported metrics. The derivation chain therefore remains self-contained against external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard deep-learning optimization assumptions and the premise that a fixed decoder can serve as a reliable extractor once its structure is shared; no new physical entities are introduced.

free parameters (1)
  • perturbation strength and optimization hyperparameters
    Chosen to balance message embedding capacity against perceptual and statistical similarity in the A2PG strategy.
axioms (1)
  • domain assumption: Adversarial perturbations can be optimized to carry secret messages while preserving audio statistics and perceptual quality sufficiently for reliable extraction by a fixed decoder.
    Invoked in the design of the A2PG strategy and the claim of improved anti-steganalysis performance.
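
The role of the perturbation-strength parameter can be illustrated with a short worked bound. It assumes the strength is an L-infinity limit `eps` on every sample and that the peak amplitude is 1.0; both are assumptions made here, not details confirmed by this page. If every sample changes by at most `eps`, the mean squared error is at most `eps` squared, so PSNR is at least 20·log10(peak / eps).

```python
import math


def psnr_floor_db(eps, peak=1.0):
    """Lower bound on PSNR in dB when every sample changes by at most eps."""
    if eps <= 0 or peak <= 0:
        raise ValueError("eps and peak must be positive")
    # max |delta| <= eps implies MSE <= eps**2, hence PSNR >= 20*log10(peak/eps).
    return 20.0 * math.log10(peak / eps)


for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps:g}  PSNR >= {psnr_floor_db(eps):.1f} dB")
```

Under these assumptions a smaller bound guarantees higher fidelity but leaves the optimizer less room to steer the decoder, which is the trade-off this free parameter controls.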

pith-pipeline@v0.9.0 · 5859 in / 1288 out tokens · 46888 ms · 2026-05-19T13:32:09.208915+00:00 · methodology

