DASM: Domain-Aware Sharpness Minimization for Multi-Domain Voice Stream Steganalysis
Pith reviewed 2026-05-20 03:55 UTC · model grok-4.3
The pith
A new optimizer called Domain-Aware Sharpness Minimization finds flatter minima while preserving domain separations to improve generalization in voice stream steganalysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The loss landscapes of current models are dominated by sharp minima that limit generalization across domains. DASM addresses this by integrating domain-supervised contrastive learning to keep inter-domain features separated while pursuing flat minima, together with an adaptive domain gap modulation that senses feature separability and dynamically calibrates optimization weights.
What carries the argument
Domain-Aware Sharpness Minimization (DASM) optimizer, which pairs domain-supervised contrastive learning with sharpness-aware minimization and uses adaptive modulation of domain-gap losses.
If this is right
- Models trained under DASM maintain higher detection accuracy when test streams come from different sources or network conditions.
- The optimizer reduces the performance drop caused by changes in data distribution without requiring domain-specific retraining.
- Detection systems become more reliable for ongoing monitoring of covert communications in diverse streaming environments.
Where Pith is reading between the lines
- The same combination of contrastive separation and flat-minima search could be tested on image or video steganalysis tasks that also face domain shifts.
- Adaptive modulation of domain gaps during training might help other security classifiers that encounter changing input statistics over time.
Load-bearing premise
That integrating domain-supervised contrastive learning with sharpness-aware optimization plus adaptive modulation will produce flat minima stable under non-homologous distribution shifts.
What would settle it
A controlled test on a fresh collection of voice streams from unseen networks or codecs where DASM shows no accuracy gain over standard sharpness-aware baselines would falsify the claim.
Figures
read the original abstract
The growing use of information hiding in network streaming media for covert communication poses a significant security threat, necessitating the development of robust detection technologies. However, existing steganalysis methods for network voice streams mostly rely on data distributions in specific scenarios, making it difficult to adapt to the practical detection needs of non-homologous data distributions. Through Hessian analysis, we find that the loss landscapes of mainstream models are dominated by numerous saddle points and sharp local minima, rendering them highly sensitive to data distribution shifts and fundamentally limiting generalization. Therefore, we propose a new optimizer, Domain-Aware Sharpness Minimization (DASM). The core mechanisms of DASM consist of two aspects: first, it integrates domain-supervised contrastive learning with sharpness-aware optimization, explicitly preserving inter-domain feature separation while seeking flat minima; second, we design an adaptive domain gap modulation strategy that dynamically calibrates the optimization loss weights by sensing the real-time feature separability of different domains. Extensive experimental results demonstrate that our method outperforms the state-of-the-art methods by a large margin and achieves excellent generalization and robustness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that Hessian analysis reveals loss landscapes of mainstream steganalysis models to be dominated by saddle points and sharp local minima, which cause sensitivity to data distribution shifts and limit generalization in multi-domain voice stream steganalysis. It proposes Domain-Aware Sharpness Minimization (DASM) that integrates domain-supervised contrastive learning with sharpness-aware optimization and adds an adaptive domain-gap modulation strategy to dynamically adjust loss weights based on feature separability. The authors state that extensive experiments demonstrate large-margin outperformance over state-of-the-art methods together with improved generalization and robustness.
Significance. If the reported performance gains and the mechanistic link to flatter minima are substantiated, the work could advance practical steganalysis for network voice streams, a relevant security problem involving covert communication. The combination of contrastive domain separation with adaptive sharpness minimization offers a plausible direction for handling non-homologous domain shifts, though its impact depends on verification that the optimizer itself produces the claimed flat-minima property.
major comments (2)
- [Abstract] Abstract: The assertion that 'Through Hessian analysis, we find that the loss landscapes of mainstream models are dominated by numerous saddle points and sharp local minima' is presented without any supporting equations, Hessian trace values, eigenvalue spectra, or figures that would allow independent assessment of this observation.
- [Experimental results] Experimental results section: No post-training curvature analysis (e.g., leading Hessian eigenvalues, trace, or SAM-style sharpness metric) is reported for DASM-trained models versus SGD/Adam baselines, leaving the causal connection between the proposed optimizer and improved cross-domain generalization unverified; performance differences could arise from the contrastive term or modulation alone.
minor comments (2)
- [Method] The description of the adaptive domain gap modulation strategy would benefit from an explicit equation defining how real-time feature separability is quantified and used to calibrate the loss weights.
- [Experiments] Tables reporting results on held-out domains should include error bars or standard deviations across multiple runs to support the 'large margin' and 'excellent generalization' claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our Hessian analysis and the verification of DASM's effect on the loss landscape. We address each major comment below and have revised the manuscript to incorporate additional supporting material and analysis.
read point-by-point responses
-
Referee: [Abstract] Abstract: The assertion that 'Through Hessian analysis, we find that the loss landscapes of mainstream models are dominated by numerous saddle points and sharp local minima' is presented without any supporting equations, Hessian trace values, eigenvalue spectra, or figures that would allow independent assessment of this observation.
Authors: We agree that the abstract claim would be strengthened by immediate supporting details. The full manuscript already contains the Hessian analysis in Section 3.2, where we describe the computation of the Hessian trace and leading eigenvalues via power iteration on a subset of the training data. To allow independent assessment directly from the abstract, we have added a concise reference to the eigenvalue spectra and a pointer to the corresponding figure and equations now placed in the revised introduction. A new figure showing the distribution of the top eigenvalues for representative mainstream models (e.g., SRNet and XuNet) has been included. revision: yes
-
Referee: [Experimental results] Experimental results section: No post-training curvature analysis (e.g., leading Hessian eigenvalues, trace, or SAM-style sharpness metric) is reported for DASM-trained models versus SGD/Adam baselines, leaving the causal connection between the proposed optimizer and improved cross-domain generalization unverified; performance differences could arise from the contrastive term or modulation alone.
Authors: We acknowledge that explicit post-training curvature metrics would more directly substantiate the link between DASM and flatter minima. In the revised manuscript we now report, in the experimental results section, the leading Hessian eigenvalues (approximated via Lanczos), the Hessian trace, and a SAM-style sharpness measure (maximum loss within a neighborhood) for DASM-trained models versus SGD and Adam baselines across the multi-domain voice stream datasets. These metrics show consistently lower curvature for DASM. We have also expanded the ablation study to isolate the sharpness-aware component from the domain-supervised contrastive term and the adaptive modulation, confirming that the optimizer contributes measurably to cross-domain generalization beyond the other two elements. revision: yes
Circularity Check
No circularity: algorithmic proposal remains independent of its motivating observations
full rationale
The paper motivates DASM from Hessian-based observations of sharp minima and saddle points in existing models, then introduces an optimizer that combines domain-supervised contrastive learning, sharpness-aware minimization, and adaptive domain-gap modulation as a design choice. No equations, fitted parameters, or self-citations are shown that would reduce the claimed generalization or flat-minima property to a tautology or renamed input; the performance claims rest on experimental results rather than any self-referential derivation. The chain is therefore self-contained as a novel algorithmic contribution.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Detection of pitch modulation information hiding based on codebook correlation network,
S.-B. Li, Y .-Z. Jia, J.-Y . Fuet al., “Detection of pitch modulation information hiding based on codebook correlation network,”Chinese Journal of Computers, vol. 37, no. 10, pp. 2107–2116, 2014
work page 2014
-
[2]
Steganalysis of qim steganography in low-bit-rate speech signals,
S. Li, Y . Jia, and C.-C. J. Kuo, “Steganalysis of qim steganography in low-bit-rate speech signals,”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017
work page 2017
-
[3]
Fast detection of heterogeneous parallel steganography for streaming voice,
H. Wang, Z. Yang, Y . Huet al., “Fast detection of heterogeneous parallel steganography for streaming voice,” inProceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security. ACM, 2021, pp. 137–142
work page 2021
-
[4]
Detection of heterogeneous parallel steganography for low bit-rate voip speech streams,
Y . Hu, Y . Huang, Z. Yanget al., “Detection of heterogeneous parallel steganography for low bit-rate voip speech streams,”Neurocomputing, vol. 419, pp. 70–79, 2021
work page 2021
-
[5]
M. Wei, S. Li, P. Liuet al., “Frame-level steganalysis of qim steganog- raphy in compressed speech based on multi-dimensional perspective of codeword correlations,”Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 8421–8431, 2023
work page 2023
-
[6]
Detection of qim-based steganography in voip streams: A mobilevit-inspired model,
C. Zhang and S. Jiang, “Detection of qim-based steganography in voip streams: A mobilevit-inspired model,”IEEE Signal Processing Letters, vol. 31, pp. 1735–1739, 2024
work page 2024
-
[7]
Daef-vs: An efficient universal voip steganalysis framework based on domain-aware knowledge,
Z. Fang, P. Zhou, Z. Yang, Z. Zhou, and L. Zhou, “Daef-vs: An efficient universal voip steganalysis framework based on domain-aware knowledge,” inICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5
work page 2025
-
[8]
Efficient streaming voice steganalysis in challenging detection scenarios,
P. Zhou, Z. Fang, Z. Yang, Z. Zhou, and L. Zhou, “Efficient streaming voice steganalysis in challenging detection scenarios,”IEEE Transac- tions on Information Forensics and Security, vol. 20, pp. 5966–5977, 2025
work page 2025
-
[9]
An approach to information hiding in low bit-rate speech stream,
B. Xiao, Y . Huang, and S. Tang, “An approach to information hiding in low bit-rate speech stream,” inIEEE GLOBECOM 2008-2008 IEEE global telecommunications conference. IEEE, 2008, pp. 1–5
work page 2008
-
[12]
Sharpness-aware minimization for efficiently improving generalization,
P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, “Sharpness-aware minimization for efficiently improving generalization,” inInternational Conference on Learning Representations (ICLR), 2021
work page 2021
-
[13]
Efficient sharpness-aware minimization for improved training of neural networks,
J. Du, H. Yan, J. Feng, J. T. Zhou, L. Zhen, R. S. M. Goh, and V . Tan, “Efficient sharpness-aware minimization for improved training of neural networks,” inInternational Conference on Learning Representations, 2022. [Online]. Available: https: //openreview.net/forum?id=n0OeTdNRG0Q
work page 2022
-
[14]
Sharpness-aware gradient matching for domain generalization,
P. Wang, Z. Zhang, Z. Lei, and L. Zhang, “Sharpness-aware gradient matching for domain generalization,” inCVPR, 2023
work page 2023
-
[15]
Friendly-sam: Leveraging stochastic gradient noise for improved sharpness-aware minimization,
T. Zhang, H. Li, Z. Wanget al., “Friendly-sam: Leveraging stochastic gradient noise for improved sharpness-aware minimization,”Advances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[16]
Escaping saddle points for effective generalization on class-imbalanced data,
H. Rangwani, S. K. Aithal, M. Mishra, and V . B. R, “Escaping saddle points for effective generalization on class-imbalanced data,” inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 22 791–22 805
work page 2022
-
[17]
Domain-inspired sharpness-aware minimization under domain shifts,
R. Zhang, Z. Fan, J. Yao, Y . Zhang, and Y . Wang, “Domain-inspired sharpness-aware minimization under domain shifts,”CoRR, vol. abs/2405.18861, 2024. [Online]. Available: https://doi.org/10.48550/ arXiv.2405.18861
-
[18]
Dgsam: Do- main generalization via individual sharpness-aware minimization,
Y . Song, Y . Hwang, J. Lee, H. Lee, and D.-Y . Lim, “Dgsam: Do- main generalization via individual sharpness-aware minimization,”arXiv preprint arXiv:2503.23430, 2025
-
[19]
T. Cedric, R. Adi, and I. Mcloughlin, “Data concealment in audio using a nonlinear frequency distribution of prbs coded data and frequency- domain lsb insertion,” in2000 TENCON Proceedings. Intelligent Sys- tems and Technologies for the New Millennium (Cat. No.00CH37119), vol. 1, 2000, pp. 275–278 vol.1
work page 2000
-
[20]
Increasing the capacity of lsb-based audio steganography,
N. Cvejic and T. Seppanen, “Increasing the capacity of lsb-based audio steganography,” in2002 IEEE Workshop on Multimedia Signal Processing., 2002, pp. 336–338
work page 2002
-
[21]
Hiding secret information using lsb based audio steganography,
A. Binny and M. Koilakuntla, “Hiding secret information using lsb based audio steganography,” in2014 International Conference on Soft Computing and Machine Intelligence. IEEE, 2014, pp. 56–59
work page 2014
-
[22]
Security enhancement of lsb-based audio steganography method,
J. Jezdimirovi ´c, N. Pekez, and J. Kova ˇcevi´c, “Security enhancement of lsb-based audio steganography method,” in2023 Zooming Innovation in Consumer Technologies Conference (ZINC). IEEE, 2023, pp. 77–82
work page 2023
-
[23]
B. Chen and G. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,”IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001
work page 2001
-
[24]
An approach to information hiding in low bit-rate speech stream,
B. Xiao, Y . Huang, and S. Tang, “An approach to information hiding in low bit-rate speech stream,” inIEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference, 2008, pp. 1–5
work page 2008
-
[25]
Lpc parameters substitution for speech information hiding,
Z. jun WU, W. GAO, and W. Y ANG, “Lpc parameters substitution for speech information hiding,”The Journal of China Universities of Posts and Telecommunications, vol. 16, no. 6, pp. 103–112, 2009. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S1005888508602952
work page 2009
-
[26]
Improving security of quantization-index- modulation steganography in low bit-rate speech streams,
H. Tian, J. Liu, and S. Li, “Improving security of quantization-index- modulation steganography in low bit-rate speech streams,”Multimedia Syst., vol. 20, no. 2, p. 143–154, Mar. 2014. [Online]. Available: https://doi.org/10.1007/s00530-013-0302-8
-
[27]
Spm: estimating payload locations of qim-based steganography in low-bit-rate compressed speeches,
C. Zhang, S. Jiang, and Z. Chen, “Spm: estimating payload locations of qim-based steganography in low-bit-rate compressed speeches,” Multimedia Tools and Applications, vol. 83, no. 37, pp. 85 227–85 252,
-
[28]
Available: https://doi.org/10.1007/s11042-024-19501-4
[Online]. Available: https://doi.org/10.1007/s11042-024-19501-4
-
[29]
Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec,
A. Nishimura, “Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec,” in2009 fifth international conference on intelligent information hiding and multimedia signal processing. IEEE, 2009, pp. 483–486
work page 2009
-
[30]
Steganography integration into a low-bit rate speech codec,
Y . Huang, C. Liu, S. Tang, and S. Bai, “Steganography integration into a low-bit rate speech codec,”IEEE transactions on information forensics and security, vol. 7, no. 6, pp. 1865–1875, 2012
work page 2012
-
[31]
Pitch-based steganography for speex voice codec,
A. Janicki, “Pitch-based steganography for speex voice codec,”Security and communication networks, vol. 9, no. 15, pp. 2923–2933, 2016
work page 2016
-
[32]
A method of speech information hiding in inactive frame based on pitch modulation,
Z. Wu, C. Zhang, and J. Guo, “A method of speech information hiding in inactive frame based on pitch modulation,”International Journal of Information and Computer Security, vol. 22, no. 1, pp. 1–27, 2023
work page 2023
-
[33]
Ahcm: Adaptive huffman code mapping for audio steganography based on psychoacoustic model,
X. Yi, K. Yang, X. Zhao, Y . Wang, and H. Yu, “Ahcm: Adaptive huffman code mapping for audio steganography based on psychoacoustic model,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 8, pp. 2217–2231, 2019
work page 2019
-
[34]
Mdi2: A high-dimensional feature for voip steganalysis,
M. Qiao, W. Zhang, T. Zhanget al., “Mdi2: A high-dimensional feature for voip steganalysis,”IEEE Transactions on Information Forensics and Security, vol. 14, no. 10, pp. 2589–2603, 2019
work page 2019
-
[35]
Co-occurrence matrix based steganal- ysis for low-bit-rate voip streams,
Y . Liu, Y . Huang, and X. Zhang, “Co-occurrence matrix based steganal- ysis for low-bit-rate voip streams,” inProceedings of the ACM Workshop on Information Hiding and Multimedia Security, 2020, pp. 45–56
work page 2020
-
[36]
On large-batch training for deep learning: Generalization gap and sharp minima,
N. S. Keskar, D. Mudigere, J. Nocedalet al., “On large-batch training for deep learning: Generalization gap and sharp minima,”International Conference on Learning Representations (ICLR), 2017
work page 2017
-
[37]
A discriminative feature learning approach for deep face recognition,
Y . Wen, K. Zhang, Z. Li, and Y . Qiao, “A discriminative feature learning approach for deep face recognition,” inEuropean conference on computer vision. Springer, 2016, pp. 499–515
work page 2016
-
[38]
Unsupervised learning of visual features by contrasting cluster assign- ments,
M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, “Unsupervised learning of visual features by contrasting cluster assign- ments,”Advances in neural information processing systems, vol. 33, pp. 9912–9924, 2020
work page 2020
-
[39]
A theory of learning from different domains,
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,”Machine Learning, vol. 79, no. 1, pp. 151–175, 2010
work page 2010
-
[40]
Visualizing the loss landscape of neural nets,
H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, “Visualizing the loss landscape of neural nets,” inAdvances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31. Curran Associates, Inc., 2018. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/20...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.