H.265/HEVC Video Steganalysis Based on CU Block Structure Gradients and IPM Mapping
Pith reviewed 2026-05-16 02:18 UTC · model grok-4.3
The pith
The GradIPMFormer network detects CU block structure steganography in H.265/HEVC videos by using gradient maps of coding unit changes combined with intra prediction mode mappings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing a gradient map that explicitly describes changes in CU block structure and combining it with a block-level mapping of intra prediction modes (IPM), the GradIPMFormer network jointly models the structural perturbations introduced by CU block structure steganography. The paper claims this enables detection that outperforms prior methods across different quantization parameters and resolutions.
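As a rough illustration of what such a gradient map might look like, the sketch below takes a grid of CU quadtree depths (one cell per 8x8 luma area) and marks every cell whose right or lower neighbour has a different depth. The paper's exact definition is not given in the text reviewed here, so the function name `cu_gradient_map`, the depth-grid input, and the absolute-difference rule are all assumptions made for this example.

```python
from typing import List

Grid = List[List[int]]


def cu_gradient_map(depth: Grid) -> Grid:
    """Hypothetical gradient map: per-cell sum of absolute depth
    differences to the right and lower neighbours (0 at the borders)."""
    rows, cols = len(depth), len(depth[0])
    grad = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = abs(depth[r][c + 1] - depth[r][c]) if c + 1 < cols else 0
            down = abs(depth[r + 1][c] - depth[r][c]) if r + 1 < rows else 0
            grad[r][c] = right + down
    return grad


# Toy 32x32 region, one cell per 8x8 area, assuming a 64x64 CTU:
# depth 2 = 16x16 CU, depth 3 = 8x8 CU. These grids are made up.
cover = [[2, 2, 2, 2] for _ in range(4)]  # four 16x16 CUs
stego = [                                 # top-left CU split into 8x8 CUs
    [3, 3, 2, 2],
    [3, 3, 2, 2],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]

cover_grad = cu_gradient_map(cover)
stego_grad = cu_gradient_map(stego)
for row in stego_grad:
    print(row)
```

In this toy case the unsplit region yields an all-zero map, while splitting one 16x16 CU into 8x8 CUs produces nonzero entries along the new boundary. Whether real embedding changes stand out against rate-distortion-driven partitioning in the same way is exactly what the load-bearing premise assumes.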
What carries the argument
GradIPMFormer, an integrated architecture that combines convolutional local embedding with Transformer-based token modeling to capture local CU boundary perturbations and long-range cross-CU structural dependencies.
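The two-stage idea (local embedding first, token mixing second) can be sketched without any deep learning library. The example below is not the GradIPMFormer architecture: the 3x3 mean filter is a fixed-weight stand-in for a learned convolution, the attention uses identity query/key/value projections, and the names `local_embed`, `to_tokens`, and `self_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def local_embed(grid: Matrix) -> Matrix:
    """3x3 mean filter with edge clipping (stand-in for a learned conv)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


def to_tokens(grid: Matrix, patch: int) -> Matrix:
    """Flatten non-overlapping patch x patch tiles into token vectors."""
    rows, cols = len(grid), len(grid[0])
    tokens = []
    for r0 in range(0, rows - patch + 1, patch):
        for c0 in range(0, cols - patch + 1, patch):
            tokens.append([grid[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product attention, identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out


# Made-up 4x4 gradient map used only to exercise the pipeline.
grad = [[0, 1, 0, 0], [1, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
tokens = to_tokens(local_embed(grad), patch=2)
mixed = self_attention(tokens)
print(len(tokens), "tokens of dimension", len(tokens[0]))
```

Each output token is a weighted average of all input tokens, which is how a perturbation in one corner of the frame can influence the representation of a distant block. A trained model would learn the filter weights and projections rather than fixing them as done here.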
Load-bearing premise
The gradient map of CU block structure changes combined with IPM mapping will reliably capture structural perturbations from steganography without being masked by normal compression artifacts.
What would settle it
A set of videos on which the detector cannot separate CU-based stego videos from the same videos re-compressed without embedding, that is, detection accuracy stays at chance level. Such a result would show that the gradients fail to isolate the embedding changes from ordinary re-encoding changes.
Original abstract
Existing H.265/HEVC video steganalysis research mainly focuses on detecting steganography based on motion vectors, intra prediction modes, and transform coefficients. However, there is currently no effective steganalysis method capable of detecting steganography based on Coding Unit (CU) block structure. To address this issue, we propose, for the first time, an H.265/HEVC video steganalysis algorithm based on CU block structure gradients and intra prediction mode mapping. The proposed method first constructs a new gradient map to explicitly describe changes in CU block structure, and combines it with a block-level mapping representation of IPM. It can jointly model the structural perturbations introduced by steganography based on CU block structure. Then, we design a novel steganalysis network called GradIPMFormer, whose core innovation is an integrated architecture that combines convolutional local embedding with Transformer-based token modeling to jointly capture local CU boundary perturbations and long-range cross-CU structural dependencies, thereby effectively enhancing the capability to perceive CU block structure embedding. Experimental results show that under different quantization parameters and resolution settings, the proposed method consistently achieves superior detection performance across multiple steganography methods based on CU block structure. This study provides a new CU block structure steganalysis paradigm for H.265/HEVC and has significant research value for covert communication security detection.
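A block-level IPM mapping of the kind the abstract mentions could be built along the following lines. This is a minimal sketch under stated assumptions: the `PredictionUnit` record, the 4x4 cell resolution, and the `-1` fill value for uncovered cells are illustrative choices, not details taken from the paper. The mode numbering (0 = planar, 1 = DC, 2 to 34 = angular) follows the HEVC standard.

```python
from typing import List, NamedTuple


class PredictionUnit(NamedTuple):
    """Hypothetical record for one square intra-coded PU."""
    x: int      # left edge in luma samples
    y: int      # top edge in luma samples
    size: int   # side length in luma samples
    mode: int   # HEVC intra mode: 0 = planar, 1 = DC, 2..34 = angular


CELL = 4  # grid resolution in luma samples (smallest HEVC intra PU is 4x4)


def ipm_block_map(pus: List[PredictionUnit], width: int, height: int,
                  fill: int = -1) -> List[List[int]]:
    """Paint each PU's intra mode over the grid cells it covers."""
    cols, rows = width // CELL, height // CELL
    grid = [[fill] * cols for _ in range(rows)]
    for pu in pus:
        if not 0 <= pu.mode <= 34:
            raise ValueError(f"invalid HEVC intra mode: {pu.mode}")
        for r in range(pu.y // CELL, (pu.y + pu.size) // CELL):
            for c in range(pu.x // CELL, (pu.x + pu.size) // CELL):
                grid[r][c] = pu.mode
    return grid


# Made-up 16x16 region covered by four 8x8 PUs.
pus = [
    PredictionUnit(0, 0, 8, 26),  # vertical
    PredictionUnit(8, 0, 8, 10),  # horizontal
    PredictionUnit(0, 8, 8, 1),   # DC
    PredictionUnit(8, 8, 8, 0),   # planar
]
ipm_map = ipm_block_map(pus, width=16, height=16)
for row in ipm_map:
    print(row)
```

Because the resulting grid is spatially aligned with the frame, it can be stacked with a CU structure map as a second input channel, which is one plausible way to model the two signals jointly.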
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the first steganalysis method targeting CU block structure modifications in H.265/HEVC video steganography. It constructs a gradient map to capture changes in CU partitioning, combines this with a block-level IPM mapping, and introduces the GradIPMFormer network that integrates convolutional local feature embedding with Transformer-based token modeling to detect structural perturbations. The central claim is that this approach achieves consistently superior detection performance across multiple CU-based steganography methods under varying quantization parameters and video resolutions.
Significance. If the experimental results hold, the work fills a clear gap in HEVC steganalysis by addressing an embedding domain (CU structure) that prior methods based on motion vectors, IPMs, or coefficients have not targeted. The explicit gradient-map construction and hybrid CNN-Transformer architecture represent a new paradigm that could extend to other block-structure embeddings in modern codecs, strengthening detection capabilities for covert communication.
major comments (3)
- [Section 3.1] Section 3.1 (gradient map construction): the claim that the map 'explicitly describes changes in CU block structure' is load-bearing for the central claim, yet no analysis or statistical test is provided to show that embedding-induced deltas exceed or are separable from natural QP- and content-dependent partitioning variations arising from rate-distortion optimization.
- [Section 4] Section 4 (GradIPMFormer architecture): the integrated CNN-Transformer design is presented as essential for capturing both local boundary perturbations and long-range cross-CU dependencies, but the manuscript contains no ablation studies removing the Transformer component or the IPM mapping to quantify their individual contributions to the reported accuracy gains.
- [Experimental results] Experimental results section: the headline claim of 'consistently superior detection performance' across QPs and resolutions rests on unshown quantitative evidence; the text provides neither dataset sizes, number of videos, accuracy/F1 values with error bars, nor direct comparisons against adapted baselines, making it impossible to verify that gains reflect genuine steganalysis capability rather than dataset-specific correlations.
minor comments (2)
- [Abstract] The abstract states superior performance without any numerical metrics or dataset descriptors; adding at least the peak detection accuracy and a brief dataset summary would improve readability.
- [Section 3.1] Notation for the gradient map (e.g., how block-size differences are quantized into the map) is introduced without an accompanying equation or pseudocode, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable suggestions. Below, we provide point-by-point responses to the major comments and indicate how we will revise the manuscript to address them.
- Referee: [Section 3.1] Section 3.1 (gradient map construction): the claim that the map 'explicitly describes changes in CU block structure' is load-bearing for the central claim, yet no analysis or statistical test is provided to show that embedding-induced deltas exceed or are separable from natural QP- and content-dependent partitioning variations arising from rate-distortion optimization.
  Authors: We concur that demonstrating the gradient map's effectiveness in isolating steganographic changes from natural variations is essential. Accordingly, we will augment Section 3.1 with statistical analyses, such as distribution comparisons and separability measures (e.g., using t-tests or mutual information), across different QP values and video contents to validate that embedding-induced deltas are distinguishable.
  revision: yes
- Referee: [Section 4] Section 4 (GradIPMFormer architecture): the integrated CNN-Transformer design is presented as essential for capturing both local boundary perturbations and long-range cross-CU dependencies, but the manuscript contains no ablation studies removing the Transformer component or the IPM mapping to quantify their individual contributions to the reported accuracy gains.
  Authors: We agree that ablation studies are necessary to substantiate the design choices. In the revised manuscript, we will include ablation experiments that isolate the contributions of the Transformer component and the IPM mapping by reporting performance metrics for ablated versions of the GradIPMFormer network.
  revision: yes
- Referee: [Experimental results] Experimental results section: the headline claim of 'consistently superior detection performance' across QPs and resolutions rests on unshown quantitative evidence; the text provides neither dataset sizes, number of videos, accuracy/F1 values with error bars, nor direct comparisons against adapted baselines, making it impossible to verify that gains reflect genuine steganalysis capability rather than dataset-specific correlations.
  Authors: We will revise the experimental results section to explicitly detail the dataset composition, including the number of videos and frames used, present accuracy and F1 scores with error bars from repeated experiments, and include comprehensive comparisons with adapted baseline methods to allow verification of the performance gains.
  revision: yes
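The separability test and the error-bar reporting promised in these responses could be prototyped with the standard library alone. The sketch below computes Welch's t statistic (with Welch-Satterthwaite degrees of freedom) between two samples and a mean with sample standard deviation over repeated runs. All numbers are arbitrary placeholders, not results from the paper, and the choice of a per-video "gradient energy" feature is an assumption made for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mean_and_std(runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated experiments."""
    return statistics.mean(runs), statistics.stdev(runs)


# Placeholder per-video gradient-energy values (NOT measured data).
cover_energy = [4.0, 5.0, 6.0, 5.0]
stego_energy = [8.0, 9.0, 10.0, 9.0]
t_stat, dof = welch_t(stego_energy, cover_energy)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")

# Placeholder scores from three hypothetical repeated runs (NOT results).
placeholder_runs = [0.5, 0.6, 0.7]
avg, spread = mean_and_std(placeholder_runs)
print(f"score = {avg:.2f} +/- {spread:.2f}")
```

Turning the t statistic into a p-value requires the Student t distribution, which the standard library does not provide; a statistics package would be needed for that step. A t-test also only compares means, so it complements rather than replaces the distribution comparisons the authors mention.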
Circularity Check
No significant circularity; new feature construction and network are independent of inputs
The paper presents an original construction: a gradient map explicitly describing CU block structure changes, combined with block-level IPM mapping, fed into a custom GradIPMFormer that integrates CNN local embedding with Transformer token modeling. No equations, derivations, or predictions reduce by construction to fitted parameters or prior self-citations. The central claim rests on the proposed features and architecture capturing CU perturbations, with experimental results offered as validation rather than tautological outputs. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' prior work appear in the provided text. This qualifies as a self-contained new paradigm without the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Steganography based on CU block structure introduces detectable perturbations in block gradients and intra prediction modes that can be jointly modeled.
invented entities (1)
- GradIPMFormer network: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "GradIPMFormer... convolutional local embedding with Transformer-based token modeling"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.