
arxiv: 2602.11547 · v3 · submitted 2026-02-12 · 📡 eess.IV · cs.MM

H.265/HEVC Video Steganalysis Based on CU Block Structure Gradients and IPM Mapping

Pith reviewed 2026-05-16 02:18 UTC · model grok-4.3

classification 📡 eess.IV cs.MM
keywords video steganalysis · H.265/HEVC · CU block structure · intra prediction mode · gradient map · Transformer network · steganography detection · GradIPMFormer

The pith

The GradIPMFormer network detects CU block structure steganography in H.265/HEVC videos by using gradient maps of coding unit changes combined with intra prediction mode mappings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first steganalysis method for H.265/HEVC videos targeting steganography that alters Coding Unit block structures. It builds a gradient map to highlight these structural changes and pairs it with a block-level mapping of intra prediction modes to jointly model the embedding effects. These inputs feed a hybrid network that applies convolutional layers for local boundary features and Transformer modeling for longer-range dependencies across units. Experiments show higher detection rates than prior approaches under varying quantization parameters, resolutions, and multiple embedding techniques. This addresses a gap in detecting hidden data that exploits the flexible block partitioning used in modern video compression.

Core claim

By constructing a gradient map that explicitly describes changes in CU block structure and combining it with block level mapping of intra prediction modes, the GradIPMFormer network jointly models the structural perturbations introduced by CU block structure steganography, enabling effective detection that outperforms prior methods across different quantization parameters and resolutions.
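
The two inputs named in the core claim can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the block list format, the helper names `rasterize` and `gradient_map`, and the gradient definition (sum of absolute forward differences of a per-pixel block-size map) are hypothetical and are not taken from the paper, whose exact construction is not reproduced on this page.

```python
# Hypothetical sketch: the block format and the gradient definition are
# assumptions, not the paper's exact construction.

def rasterize(blocks, height, width, key):
    """Paint each block's attribute `key` over the pixels the block covers."""
    grid = [[0] * width for _ in range(height)]
    for b in blocks:
        for y in range(b["y"], b["y"] + b["size"]):
            for x in range(b["x"], b["x"] + b["size"]):
                grid[y][x] = b[key]
    return grid

def gradient_map(cu_map):
    """|horizontal forward difference| + |vertical forward difference|."""
    h, w = len(cu_map), len(cu_map[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(cu_map[y][x + 1] - cu_map[y][x]) if x + 1 < w else 0
            dy = abs(cu_map[y + 1][x] - cu_map[y][x]) if y + 1 < h else 0
            grad[y][x] = dx + dy
    return grad

# A 16x16 region: the top-left 8x8 area is split into four 4x4 blocks,
# the other three quadrants are single 8x8 CUs. IPM indices are made up.
blocks = [
    {"x": 0, "y": 0, "size": 4, "ipm": 10},
    {"x": 4, "y": 0, "size": 4, "ipm": 26},
    {"x": 0, "y": 4, "size": 4, "ipm": 1},
    {"x": 4, "y": 4, "size": 4, "ipm": 0},
    {"x": 8, "y": 0, "size": 8, "ipm": 26},
    {"x": 0, "y": 8, "size": 8, "ipm": 10},
    {"x": 8, "y": 8, "size": 8, "ipm": 0},
]

cu_map = rasterize(blocks, 16, 16, "size")   # per-pixel block size
ipm_map = rasterize(blocks, 16, 16, "ipm")   # per-pixel intra prediction mode
grad = gradient_map(cu_map)                  # nonzero where block size changes
features = [grad, ipm_map]                   # two-channel joint input
```

Under this particular definition the gradient is nonzero only where the block size changes, so a boundary between two equally sized blocks is invisible to the gradient channel and is represented only through the IPM channel. Whether the paper's map behaves the same way cannot be determined from this page.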

What carries the argument

GradIPMFormer, an integrated architecture that combines convolutional local embedding with Transformer-based token modeling to capture local CU boundary perturbations and long-range cross-CU structural dependencies.
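
As a rough illustration of the "local embedding, then token modeling" pattern described above, the following toy forward pass applies a fixed 3 × 3 convolution, cuts the result into patch tokens, and mixes the tokens with single-head scaled dot-product self-attention. It is a sketch under strong assumptions: the kernel is hand-picked rather than learned, the query/key/value projections are the identity, and the function names are hypothetical. It is not the GradIPMFormer architecture.

```python
import math

def conv3x3(img, kernel):
    """3x3 cross-correlation with zero padding (output has the input size)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc
    return out

def to_tokens(feat, patch):
    """Flatten each non-overlapping patch x patch window into one token.
    Assumes height and width are multiples of `patch`."""
    h, w = len(feat), len(feat[0])
    return [
        [feat[y][x] for y in range(py, py + patch) for x in range(px, px + patch)]
        for py in range(0, h, patch)
        for px in range(0, w, patch)
    ]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)                       # subtract max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [wt / total for wt in weights]
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) for i in range(d)])
    return out

# Toy 8x8 input: left half 0, right half 1 (a vertical boundary at column 4).
img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
edge_kernel = [[0.0, 0.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # horizontal difference
local = conv3x3(img, edge_kernel)     # responds only along the boundary column
tokens = to_tokens(local, patch=4)    # 4 tokens of 16 values each
mixed = self_attention(tokens)        # every token now sees every other token
```

In this toy case the two patches that contain the boundary attend more strongly to each other than to the empty patches, which is the kind of cross-block dependency the Transformer stage is meant to pick up.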

Load-bearing premise

The gradient map of CU block structure changes combined with IPM mapping will reliably capture structural perturbations from steganography without being masked by normal compression artifacts.

What would settle it

A set of videos on which detection accuracy for CU-based steganography stays at chance level, that is, the detector cannot tell embedded videos from the same videos re-compressed without embedding. Such a result would show that the gradient map fails to isolate embedding changes from ordinary re-compression effects.

Figures

Figures reproduced from arXiv: 2602.11547 by Fei Peng, Haiyang Xia, Wenbin Huang, Xiang Zhang, Zhangjie Fu, Ziwen He.

Figure 1: Comparison of CDFs for Block-Structure Mapping and Block

Figure 2: Differences in IPM Co-occurrence Matrix Distributions Before and

Figure 4: Intra Prediction Modes in H.265/HEVC
    Adjoining paper text: and the SATD cost is computed by applying a Hadamard transform $H(\cdot)$ to $E_m$: $\mathrm{SATD}(m) = \sum_{i,j} \lvert H(E_m)_{i,j} \rvert$ (6). Modes are ranked by $\mathrm{SATD}(m)$, and only a small set of top-ranked modes is kept as the candidate set $M_{\mathrm{cand}}$. In the second stage, full rate-distortion optimization (RDO) is performed only on $m \in M_{\mathrm{cand}}$. Let $R_m$ denote the reconstructed block after prediction, transfor…

Figure 5: Transformer Encoder Block
    Adjoining paper text: Formally, let the CU partition configuration be $C$, and let the RD-optimal IPM under $C$ be $m^*(C) = \arg\min_{m \in M_{\mathrm{cand}}(C)} J(m \mid C)$ (10), where $M_{\mathrm{cand}}(C)$ denotes the SATD-filtered candidate set under partition $C$, and $J(\cdot \mid C)$ is the corresponding RDO cost. When embedding changes the CU partition from $C_0$ to $C_1$, the optimal mode may drift from $m^*(C_0)$ to $m^*(C_1)$ due to the altered SATD ranki…

Figure 6: The Proposed Steganalysis Framework
    Adjoining paper text:
    TABLE I: Comparison of model complexity and inference efficiency (↓: lower is better).

    Method          Param (M)↓   GFLOPs↓   Batch (ms)↓   Frame (ms)↓
    Proposed        0.56         13.41     5.63          0.23
    CENet [7]       0.21         47.09     19.67         0.82
    PUNet [3]       0.35         34.05     9.22          0.38
    NRNet [9]       0.09         21.36     7.98          0.33
    ZhangNet [16]   0.06         9.07      2.43          0.10

    Based on these considerations, Transformer is introduced in our framework to strengthen global context …

Figure 7: Example of generating CUmap
    Adjoining paper text: that although the minimum CU size in the standard is 8 × 8, recursive partition at the PU level may further generate 4 × 4 leaf blocks. In this work, such 4 × 4 blocks are also treated as structural units for fine-grained representation. For spatial coordinates satisfying $0 \le i < H$ and $0 \le j < W$, the CU block-structure map of $cu_k$ is defined as $\mathrm{cumap}^{cu_k}_{i,j}$ = { 4…

Figure 9: The overall framework of GradIPMFormer
    Adjoining paper text: 1) Customized Feature Extraction Module: Considering that CU structural perturbations exhibit strong locality in the spatial domain, GradIPMFormer first employs a lightweight 2D convolutional embedding module, the customized feature extraction module, to perform local feature embedding in the joint feature tensor $F$. This module consists of two 3 × 3 convolutional laye…

Figure 10: Detection accuracies (PACC ↑) of Steganalysis Algorithms Across Four Different Networks and our proposed against Tar1–4 given video samples coded. A, B, C, D, E, and F indicate (480p, 26), (480p, 32), (480p, 38), (1080p, 26), (1080p, 32), and (1080p, 38), respectively.
    Adjoining paper text: TABLE V: Detection accuracies (PACC ↑) comparison of different network against Tar1-4 given video samples compressed with different QP and e…

Figure 11: Detection accuracies (PACC ↑) Under the Cover-Source Mismatch (CSM) Setting.
    Adjoining paper text: input (Non-onehot) and converting it into a one-hot encoding (Onehot). To control variables, we conduct evaluations under a fixed setting of 1080P, QP=32, and payload=0.3, and report detection accuracies for four CU block-structure steganography algorithms (Tar1–Tar4), as shown in Table VI. Overall, one-hot encoding yields consis…
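
The two-stage mode decision quoted in the text beside Figures 4 and 5 (SATD ranking into a candidate set, then an arg-min over rate-distortion cost) can be sketched as follows. This is a hedged illustration, not encoder code: it assumes $E_m$ is the prediction residual of a 4 × 4 block, uses an unnormalized 4 × 4 Hadamard matrix, and takes the RD cost $J$ as a caller-supplied function. The names `satd`, `choose_mode`, the mode labels, and all numbers are hypothetical.

```python
# Unnormalized 4x4 Hadamard matrix (symmetric, so its transpose equals itself).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def satd(orig, pred):
    """SATD(m) = sum_{i,j} |H(E_m)_{i,j}|, with E_m assumed to be orig - pred."""
    resid = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    transformed = matmul(matmul(H4, resid), H4)
    return sum(abs(v) for row in transformed for v in row)

def choose_mode(orig, predictions, rd_cost, keep=2):
    """Stage 1: keep the `keep` modes with the lowest SATD.
    Stage 2: return the arg-min of rd_cost over that candidate set."""
    ranked = sorted(predictions, key=lambda m: satd(orig, predictions[m]))
    candidates = ranked[:keep]
    return min(candidates, key=rd_cost), candidates

# Made-up 4x4 block and made-up predictions for three modes.
orig = [[4] * 4 for _ in range(4)]
predictions = {
    "dc": [[4] * 4 for _ in range(4)],          # SATD 0
    "planar": [[3] * 4 for _ in range(4)],      # SATD 16
    "angular_26": [[0] * 4 for _ in range(4)],  # SATD 64
}
costs = {"dc": 12.0, "planar": 9.0, "angular_26": 5.0}  # hypothetical J(m | C)

best, candidates = choose_mode(orig, predictions, costs.get, keep=2)
```

In this example `angular_26` has the lowest RD cost but never reaches stage 2, because stage 1 filters it out on SATD. The same mechanism explains the drift described in the Figure 5 text: if an embedding changes the partition, the residuals and therefore the SATD ranking change, so a different candidate set and a different $m^*$ can result.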
Original abstract

Existing H.265/HEVC video steganalysis research mainly focuses on detecting steganography based on motion vectors, intra prediction modes, and transform coefficients. However, there is currently no effective steganalysis method capable of detecting steganography based on Coding Unit (CU) block structure. To address this issue, we propose, for the first time, an H.265/HEVC video steganalysis algorithm based on CU block structure gradients and intra prediction mode (IPM) mapping. The proposed method first constructs a new gradient map to explicitly describe changes in CU block structure and combines it with a block-level mapping representation of the IPM, so that it can jointly model the structural perturbations introduced by steganography based on CU block structure. Then, we design a novel steganalysis network called GradIPMFormer, whose core innovation is an integrated architecture that combines convolutional local embedding with Transformer-based token modeling to jointly capture local CU boundary perturbations and long-range cross-CU structural dependencies, thereby effectively enhancing the capability to perceive CU block structure embedding. Experimental results show that under different quantization parameters and resolution settings, the proposed method consistently achieves superior detection performance across multiple steganography methods based on CU block structure. This study provides a new CU block structure steganalysis paradigm for H.265/HEVC and has significant research value for covert communication security detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the first steganalysis method targeting CU block structure modifications in H.265/HEVC video steganography. It constructs a gradient map to capture changes in CU partitioning, combines this with a block-level IPM mapping, and introduces the GradIPMFormer network that integrates convolutional local feature embedding with Transformer-based token modeling to detect structural perturbations. The central claim is that this approach achieves consistently superior detection performance across multiple CU-based steganography methods under varying quantization parameters and video resolutions.

Significance. If the experimental results hold, the work fills a clear gap in HEVC steganalysis by addressing an embedding domain (CU structure) that prior methods based on motion vectors, IPMs, or coefficients have not targeted. The explicit gradient-map construction and hybrid CNN-Transformer architecture represent a new paradigm that could extend to other block-structure embeddings in modern codecs, strengthening detection capabilities for covert communication.

major comments (3)
  1. [Section 3.1] Section 3.1 (gradient map construction): the claim that the map 'explicitly describes changes in CU block structure' is load-bearing for the central claim, yet no analysis or statistical test is provided to show that embedding-induced deltas exceed or are separable from natural QP- and content-dependent partitioning variations arising from rate-distortion optimization.
  2. [Section 4] Section 4 (GradIPMFormer architecture): the integrated CNN-Transformer design is presented as essential for capturing both local boundary perturbations and long-range cross-CU dependencies, but the manuscript contains no ablation studies removing the Transformer component or the IPM mapping to quantify their individual contributions to the reported accuracy gains.
  3. [Experimental results] Experimental results section: the headline claim of 'consistently superior detection performance' across QPs and resolutions rests on unshown quantitative evidence; the text provides neither dataset sizes, number of videos, accuracy/F1 values with error bars, nor direct comparisons against adapted baselines, making it impossible to verify that gains reflect genuine steganalysis capability rather than dataset-specific correlations.
minor comments (2)
  1. [Abstract] The abstract states superior performance without any numerical metrics or dataset descriptors; adding at least the peak detection accuracy and a brief dataset summary would improve readability.
  2. [Section 3.1] Notation for the gradient map (e.g., how block-size differences are quantized into the map) is introduced without an accompanying equation or pseudocode, which would aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. Below, we provide point-by-point responses to the major comments and indicate how we will revise the manuscript to address them.

Point-by-point responses
  1. Referee: [Section 3.1] Section 3.1 (gradient map construction): the claim that the map 'explicitly describes changes in CU block structure' is load-bearing for the central claim, yet no analysis or statistical test is provided to show that embedding-induced deltas exceed or are separable from natural QP- and content-dependent partitioning variations arising from rate-distortion optimization.

    Authors: We concur that demonstrating the gradient map's effectiveness in isolating steganographic changes from natural variations is essential. Accordingly, we will augment Section 3.1 with statistical analyses, such as distribution comparisons and separability measures (e.g., using t-tests or mutual information), across different QP values and video contents to validate that embedding-induced deltas are distinguishable. revision: yes

  2. Referee: [Section 4] Section 4 (GradIPMFormer architecture): the integrated CNN-Transformer design is presented as essential for capturing both local boundary perturbations and long-range cross-CU dependencies, but the manuscript contains no ablation studies removing the Transformer component or the IPM mapping to quantify their individual contributions to the reported accuracy gains.

    Authors: We agree that ablation studies are necessary to substantiate the design choices. In the revised manuscript, we will include ablation experiments that isolate the contributions of the Transformer component and the IPM mapping by reporting performance metrics for ablated versions of the GradIPMFormer network. revision: yes

  3. Referee: [Experimental results] Experimental results section: the headline claim of 'consistently superior detection performance' across QPs and resolutions rests on unshown quantitative evidence; the text provides neither dataset sizes, number of videos, accuracy/F1 values with error bars, nor direct comparisons against adapted baselines, making it impossible to verify that gains reflect genuine steganalysis capability rather than dataset-specific correlations.

    Authors: We will revise the experimental results section to explicitly detail the dataset composition, including the number of videos and frames used, present accuracy and F1 scores with error bars from repeated experiments, and include comprehensive comparisons with adapted baseline methods to allow verification of the performance gains. revision: yes
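
The separability analysis promised in response 1 could take many forms. One minimal version, sketched below, is Welch's t-test on a per-video summary of the gradient map, comparing cover videos with stego videos at a fixed QP. The feature (mean gradient magnitude per video) and all sample values are placeholders invented for illustration; they are not results from the paper.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder values only: hypothetical mean gradient magnitude per video.
cover = [0.10, 0.12, 0.11, 0.13, 0.09]
stego = [0.15, 0.17, 0.16, 0.18, 0.14]

t_stat, dof = welch_t(stego, cover)
```

A large |t| at a given QP would only show that the feature means differ at that QP. To address the referee's point, the same comparison would also need to be run between cover videos at neighbouring QPs, so that the embedding effect can be set against natural QP-driven partition variation.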

Circularity Check

0 steps flagged

No significant circularity; new feature construction and network are independent of inputs

full rationale

The paper presents an original construction: a gradient map explicitly describing CU block structure changes, combined with block-level IPM mapping, fed into a custom GradIPMFormer that integrates CNN local embedding with Transformer token modeling. No equations, derivations, or predictions reduce by construction to fitted parameters or prior self-citations. The central claim rests on the proposed features and architecture capturing CU perturbations, with experimental results offered as validation rather than tautological outputs. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' prior work appear in the provided text. This qualifies as a self-contained new paradigm without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that CU-based embedding produces detectable gradient and IPM perturbations that the hybrid network can isolate from compression noise; no explicit free parameters or invented physical entities are described.

axioms (1)
  • [domain assumption] Steganography based on CU block structure introduces detectable perturbations in block gradients and intra prediction modes that can be jointly modeled.
    Invoked in the construction of the gradient map and IPM mapping to enable detection.
invented entities (1)
  • GradIPMFormer network [no independent evidence]
    purpose: Integrated CNN-Transformer architecture to capture local CU boundary perturbations and long-range structural dependencies.
    New postulated architecture introduced to enhance perception of CU embedding; no independent evidence provided.



Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  [1] S. Tan, F. Zheng, L. Liu, J. Han, and L. Shao, “Dense invariant feature-based support vector ranking for cross-camera person reidentification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 2, pp. 356–363, 2016.
  [2] Z. Li, L. Meng, S. Xu, Z. Li, Y. Shi, and Y. Liang, “A HEVC video steganalysis algorithm based on PU partition modes,” Computers, Materials & Continua, vol. 59, no. 2, 2019.
  [3] H. Dai, R. Wang, D. Xu, S. He, and L. Yang, “HEVC video steganalysis based on PU maps and multi-scale convolutional residual network,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2663–2676, 2023.
  [4] S. Liu, Y. Hu, B. Liu, and C.-T. Li, “An HEVC steganalytic approach against motion vector modification using local optimality in candidate list,” Pattern Recognition Letters, vol. 146, pp. 23–30, 2021.
  [5] P. Wang, Y. Cao, X. Zhao, and M. Zhu, “A steganalytic algorithm to detect DCT-based data hiding methods for H.264/AVC videos,” in Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, pp. 123–133, 2017.
  [6] H. Zhang, W. You, and X. Zhao, “A video steganalytic approach against quantized transform coefficient-based H.264 steganography by exploiting in-loop deblocking filtering,” IEEE Access, vol. 8, pp. 186862–186878, 2020.
  [7] H. Dai, D. Xu, L. Yang, and R. Wang, “HEVC video steganalysis based on centralized error and attention mechanism,” IEEE Transactions on Multimedia, 2025.
  [8] Y. Zhao, H. Zhang, Y. Cao, P. Wang, and X. Zhao, “Video steganalysis based on intra prediction mode calibration,” in International Workshop on Digital Watermarking, pp. 119–133, Springer, 2015.
  [9] P. Liu and S. Li, “Steganalysis of intra prediction mode and motion vector-based steganography by noise residual convolutional neural network,” IOP Conference Series: Materials Science and Engineering, vol. 719, no. 1, p. 012068, 2020.
  [10] Y. Tew and K. Wong, “Information hiding in HEVC standard using adaptive coding block size decision,” in 2014 IEEE International Conference on Image Processing (ICIP), pp. 5502–5506, IEEE, 2014.
  [11] Y. Dong, X. Jiang, Z. Li, T. Sun, and P. He, “Adaptive HEVC steganography based on steganographic compression efficiency degradation model,” IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 1, pp. 769–783, 2022.
  [12] L. Yang, D. Xu, J. Qian, and R. Wang, “Quad-tree structure-preserving adaptive steganography for HEVC,” IEEE Transactions on Multimedia, vol. 26, pp. 8625–8638, 2024.
  [13] S. Wang, D. Xu, and S. He, “Adaptive HEVC video steganography based on PU partition modes,” Journal of Visual Communication and Image Representation, vol. 101, p. 104176, 2024.
  [14] Q. Sheng, R. Wang, M. Huang, Q. Li, and D. Xu, “A prediction mode steganalysis detection algorithm for HEVC,” J Optoelectron-laser, vol. 28, no. 4, pp. 433–440, 2017.
  [15] L. Zhai, L. Wang, and Y. Ren, “Universal detection of video steganography in multiple domains based on the consistency of motion vectors,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1762–1777, 2019.
  [16] Z. Zhang, H. Shi, X. Jiang, Z. Li, and J. Liu, “A CNN-based HEVC video steganalysis against DCT/DST-based steganography,” in International Conference on Digital Forensics and Cyber Crime, pp. 265–276, Springer, 2021.
  [17] M. Cao, L. Tian, and C. Li, “A steganalytic approach to detect intra prediction mode modification using difference of partitioning structure for HEVC,” IEEE Transactions on Consumer Electronics, 2025.