FaSST: Fast Sparsifying Secondary Transform
Pith reviewed 2026-05-15 03:14 UTC · model grok-4.3
pith:MLJD76SL Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{MLJD76SL}
Prints a linked pith:MLJD76SL badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
FaSST factorizes data-driven secondary transforms into mode-adaptive Givens rotations to match LFNST performance at far lower complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach approximates data-driven sparse orthonormal transforms (SOTs) by factorizing them into a sequence of Givens rotations. The rotations are efficiently determined using an alternating minimization strategy combined with an approximate Givens factorization procedure. Our method adapts the number of rotations based on the prediction mode, further reducing computational complexity. We design mode-dependent secondary transforms for intra-prediction residuals in AV2 using FaSST.
What carries the argument
Approximate factorization of sparse orthonormal transforms into mode-dependent sequences of Givens rotations via alternating minimization.
If this is right
- Mode-adaptive FaSST matches the RD performance of LFNST while cutting computations by 83.67 percent.
- By skipping fixed-coefficient truncation, FaSST delivers up to 1.80 percent BD-rate savings over LFNST at 66.24 percent lower complexity.
- The transforms are explicitly designed for intra-prediction residuals inside AV2.
- Mode-specific rotation counts let the encoder trade complexity against performance per block.
Where Pith is reading between the lines
- The same factorization method could be applied to inter-prediction residuals or to other stages such as primary transforms.
- Hardware implementations would see lower power draw because each block uses only the rotations required for its mode.
- Extending the design to a larger set of video content beyond the AV2 test set might expose additional gains or edge cases.
- Combining FaSST with existing low-complexity approximations already present in VVC could produce cumulative efficiency improvements.
Load-bearing premise
The alternating-minimization procedure combined with approximate Givens factorization produces transforms whose rate-distortion performance remains close to the original data-driven SOTs across all intra prediction modes and content types.
What would settle it
Direct BD-rate measurements on AV2 test sequences showing FaSST underperforms LFNST by more than 0.5 percent on average, or computation counts that fail to deliver at least 60 percent savings.
read the original abstract
Data-dependent secondary transforms, which aim to decorrelate coefficients of a separable primary transform, can improve residual coding efficiency; however, their deployment is often constrained by computational complexity. Recent video codecs use variants of the low-frequency non-separable transform (LFNST), which discards some high-frequency secondary transform coefficients, limiting achievable coding gains. Moreover, existing data-dependent secondary transforms lack explicit rate-distortion (RD) optimal design criteria. In this work, we propose a framework for designing low-complexity data-dependent secondary transforms, termed Fast Sparsifying Secondary Transforms (FaSSTs). Our approach approximates data-driven sparse orthonormal transforms (SOTs) by factorizing them into a sequence of Givens rotations. The rotations are efficiently determined using an alternating minimization strategy combined with an approximate Givens factorization procedure. Our method adapts the number of rotations based on the prediction mode, further reducing computational complexity. We design mode-dependent secondary transforms for intra-prediction residuals in AV2 using FaSST. Experimental results show that mode-adaptive FaSST matches the RD performance of LFNST while reducing the number of computations by 83.67%. Moreover, by avoiding fixed-coefficient truncation, FaSST achieves up to 1.80% BD-rate savings relative to LFNST while operating at 66.24% lower complexity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FaSST, a framework for designing low-complexity data-dependent secondary transforms by approximating sparse orthonormal transforms (SOTs) through factorization into Givens rotations using alternating minimization. Applied to intra-prediction residuals in AV2, mode-adaptive FaSST is shown to match LFNST rate-distortion performance while reducing computations by 83.67%, and achieve up to 1.80% BD-rate savings at 66.24% lower complexity by avoiding fixed-coefficient truncation.
Significance. If the approximation quality holds across modes and content, FaSST supplies a concrete route to deploy data-driven secondary transforms in video codecs at substantially lower complexity than LFNST while removing the truncation penalty, which could translate into measurable coding gains for AV2 and successor standards.
major comments (3)
- [Experimental Results] Experimental Results section: the approximation residual between the original data-driven SOTs and the FaSST versions (e.g., ||SOT − FaSST||_F or the deviation in the quadratic form governing coefficient variance) is never reported, so the claim that mode-adaptive FaSST preserves RD performance cannot be verified independently of the particular test set.
- [Experimental Results] Experimental Results section: no RD curves or tabulated performance figures are given for the unapproximated SOTs themselves, leaving the 1.80 % BD-rate gain relative to LFNST unanchored and potentially attributable to test-set specifics rather than the proposed construction.
- [Method] Method section: the alternating-minimization procedure and approximate Givens factorization lack convergence criteria, initialization strategy, and any bound on the number of rotations per mode, all of which are load-bearing for reproducing the reported complexity reductions of 83.67 % and 66.24 %.
minor comments (2)
- [Abstract] Abstract: the phrase 'up to 1.80 % BD-rate savings' is stated without indicating the sequences or QP range that attain the maximum, making the headline claim difficult to interpret.
- [Experimental Results] The manuscript does not specify the size or diversity of the AV2 test set used for the BD-rate measurements, which is a standard requirement for reproducibility of the numerical claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below and commit to revisions that enhance reproducibility and verifiability without altering the core claims.
read point-by-point responses
-
Referee: Experimental Results section: the approximation residual between the original data-driven SOTs and the FaSST versions (e.g., ||SOT − FaSST||_F or the deviation in the quadratic form governing coefficient variance) is never reported, so the claim that mode-adaptive FaSST preserves RD performance cannot be verified independently of the particular test set.
Authors: We agree that reporting the approximation residual is necessary for independent verification. In the revised manuscript we will add explicit tables listing the Frobenius norm ||SOT − FaSST||_F per prediction mode together with the resulting deviation in the quadratic form that governs coefficient variance. These quantities will be presented separately from the RD curves so that approximation quality can be assessed on its own. revision: yes
-
Referee: Experimental Results section: no RD curves or tabulated performance figures are given for the unapproximated SOTs themselves, leaving the 1.80 % BD-rate gain relative to LFNST unanchored and potentially attributable to test-set specifics rather than the proposed construction.
Authors: The unapproximated SOTs constitute the theoretical target but are computationally prohibitive for full codec integration; therefore their RD curves were not included in the original submission. The 1.80 % BD-rate advantage of FaSST over LFNST is attributable to the removal of fixed-coefficient truncation rather than to test-set idiosyncrasies. We will add a clarifying paragraph that explicitly links the observed gain to the non-truncated design and, where computationally feasible, will supply RD figures for SOTs on a representative subset of modes. revision: partial
-
Referee: Method section: the alternating-minimization procedure and approximate Givens factorization lack convergence criteria, initialization strategy, and any bound on the number of rotations per mode, all of which are load-bearing for reproducing the reported complexity reductions of 83.67 % and 66.24 %.
Authors: We acknowledge that the algorithmic description must be made fully reproducible. The revised Method section will specify (i) the convergence criterion (relative change in the objective below 10^{-4}), (ii) the initialization (identity matrix followed by a single random Givens sweep), and (iii) the mode-adaptive bound on the number of rotations (chosen to meet a per-mode sparsity target derived from the original SOT). These additions will allow exact reproduction of the stated complexity figures. revision: yes
Circularity Check
No significant circularity: FaSST construction and RD claims are externally anchored
full rationale
The paper defines FaSST as an explicit approximation procedure (alternating minimization + approximate Givens factorization) applied to pre-computed data-driven SOTs; the resulting transforms are then evaluated by direct RD measurement against the independent LFNST baseline on AV2 test sequences. No equation or claim reduces the reported BD-rate savings or complexity reduction to a fitted parameter or self-citation by construction. The design procedure is self-contained once the training residuals are supplied, and the headline performance numbers are obtained from external codec integration rather than from any internal redefinition of the objective.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of Givens rotations per mode
axioms (1)
- domain assumption A sparse orthonormal transform exists for each intra-prediction residual class that improves coefficient decorrelation beyond the primary transform.
Reference graph
Works this paper leans on
-
[1]
FaSST: Fast Sparsifying Secondary Transform
INTRODUCTION The non-separable Karhunen–Lo `eve Transform (KLT) is optimal for linear decorrelation under common statistical assumptions [1–5]. However, the non-separable KLT is rarely used in practice due to its lack of low-complexity implementations and memory require- ments. Separable trigonometric transforms, such as the discrete cosine transform (DCT...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[2]
PRELIMINARIES 2.1. Secondary transforms Letx∈R N 2 be the vector obtained by stacking the columns of anN×Nblock. Given a primary transformG, and a permutation matrixPthat reorders the transform coefficients (e.g., zig-zag scan), we write the primary transform coefficients as ˆx=PG ⊤x. Let ˜T be ann×nnon-separable orthonormal transform, referred to as the ...
-
[3]
Moreover, KLTs and their approx- imations do not account for quantization effects
FAST SPARSIFYING SECONDARY TRANSFORMS LFNSTs significantly improve residual coding efficiency but rely on coefficient truncation to control computational complexity, which limits achievable coding gains. Moreover, KLTs and their approx- imations do not account for quantization effects. We address these limitations via our proposed Fast Sparsifying Seconda...
-
[4]
given the coefficients{ ˆyi}me i=1, we optimize the transform: min SJ meX i=1 ∥ˆxi −S J ˆyi∥2 2 s.t.S J = JY j=1 G(mj, nj, θj).(6) It can be shown that the above problem is equivalent to max SJ tr(ΓSJ )s.t.S J = JY j=1 G(mj, nj, θj),(7) wheretr(·)denotes the trace of a matrix,Γ= ˆY ˆX⊤ denotes the cross-covariance matrix between the thresholded secondary ...
-
[5]
SIMULA TIONS We use intra-prediction residuals obtained from A V2 for both trans- form learning and evaluation. Specifically, images from the CLIC dataset [20] are compressed using A V2, and residual blocks of sizes 8×8,16×16, and32×32are collected. Among these blocks, some used the DCT and others the ADST as primary transforms, and some used only a prima...
-
[6]
CONCLUSION In this work, we proposed an FaSST framework for designing fast, data-dependent secondary transforms for residual coding. We formulate secondary transform learning as a rate-distortion-cost- inspired SOT problem with the additional constraint that the trans- form be factorized into a sequence of Givens rotations, enabling efficient implementati...
-
[7]
A comparative study of image correlation models for directional two-dimensional sources
S. Zhu and B. Zeng, “A comparative study of image correla- tion models for directional two-dimensional sources,” in2011 IEEE 13th International Workshop on Multimedia Signal Pro- cessing, 2011, pp. 1–5
work page 2011
-
[8]
Mode-Dependent Transforms for Coding Directional Intra Prediction Residuals
C. Yeo, Y . H. Tan, Z. Li, and S. Rahardja, “Mode-dependent transforms for coding directional intra prediction residuals,” IEEE Transactions on Circuits and Systems for Video Technol- ogy, vol. 22, no. 4, pp. 545–554, 2012
work page 2012
-
[9]
Signal-Independent Separable KLT by Offline Training for Video Coding
K. Fan, R. Wang, W. Lin, L.-Y . Duan, and W. Gao, “Signal- independent separable KLT by offline training for video cod- ing,”IEEE Access, vol. 7, pp. 33087–33093, 2019
work page 2019
-
[10]
Video content dependent directional transform for intra frame coding
L. Xu, K. N. Ngan, and M. Wang, “Video content dependent directional transform for intra frame coding,” in2012 Picture Coding Symposium. IEEE, 2012, pp. 197–200
work page 2012
-
[11]
Non-separable mode dependent transforms for intra coding in HEVC
A. Arrufat, P. Philippe, and O. D´eforges, “Non-separable mode dependent transforms for intra coding in HEVC,” in2014 IEEE Visual Communications and Image Processing Confer- ence. IEEE, 2014, pp. 61–64
work page 2014
-
[12]
Overview of the Versatile Video Coding (VVC) Standard and its Applications
B. Bross, Y .-K. Wang, Y . Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736– 3764, 2021
work page 2021
-
[13]
Low Frequency Non-Separable Transform (LFNST)
M. Koo, M. Salehifar, J. Lim, and S.-H. Kim, “Low frequency non-separable transform (LFNST),” in2019 Picture Coding Symposium (PCS), 2019, pp. 1–5
work page 2019
-
[14]
Study On Coding Tools Beyond AV1
X. Zhao, L. Zhao, M. Krishnan, Y . Du, S. Liu, D. Mukherjee, Y . Xu, and A. Grange, “Study on coding tools beyond A V1,” in 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021, pp. 1–6
work page 2021
-
[15]
Approximation and Compression With Sparse Orthonormal Transforms
O. G. Sezer, O. G. Guleryuz, and Y . Altunbasak, “Approxi- mation and compression with sparse orthonormal transforms,” IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2328–2343, 2015
work page 2015
-
[16]
Highly efficient non-separable transforms for next generation video coding
A. Said, X. Zhao, M. Karczewicz, H. E. Egilmez, V . Seregin, and J. Chen, “Highly efficient non-separable transforms for next generation video coding,” in2016 Picture Coding Sym- posium (PCS), 2016, pp. 1–5
work page 2016
-
[17]
An algorithm for the machine calculation of complex fourier series,
J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,”Mathematics of compu- tation, vol. 19, no. 90, pp. 297–301, 1965
work page 1965
-
[18]
Robust Learning of 2-D Separable Transforms for Next-Generation Video Coding
O. G. Sezer, R. Cohen, and A. Vetro, “Robust learning of 2- D separable transforms for next-generation video coding,” in 2011 Data Compression Conference. IEEE, 2011, pp. 63–72
work page 2011
-
[19]
NSST: Non-separable secondary transforms for next generation video coding
X. Zhao, J. Chen, A. Said, V . Seregin, H. E. Egilmez, and M. Karczewicz, “NSST: Non-separable secondary transforms for next generation video coding,” in2016 Picture Coding Symposium (PCS), 2016, pp. 1–5
work page 2016
-
[20]
On secondary transforms for intra prediction residual
A. Saxena and F. C. Fernandes, “On secondary transforms for intra prediction residual,” in2012 IEEE International Confer- ence on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 1201–1204
work page 2012
-
[21]
Alternating minimization algorithms: From blahut-arimoto to expectation-maximization,
J. A. O’Sullivan, “Alternating minimization algorithms: From blahut-arimoto to expectation-maximization,” inCodes, curves, and signals: Common threads in communications, pp. 173–192. Springer, 1998
work page 1998
-
[22]
Convergence rates and source conditions for Tikhonov regularization with sparsity constraints
D. A. Lorenz, “Convergence rates and source conditions for tikhonov regularization with sparsity constraints,”arXiv preprint arXiv:0801.1774, 2008
work page internal anchor Pith review Pith/arXiv arXiv 2008
-
[23]
A generalized solution of the orthogonal procrustes problem,
P. H. Sch ¨onemann, “A generalized solution of the orthogonal procrustes problem,”Psychometrika, vol. 31, no. 1, pp. 1–10, 1966
work page 1966
-
[24]
G. Golub and C. Van Loan, “Matrix computations, vol. 3 bal- timore,”MD: JHU Press., 2012
work page 2012
-
[25]
Approximate Fast Graph Fourier Transforms via Multilayer Sparse Approximations
L. Le Magoarou, R. Gribonval, and N. Tremblay, “Approx- imate fast graph fourier transforms via multilayer sparse ap- proximations,”IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 2, pp. 407–420, 2018
work page 2018
-
[26]
5th challenge on learned image compression,
“5th challenge on learned image compression,” 2022, Online
work page 2022
-
[27]
D. Pakiyarajah, E. Pavez, A. Ortega, D. Mukherjee, O. Guleryuz, K.-S. Lu, and K. K. Sivakumar, “Joint optimiza- tion of primary and secondary transforms using rate-distortion optimized transform design,” in2025 IEEE International Con- ference on Image Processing (ICIP), 2025, pp. 2594–2599
work page 2025
-
[28]
Image quality assessment: from error visibility to structural similarity
Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similar- ity,”IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004
work page 2004
-
[29]
The disparity between optimal and practical Lagrangian multiplier estimation in video encoders
D. J. Ringis, Vibhoothi, F. Piti´e, and A. Kokaram, “The dispar- ity between optimal and practical Lagrangian multiplier esti- mation in video encoders,”Frontiers in Signal Processing, vol. 3, pp. 1205104, 2023
work page 2023
-
[30]
High Throughput CABAC Entropy Coding in HEVC
V . Sze and M. Budagavi, “High throughput cabac entropy cod- ing in hevc,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1778–1791, 2012
work page 2012
-
[31]
Calculation of average PSNR differences be- tween RD-curves,
G. Bjontegaard, “Calculation of average PSNR differences be- tween RD-curves,”ITU-T SG16 Q, vol. 6, 2001
work page 2001
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.