pith · machine review for the scientific record

arxiv: 2605.08727 · v1 · submitted 2026-05-09 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 1 theorem link · Lean Theorem

Control Your View: High-Resolution Global Semantic Manipulation in Learned Image Compression

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:34 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords learned image compression · adversarial attack · global semantic manipulation · projected gradient descent · high-resolution attack · image compression security · semantic manipulation · adversarial robustness

The pith

A periodic geometric decay step-size schedule enables the first stable high-resolution global semantic manipulation in learned image compression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that learned image compression models, which use neural networks to map images into compact latent codes with good rate-distortion performance, are vulnerable to adversarial perturbations that change the global semantic content of high-resolution images. Prior projected gradient descent attacks succeed on classification or low-resolution tasks but fail here because their fixed or linear step schedules cannot navigate the distinct Lazying, Oscillating, and Refining phases required to drive examples from the Identity Region into the Amplification Region. By replacing the step schedule with a periodic geometric decay, the authors produce a minimal extension called PGD²-GSM that succeeds on full-resolution inputs such as the Kodak dataset. If the claim holds, it shows that the same efficiency gains that make learned compression attractive also create a concrete pathway for semantic-level attacks that earlier methods could not reach.

Core claim

The authors show that well-performing global semantic manipulation requires adversarial examples to pass through Lazying-Oscillating-Refining stages, and that standard ℓ∞-bounded attacks fail because their step-size schedules cannot accommodate both the Oscillating and Refining stages. They therefore introduce a Periodic Geometric Decay schedule, integrate it with projected gradient descent to obtain the minimal variant PGD²-GSM, and demonstrate on Kodak images of size 3×768×512 that this variant is the first to achieve stable high-resolution global semantic manipulation, exposing a previously inaccessible threat to learned image compression systems.
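As background for this claim, the standard ℓ∞-bounded targeted PGD baseline the paper says fails can be sketched as follows. The toy identity "codec", analytic MSE gradient, and default parameters are illustrative assumptions, not the paper's actual models or objective; a real LIC attack would obtain gradients via autograd.

```python
import numpy as np

def pgd_linf_targeted(grad_f, x_src, x_tgt, eps=0.08, steps=50,
                      step_size=lambda t: 0.01):
    """Targeted l-inf PGD sketch: perturb x_src so a codec's reconstruction
    moves toward x_tgt, projecting each iterate back into the eps-ball
    around x_src. grad_f(x, x_tgt) returns the gradient of the attack loss."""
    x = x_src.copy()
    for t in range(steps):
        x = x - step_size(t) * np.sign(grad_f(x, x_tgt))  # signed descent step
        x = x_src + np.clip(x - x_src, -eps, eps)         # l-inf projection
        x = np.clip(x, 0.0, 1.0)                          # valid pixel range
    return x

# Toy identity "codec" f(x) = x, so the MSE(f(x), x_tgt) gradient is analytic.
grad_f = lambda x, x_tgt: 2.0 * (x - x_tgt) / x.size
x_adv = pgd_linf_targeted(grad_f, np.zeros(8), np.full(8, 0.5))
```

On this convex toy the iterate simply climbs toward the target until it saturates the ε-ball; the step-size function passed in is the single knob the paper's variant replaces.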

What carries the argument

The Periodic Geometric Decay schedule for step sizes, which allows the attack to traverse the Lazying-Oscillating-Refining stages as examples move from the Identity Region to the Amplification Region.
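This summary does not reproduce the schedule's exact formula, but one plausible reading of "periodic geometric decay" is a step size that decays geometrically within each period and then resets; α₀, γ, and the period below are assumed placeholders, not the paper's tuned values.

```python
def periodic_geometric_decay(t, alpha0=2 / 255, gamma=0.5, period=20):
    """Step size at iteration t: geometric decay within a period, then reset.
    Read through the paper's stage analysis: large post-reset steps keep
    driving examples toward the Amplification Region (Oscillating), while
    the within-period decay permits fine convergence (Refining)."""
    return alpha0 * gamma ** (t % period)

schedule = [periodic_geometric_decay(t) for t in range(60)]  # three periods
```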

If this is right

  • High-resolution global semantic manipulation becomes achievable under white-box ℓ∞ constraints for learned image compression.
  • Standard projected gradient descent cannot reach stable high-resolution GSM without the new step-size schedule.
  • The threat applies to practical resolutions such as 768×512 and is not limited to low-resolution or local manipulations.
  • Learned compression systems now face an explicit attack vector that alters entire-image semantics while remaining bounded in perturbation size.
  • The minimal PGD²-GSM variant demonstrates that only the schedule change is needed to cross the previous barrier.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Defenses for learned compression may need to incorporate training procedures that explicitly penalize trajectories through the oscillating and refining stages.
  • Similar step-size mismatches could appear in other high-resolution neural tasks such as generative modeling or video compression.
  • Testing PGD²-GSM on additional learned compression architectures would indicate whether the vulnerability is architecture-specific or general.
  • The existence of the Amplification Region suggests that rate-distortion optimization itself might be hardened by adding constraints on latent-code sensitivity.

Load-bearing premise

The claim rests on the premise that standard attacks fail specifically because their step-size schedules cannot handle both the oscillating and refining stages at once.

What would settle it

Running ordinary PGD with fixed or linearly decaying step sizes on the same high-resolution LIC models and Kodak images to test whether global semantic manipulation remains impossible, or verifying whether PGD²-GSM itself loses stability when the periodic decay is removed.
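That settling experiment can be organized as a small ablation harness that runs one attack loop under each candidate schedule. The toy objective, schedule forms, and parameters below are illustrative stand-ins for the paper's LIC models and Kodak inputs.

```python
import numpy as np

def run_attack(schedule, x_src, x_tgt, eps=0.08, steps=100, seed=0):
    """Toy targeted l-inf attack loop (analytic MSE gradient for an identity
    'codec'); only the step-size schedule varies between runs."""
    rng = np.random.default_rng(seed)
    x = np.clip(x_src + rng.uniform(-eps, eps, x_src.shape), 0.0, 1.0)
    for t in range(steps):
        g = 2.0 * (x - x_tgt) / x.size
        x = x - schedule(t) * np.sign(g)
        x = x_src + np.clip(x - x_src, -eps, eps)   # stay in the eps-ball
    return x

schedules = {
    "fixed":              lambda t: 0.01,
    "linear decay":       lambda t: 0.01 * (1 - t / 100),
    "periodic geometric": lambda t: 0.01 * 0.5 ** (t % 20),
}
x_src, x_tgt = np.zeros(16), np.full(16, 0.5)
final_gap = {name: float(np.abs(run_attack(s, x_src, x_tgt) - x_tgt).mean())
             for name, s in schedules.items()}
```

On this convex toy every schedule saturates the ε-ball, so the harness only illustrates the bookkeeping; on real LIC models the paper's claim predicts the first two columns fail while the periodic schedule succeeds.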

Figures

Figures reproduced from arXiv: 2605.08727 by Chi-Man Pun, Greta Seng Peng Mok, Jiaming Liang, Weisi Lin.

Figure 1. Illustrative threat scenario of high-resolution GSM: Manipulator controls the reconstruction.
Figure 2. Overview of reconstruction fidelity attacks. Rec and Adv are abbreviations for reconstruction…
Figure 3. Lazying–Oscillating–Refining illustrated via η contours. Blue: Identity Region. Purple: Amplification Region. Section 3.2 reveals that effective GSM examples reside in the amplification region, while most source images lie in the identity region. This suggests two critical phases for successful GSM: (1) driving the adversarial example from the identity region into the amplification region, and (2) identify…
Figure 4. LCS trajectories and visualizations of PGD on LALIC under different…
Figure 5. Visual comparison between PGD²-GSM and PGD with different step sizes.
Figure 6. Quantitative comparison between PGD²-GSM and PGD with different step sizes.
Figure 7. Quantitative results of PGD²-GSM under different T and ϵ. Panels: (a) LALIC, (b) DCAE, (c) CCA, (d) HiFi-VRIC.
Figure 8. Visualization of PGD²-GSM under different step numbers T and perturbation budgets ϵ.
Figure 9. Kodak dataset for reference. Each image has a resolution of either…
Figure 10. Visualizations of high-resolution GSM examples generated by PGD and PGD²-GSM.
Figure 11. Visualizations of adversarial examples generated by PGD…
Figure 12. Reconstructions of high-resolution GSM examples. In the parentheses, the first…
original abstract

Learned image compression (LIC) integrates deep neural networks (DNNs) to map high-dimensional images into compact latent representations, reducing redundancy and achieving superior rate-distortion (RD) performance in benign settings. Unfortunately, due to inherent vulnerabilities in DNNs, LIC systems are susceptible to adversarial perturbations that lead to downstream deterioration, compression rate degradation, untargeted distortion, and both local semantic manipulation (LSM) and low-resolution ($3\times28\times28$) global semantic manipulation (GSM). However, high-resolution GSM remains unexplored due to its intractability. Notably, the existing projected gradient descent (PGD) method achieves near-perfect white-box attacks for classification, segmentation, and other tasks, yet fails to generalize to high-resolution GSM. Our theoretical and empirical analyses reveal that well-performing GSM drives adversarial examples from the Identity Region to the Amplification Region through the Lazying-Oscillating-Refining stages. General $\ell_{\infty}$-bounded attacks fail on high-resolution GSM because their step-size schedules cannot accommodate both the Oscillating and Refining stages. Based on this, we propose the Periodic Geometric Decay schedule that enables $\ell_{\infty}$-bounded high-resolution GSM. To verify our approach, we integrate it with PGD, yielding a minimal variant, PGD$^{2}$-GSM. Extensive experiments on the Kodak dataset $(3\times768\times512)$ demonstrate that our PGD$^{2}$-GSM is the first to stably achieve high-resolution GSM, thereby exposing a novel threat to LIC systems. Code is available at https://github.com/chinaliangjiaming/PGD2-GSM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that learned image compression (LIC) systems are vulnerable to adversarial attacks enabling high-resolution global semantic manipulation (GSM), but standard ℓ∞-bounded PGD fails because its step-size schedules cannot handle the Lazying-Oscillating-Refining stages needed to drive examples from the Identity Region to the Amplification Region. The authors derive a Periodic Geometric Decay schedule from their theoretical and empirical analysis of attack dynamics, integrate it into PGD to form the minimal variant PGD²-GSM, and demonstrate that it is the first method to stably achieve high-resolution GSM on the Kodak dataset (3×768×512), thereby exposing a novel threat to LIC.

Significance. If the empirical results are reproducible and the stage-based analysis holds, the work is significant for identifying a new, high-resolution adversarial threat to LIC that goes beyond low-resolution GSM or standard distortions. The proposed schedule offers a practical, minimal modification to PGD that achieves stable manipulation where prior methods do not, and the code release supports verification. This could inform future defenses in compression pipelines used for transmission or storage.

major comments (3)
  1. [Theoretical analysis section] The central claim that general ℓ∞-bounded attacks fail specifically because no step-size schedule can accommodate both the Oscillating and Refining stages rests on an unproven uniqueness assumption. The Lazying-Oscillating-Refining stages are presented as derived from observed behavior rather than as invariants of the LIC loss landscape, with no formal proof or exhaustive enumeration showing that alternatives (e.g., cosine annealing, per-pixel adaptive, or learned schedulers) cannot achieve equivalent traversal from the Identity to the Amplification Region.
  2. [Experiments section (Kodak results)] The assertion that PGD²-GSM is "the first to stably achieve high-resolution GSM" is load-bearing for the novelty claim, yet the manuscript provides insufficient quantification of stability (e.g., success rates, variance across random seeds, or multiple LIC models) and lacks direct comparisons to other adaptive step-size schedules that might also succeed, undermining the explanatory link between the three-stage model and the necessity of Periodic Geometric Decay.
  3. [Abstract and theoretical analysis] The assumption that well-performing high-resolution GSM requires driving examples through the specific Lazying-Oscillating-Refining stages is presented as explanatory for why standard PGD fails, but without a derivation showing these stages are necessary (rather than merely observed in successful runs), the argument that the proposed schedule is required to expose the threat remains circular with respect to the empirical outcomes.
minor comments (2)
  1. [Abstract] The low-resolution GSM size is written as $(3×28×28)$ while high-resolution is $(3×768×512)$; standardize the notation and ensure dimensional consistency throughout.
  2. [Notation] Define acronyms (LIC, GSM, PGD, PGD²-GSM) on first use in the main body and give the Periodic Geometric Decay schedule an explicit equation or pseudocode for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, clarifying our claims and outlining revisions to strengthen the presentation of the theoretical analysis, experimental validation, and novelty arguments.

point-by-point responses
  1. Referee: Theoretical analysis section: The central claim that general ℓ∞-bounded attacks fail specifically because no step-size schedule can accommodate both the Oscillating and Refining stages rests on an unproven uniqueness assumption. The Lazying-Oscillating-Refining stages are presented as derived from observed behavior rather than as invariants of the LIC loss landscape, with no formal proof or exhaustive enumeration showing that alternatives (e.g., cosine annealing, per-pixel adaptive, or learned schedulers) cannot achieve equivalent traversal from Identity to Amplification Region.

    Authors: We appreciate this point and agree that our analysis does not include a formal uniqueness proof, which would indeed require exhaustive enumeration of all possible schedulers—an intractable task. The Lazying-Oscillating-Refining stages were identified through a combination of theoretical modeling of gradient dynamics across loss landscape regions and empirical trajectory analysis in both failed and successful attacks. Our contribution is the derivation of the Periodic Geometric Decay schedule specifically to handle these observed dynamics, enabling stable traversal where standard schedules fail. We will revise the theoretical analysis section to explicitly clarify that we do not assert these stages are the only possible path or that our schedule is uniquely necessary; instead, we present it as a practical, minimal modification motivated by the identified dynamics. This revision will remove any implication of exclusivity while preserving the explanatory value of the stage-based analysis. revision: yes

  2. Referee: Experiments section (Kodak results): The assertion that PGD²-GSM is 'the first to stably achieve high-resolution GSM' is load-bearing for the novelty claim, yet the manuscript provides insufficient quantification of stability (e.g., success rates, variance across random seeds, or multiple LIC models) and lacks direct comparisons to other adaptive step-size schedules that might also succeed, undermining the explanatory link between the three-stage model and the necessity of Periodic Geometric Decay.

    Authors: We acknowledge that stronger quantification and additional baselines would better support the stability claims and the connection to our stage analysis. The current experiments demonstrate that PGD²-GSM achieves stable high-resolution GSM on Kodak (3×768×512) where standard PGD does not, with code released for reproducibility. To address the concern, we will expand the experiments section to report success rates, variance across multiple random seeds, and results on additional LIC models. We will also include direct comparisons against other adaptive schedules (e.g., cosine annealing and per-pixel adaptive variants) to empirically show their limitations in handling the full Lazying-Oscillating-Refining progression. These additions will reinforce the link between the three-stage model and the effectiveness of Periodic Geometric Decay without overstating exclusivity. revision: yes

  3. Referee: Abstract and theoretical analysis: The assumption that well-performing high-resolution GSM requires driving examples through the specific Lazying-Oscillating-Refining stages is presented as explanatory for why standard PGD fails, but without a derivation showing these stages are necessary (rather than observed in successful runs), the argument that the proposed schedule is required to expose the threat remains circular to the empirical outcomes.

    Authors: We thank the referee for highlighting the risk of circularity. The stages were not posited a priori but emerged from analyzing attack trajectories: theoretical examination of how step-size affects gradient behavior in the Identity versus Amplification Regions, combined with empirical observation that successful high-resolution GSM consistently exhibits Lazying, Oscillating, and Refining phases. The Periodic Geometric Decay schedule was then derived to accommodate these phases. We will revise the abstract and theoretical analysis to more clearly delineate the empirical observations from the theoretical motivation, emphasizing that the schedule is designed to address the dynamics required for stable manipulation (as evidenced by failure of standard schedules). This will avoid any suggestion that the stages are strictly necessary a priori and instead frame them as key observed requirements that our method successfully handles, thereby exposing the threat. revision: partial
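The success-rate and across-seed variance reporting promised in the rebuttal could take a shape like the sketch below; `scores_by_seed` and the threshold `tau` are hypothetical placeholders for per-seed, per-image attack-quality measurements, not quantities the paper defines.

```python
import numpy as np

def stability_summary(scores_by_seed, tau=0.9):
    """scores_by_seed: (num_seeds, num_images) attack-quality scores.
    An attack 'succeeds' on an image/seed pair when its score exceeds tau;
    report the overall success rate and the variance of per-seed means."""
    arr = np.asarray(scores_by_seed, dtype=float)
    success_rate = float((arr > tau).mean())
    across_seed_variance = float(arr.mean(axis=1).var())
    return success_rate, across_seed_variance

# Two seeds, two images (made-up scores for illustration):
rate, var = stability_summary([[1.00, 0.80], [0.95, 0.95]])
```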

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper's central chain proceeds from stated theoretical/empirical observations of attack dynamics (Identity to Amplification Region via Lazying-Oscillating-Refining stages) to the claim that standard ℓ∞ schedules cannot accommodate both Oscillating and Refining phases, followed by the proposal of Periodic Geometric Decay and PGD²-GSM. No quoted equation, definition, or self-citation reduces the result to its own inputs by construction; the stages and failure diagnosis are presented as analysis outputs rather than tautological re-labelings of fitted parameters or prior self-referential results. Empirical verification on Kodak images supplies independent content, satisfying the default expectation of non-circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of the Lazying-Oscillating-Refining stage analysis for high-resolution GSM and the ability of the new schedule to navigate those stages, which are presented as derived from the authors' theoretical and empirical work.

free parameters (1)
  • Hyperparameters of Periodic Geometric Decay schedule
    The schedule requires parameters to control periodicity and decay rates, likely tuned to the attack dynamics.
axioms (1)
  • domain assumption Well-performing GSM drives adversarial examples through Lazying-Oscillating-Refining stages
    Invoked in the theoretical and empirical analyses to explain why standard attacks fail.

pith-pipeline@v0.9.0 · 5606 in / 1387 out tokens · 79861 ms · 2026-05-12T02:34:03.644639+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

  1. [1]

    Toward robust neural image compression: Adversarial attack and model finetuning.IEEE Transactions on Circuits and Systems for Video Technology, 33(12):7842–7856, 2023

    Tong Chen and Zhan Ma. Toward robust neural image compression: Adversarial attack and model finetuning.IEEE Transactions on Circuits and Systems for Video Technology, 33(12):7842–7856, 2023

  2. [2]

    The jpeg still picture compression standard.Communications of the ACM, 34(4):30–44, 1991

    Gregory K Wallace. The jpeg still picture compression standard.Communications of the ACM, 34(4):30–44, 1991

  3. [3]

    The jpeg 2000 still image compression standard.IEEE Signal processing magazine, 18(5):36–58, 2002

    Athanassios Skodras, Charilaos Christopoulos, and Touradj Ebrahimi. The jpeg 2000 still image compression standard.IEEE Signal processing magazine, 18(5):36–58, 2002

  4. [4]

    Overview of the high efficiency video coding (hevc) standard.IEEE Transactions on circuits and systems for video technology, 22(12):1649–1668, 2012

    Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. Overview of the high efficiency video coding (hevc) standard.IEEE Transactions on circuits and systems for video technology, 22(12):1649–1668, 2012

  5. [5]

    Overview of the versatile video coding (vvc) standard and its applications.IEEE Transactions on Circuits and Systems for Video Technology, 31(10):3736–3764, 2021

    Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Jianle Chen, Gary J Sullivan, and Jens- Rainer Ohm. Overview of the versatile video coding (vvc) standard and its applications.IEEE Transactions on Circuits and Systems for Video Technology, 31(10):3736–3764, 2021

  6. [6]

    End-to-end optimized image compres- sion

    Johannes Ballé, Valero Laparra, and Eero P Simoncelli. End-to-end optimized image compres- sion. InInternational Conference on Learning Representations, 2017

  7. [7]

    Varia- tional image compression with a scale hyperprior

    Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Varia- tional image compression with a scale hyperprior. InInternational Conference on Learning Representations, 2018

  8. [8]

    Joint autoregressive and hierarchical priors for learned image compression.Advances in neural information processing systems, 31, 2018

    David Minnen, Johannes Ballé, and George D Toderici. Joint autoregressive and hierarchical priors for learned image compression.Advances in neural information processing systems, 31, 2018

  9. [9]

    Wecon- vene: Learned image compression with wavelet-domain convolution and entropy model

    Haisheng Fu, Jie Liang, Zhenman Fang, Jingning Han, Feng Liang, and Guohe Zhang. Wecon- vene: Learned image compression with wavelet-domain convolution and entropy model. In European Conference on Computer Vision, pages 37–53. Springer, 2024

  10. [10]

    Causal context adjustment loss for learned image compression.Advances in Neural Information Processing Systems, 37:133231–133253, 2024

    Minghao Han, Shiyin Jiang, Shengxi Li, Xin Deng, Mai Xu, Ce Zhu, and Shuhang Gu. Causal context adjustment loss for learned image compression.Advances in Neural Information Processing Systems, 37:133231–133253, 2024

  11. [11]

    Llic: Large receptive field transform coding with adaptive weights for learned image compression.IEEE Transactions on Multimedia, 26:10937–10951, 2024

    Wei Jiang, Peirong Ning, Jiayu Yang, Yongqi Zhai, Feng Gao, and Ronggang Wang. Llic: Large receptive field transform coding with adaptive weights for learned image compression.IEEE Transactions on Multimedia, 26:10937–10951, 2024

  12. [12]

    Linear attention modeling for learned image compression

    Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, and Li Song. Linear attention modeling for learned image compression. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 7623–7632, 2025

  13. [13]

    Learned image compression with dictionary-based entropy model

    Jingbo Lu, Leheng Zhang, Xingyu Zhou, Mu Li, Wen Li, and Shuhang Gu. Learned image compression with dictionary-based entropy model. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12850–12859, 2025

  14. [14]

    Learned image compression with hierarchical progressive context modeling

    Yuqi Li, Haotian Zhang, Li Li, and Dong Liu. Learned image compression with hierarchical progressive context modeling. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18834–18843, 2025

  15. [15]

    Learned im- age compression via local-to-global cross-component prior.IEEE Transactions on Multimedia, 2026

    Wenhong Duan, Jiaye Fu, Chen Cui, Junqi Wu, Li Song, Siwei Ma, and Wen Gao. Learned im- age compression via local-to-global cross-component prior.IEEE Transactions on Multimedia, 2026. 10

  16. [16]

    Qarv++: An improved hierarchical vae for learned image compression.IEEE Transactions on Circuits and Systems for Video Technology, 2026

    Yichi Zhang, Yuning Huang, and Fengqing Zhu. Qarv++: An improved hierarchical vae for learned image compression.IEEE Transactions on Circuits and Systems for Video Technology, 2026

  17. [17]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfel- low, and Rob Fergus. Intriguing properties of neural networks.arXiv preprint arXiv:1312.6199, 2013

  18. [18]

    Explaining and harnessing adversar- ial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversar- ial examples. InInternational Conference on Learning Representations, 2015

  19. [19]

    Transferable learned image compression-resistant adversarial perturbations

    Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, and Zhenzhong Chen. Transferable learned image compression-resistant adversarial perturbations. In2024 Data Compression Conference (DCC), pages 582–582. IEEE, 2024

  20. [20]

    Ma- nipulation attacks on learned image compression.IEEE Transactions on Artificial Intelligence, 5(6):3083–3097, 2023

    Kang Liu, Di Wu, Yangyu Wu, Yiru Wang, Dan Feng, Benjamin Tan, and Siddharth Garg. Ma- nipulation attacks on learned image compression.IEEE Transactions on Artificial Intelligence, 5(6):3083–3097, 2023

  21. [21]

    Backdoor attacks against deep image compression via adaptive frequency trigger

    Yi Yu, Yufei Wang, Wenhan Yang, Shijian Lu, Yap-Peng Tan, and Alex C Kot. Backdoor attacks against deep image compression via adaptive frequency trigger. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12250–12259, 2023

  22. [22]

    On the adversarial robustness of learning-based image compression against rate-distortion attacks.IEEE Transactions on Multimedia, 2025

    Chenhao Wu, Qingbo Wu, Haoran Wei, Lei Wang, Fanman Meng, King Ngi Ngan, Li Zhuo, and Hongliang Li. On the adversarial robustness of learning-based image compression against rate-distortion attacks.IEEE Transactions on Multimedia, 2025

  23. [23]

    Efficient adversarial attack and training on learned image compression

    Jun Kurihara and Heming Sun. Efficient adversarial attack and training on learned image compression. In2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 2453–2458. IEEE, 2025

  24. [24]

    Reconstruction distortion of learned image compression with imperceptible perturbations

    Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, and Zhenzhong Chen. Reconstruction distortion of learned image compression with imperceptible perturbations. In 2024 Data Compression Conference (DCC), pages 583–583. IEEE, 2024

  25. [25]

    An imperceptible adversarial attack against reconstruction for learned image compression

    Jingui Ma and Ronggang Wang. An imperceptible adversarial attack against reconstruction for learned image compression. In2024 Data Compression Conference (DCC), pages 573–573. IEEE, 2024

  26. [26]

    T-mla: A targeted multiscale log–exponential attack framework for neural image compression.Information Sciences, page 123143, 2026

    Nikolay I Kalmykov, Razan Dibo, Kaiyu Shen, Xu Zhonghan, Anh-Huy Phan, Yipeng Liu, and Ivan Oseledets. T-mla: A targeted multiscale log–exponential attack framework for neural image compression.Information Sciences, page 123143, 2026

  27. [27]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. InInternational Conference on Learning Representations, 2018

  28. [28]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), pages 39–57. Ieee, 2017

  29. [29]

    Segpgd: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness

    Jindong Gu, Hengshuang Zhao, V olker Tresp, and Philip HS Torr. Segpgd: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness. InEuropean Conference on Computer Vision, pages 308–325. Springer, 2022

  30. [30]

    Towards adversarially robust object detection

    Haichao Zhang and Jianyu Wang. Towards adversarially robust object detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 421–430, 2019

  31. [31]

    The devil is in the details: Window-based attention for image compression

    Renjie Zou, Chunfeng Song, and Zhaoxiang Zhang. The devil is in the details: Window-based attention for image compression. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17492–17501, 2022

  32. [32]

    Nvtc: Nonlinear vector transform cod- ing

    Runsen Feng, Zongyu Guo, Weiping Li, and Zhibo Chen. Nvtc: Nonlinear vector transform cod- ing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6101–6110, 2023. 11

  33. [33]

    Learned image compression with mixed transformer- cnn architectures

    Jinming Liu, Heming Sun, and Jiro Katto. Learned image compression with mixed transformer- cnn architectures. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14388–14397, 2023

  34. [34]

    Frequency- aware transformer for learned image compression

    Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, and Hongkai Xiong. Frequency- aware transformer for learned image compression. InInternational Conference on Learning Representations, 2024

  35. [35]

    Variable rate image compression with recurrent neural networks

    George Toderici, Sean M O’Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, and Rahul Sukthankar. Variable rate image compression with recurrent neural networks. InInternational Conference on Learning Representations, 2016

  36. [36]

    Lossy image compression with compressive autoencoders

    Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Huszár. Lossy image compression with compressive autoencoders. InInternational Conference on Learning Representations, 2017

  37. [37]

David Minnen and Saurabh Singh. Channel-wise autoregressive entropy models for learned image compression. In 2020 IEEE International Conference on Image Processing (ICIP), pages 3339–3343. IEEE, 2020.

  38. [38]

Eirikur Agustsson and Lucas Theis. Universally quantized neural compression. Advances in Neural Information Processing Systems, 33:12367–12376, 2020.

  39. [39]

Jing Zhou, Akira Nakagawa, Keizo Kato, Sihan Wen, Kimihiko Kazui, and Zhiming Tan. Variable rate image compression method with dead-zone quantizer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 162–163, 2020.

  40. [40]

David Alexandre, Chih-Peng Chang, Wen-Hsiao Peng, and Hsueh-Ming Hang. Learned image compression with soft bit-based rate-distortion optimization. In 2019 IEEE International Conference on Image Processing (ICIP), pages 1715–1719. IEEE, 2019.

  41. [41]

Zongyu Guo, Zhizheng Zhang, Runsen Feng, and Zhibo Chen. Soft then hard: Rethinking the quantization in neural image compression. In International Conference on Machine Learning, pages 3920–3929. PMLR, 2021.

  42. [42]

Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, and Marco Grangetto. STanH: Parametric quantization for variable rate learned image compression. IEEE Transactions on Image Processing, 34:639–651, 2025.

  43. [43]

Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin. Checkerboard context model for efficient learned image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14771–14780, 2021.

  44. [44]

Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5718–5727, 2022.

  45. [45]

Yichen Qian, Ming Lin, Xiuyu Sun, Zhiyu Tan, and Rong Jin. Entroformer: A transformer-based entropy model for learned image compression. In International Conference on Learning Representations, 2022.

  46. [46]

Chao Li, Shanzhi Yin, Chuanmin Jia, Fanyang Meng, Yonghong Tian, and Yongsheng Liang. Multirate progressive entropy model for learned image compression. IEEE Transactions on Circuits and Systems for Video Technology, 34(8):7725–7741, 2024.

  47. [47]

Jun-Hyuk Kim, Seungeon Kim, Won-Hee Lee, and Dokwan Oh. Diversify, contextualize, and adapt: Efficient entropy modeling for neural image codec. Advances in Neural Information Processing Systems, 37:45956–45974, 2024.

  48. [48]

Yura Perugachi-Diaz, Arwin Gansekoele, and Sandjai Bhulai. Robustly overfitting latents for flexible neural image compression. Advances in Neural Information Processing Systems, 37:106714–106742, 2024.

  49. [49]

Xi Zhang and Xiaolin Wu. Learning optimal lattice vector quantizers for end-to-end neural image compression. Advances in Neural Information Processing Systems, 37:106497–106518, 2024.

  50. [50]

Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, and Ronggang Wang. MLIC++: Linear complexity multi-reference entropy modeling for learned image compression. ACM Transactions on Multimedia Computing, Communications and Applications, 21(5):1–25, 2025.

  51. [51]

Shen Wang, Zhengxue Cheng, Donghui Feng, Qi Wang, Qunshan Gu, Li Song, and Wenjun Zhang. Distilling complexity-scalable learned image compression models via neural architecture search. IEEE Transactions on Circuits and Systems for Video Technology, 2026.

  52. [52]

Jordan Madden, Lhamo Dorje, and Xiaohua Li. Bitstream collisions in neural image compression via adversarial perturbations. arXiv preprint arXiv:2503.19817, 2025.

  53. [53]

Jernej Kos, Ian Fischer, and Dawn Song. Adversarial examples for generative models. In 2018 IEEE Security and Privacy Workshops (SPW), pages 36–42. IEEE, 2018.

  54. [54]

Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216. PMLR, 2020.

  55. [55]

Shilv Cai, Zhijun Zhang, Liqun Chen, Luxin Yan, Sheng Zhong, and Xu Zou. High-fidelity variable-rate image compression via invertible activation transformation. In Proceedings of the 30th ACM International Conference on Multimedia, pages 2021–2031. ACM, 2022.

  56. [56]

Eastman Kodak. Kodak lossless true color image suite (photocd pcd0992), 1993.

Appendix

Section A: Proofs
Section A.1: Proof of Lemma 1
Section B: Supplementary Materials
Section B.1: Kodak Dataset ...