arxiv: 2605.13243 · v1 · submitted 2026-05-13 · 📡 eess.IV

Spatial Competition for Low-Complexity Learned Image Compression

Théophile Blard, Pierrick Philippe, Théo Ladune, Xiaoran Jiang, Olivier Déforges

Pith reviewed 2026-05-14 18:46 UTC · model grok-4.3

classification 📡 eess.IV

keywords: learned image compression, neural codecs, spatial selection, mode map, low-complexity decoding, rate-distortion, HEVC, CLIC dataset

The pith

Several specialized neural codecs compete for each image region, and a transmitted mode map records the winner, lowering bitrate while keeping decoding as cheap as a single codec.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a way to get the compression gains of learned autoencoder codecs without paying their usual high decoding cost. Several neural codecs are trained to specialize on different image content; the encoder evaluates each one on every region and picks the best according to a rate-distortion cost. Only a compact mode map is sent to tell the decoder which codec applies where. At decode time the system activates just the chosen codec for each region, so total complexity stays equal to that of a single network. On the CLIC 2020 dataset this yields up to 14.5 percent lower rate than a single codec and reaches the efficiency of HEVC while using only 1433 MACs per pixel.

Core claim

By letting the encoder select, for each spatial region, the neural codec that minimizes rate-distortion cost and by transmitting a mode map that records those choices, the decoder can reconstruct the image using only the selected codec for each region; the result is content-adaptive performance at the computational cost of a single codec.
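The selection step in this claim can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Lagrangian form J = D + λR is an assumption (the paper only says "a rate-distortion cost"), and the function name `select_modes`, the `costs` layout, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_modes(
    costs: Sequence[Sequence[Tuple[float, float]]],
    lam: float,
) -> List[int]:
    """Return one codec index per region (the mode map).

    costs[k][m] is a hypothetical (rate_bits, distortion) pair measured by
    coding region k with codec m. The cost J = D + lam * R is an assumed
    Lagrangian form; the source does not spell out the exact cost.
    """
    mode_map = []
    for region_costs in costs:
        # Rate-distortion cost of every candidate codec on this region.
        rd = [dist + lam * rate for rate, dist in region_costs]
        # Keep the index of the cheapest codec.
        mode_map.append(min(range(len(rd)), key=rd.__getitem__))
    return mode_map


# Toy numbers invented for illustration: 3 regions, 2 candidate codecs.
toy_costs = [
    [(100.0, 4.0), (80.0, 5.0)],   # region 0
    [(120.0, 2.0), (110.0, 1.9)],  # region 1
    [(60.0, 9.0), (70.0, 3.0)],    # region 2
]
mode_map = select_modes(toy_costs, lam=0.01)
print(mode_map)
```

Under these assumptions an exhaustive search evaluates every codec on every region, so the encoder does roughly M times the work of a single codec for M candidates, while the decoder only reads `mode_map`.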

What carries the argument

The mode map that records per-region codec selection and guides the decoder to apply only the indicated specialized network for each area.

If this is right

Up to 14.5 percent rate reduction relative to a single learned codec on the CLIC 2020 dataset.
HEVC-level rate-distortion performance at a fixed decoding cost of 1433 MACs per pixel.
Encoding stays fast, and adapting to each image adds no decoder cost.
The system adapts to local image statistics while preserving the complexity of one codec.
Per-region choice is performed once at encode time and signaled once via the mode map.
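The constant-complexity point can be checked with simple arithmetic. The sketch below assumes that every candidate codec shares one architecture (as the paper's experimental setup states) and uses the per-pixel figures quoted on this page, 708 MAC/pixel for the synthesis transform and 725 MAC/pixel for the entropy model. The function name, region size, and mode maps are hypothetical, and the cost of decoding the mode map itself is ignored.

```python
KAPPA_GS = 708  # synthesis transform, MAC/pixel (quoted from the paper)
KAPPA_P = 725   # entropy model, MAC/pixel (quoted from the paper)


def decode_macs_per_pixel(mode_map, region_pixels, num_codecs):
    """Average decoding MAC/pixel when each region runs only its signaled codec.

    Assumes identical architectures, so every codec costs the same per pixel.
    """
    per_codec_cost = [KAPPA_GS + KAPPA_P] * num_codecs
    total_pixels = len(mode_map) * region_pixels
    # Each region pays for exactly one codec: the one named in the mode map.
    total_macs = sum(per_codec_cost[m] * region_pixels for m in mode_map)
    return total_macs / total_pixels


# Hypothetical mode map over four 64x64 regions with four candidate codecs.
print(decode_macs_per_pixel([0, 1, 1, 3], region_pixels=64 * 64, num_codecs=4))
```

Whatever the mode map contains, the average stays at 708 + 725 = 1433 MAC/pixel, because no pixel is ever decoded by more than one network.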

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same per-region selection idea could be extended to video by adding temporal consistency to the mode map.
Hardware decoders could preload all candidate networks and switch between them using only the mode-map index.
Joint training of the codec set together with the selection rule might further enlarge the observed rate savings.

Load-bearing premise

The extra bits needed to transmit the mode map stay small enough that they do not erase the rate savings from choosing the locally best codec.

What would settle it

Measure the total rate, including mode-map overhead, on the same CLIC images; if that total rate is not lower than the rate of the single best codec, the claimed net gain disappears.
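This test can be written down as a small calculation. The sketch uses the fixed-length signaling cost K⌈log₂ M⌉/(HW) bpp from the paper passage quoted further down this page, with K regions, M codecs, and an H×W image. The block size, number of codecs, image size, and the two rates are hypothetical placeholders, since the page does not report them, and the function names are illustrative.

```python
import math


def mode_map_bpp(height, width, block, num_codecs):
    """Fixed-length mode-map cost K * ceil(log2 M) / (H * W), in bits per pixel."""
    # K: number of square regions, counting partial blocks at the borders.
    k = math.ceil(height / block) * math.ceil(width / block)
    # ceil(log2 M) computed with integers to avoid floating-point rounding.
    bits_per_region = (num_codecs - 1).bit_length()
    return k * bits_per_region / (height * width)


def net_gain_survives(rate_single_bpp, rate_selected_bpp, overhead_bpp):
    """True if the competing codecs plus mode map still beat the single codec."""
    return rate_selected_bpp + overhead_bpp < rate_single_bpp


# Hypothetical setting: 512x768 image, 64x64 regions, 4 candidate codecs.
overhead = mode_map_bpp(512, 768, block=64, num_codecs=4)
print(overhead)
# Hypothetical rates, for illustration only.
print(net_gain_survives(0.50, 0.45, overhead))
```

With these placeholder values the map costs 192 bits for the whole image; smaller regions or more codecs raise that figure, which is why the region granularity and codec count matter for the net result.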

Original abstract

Autoencoder-based image codecs achieve state-of-the-art compression performance but often incur high computational complexity, particularly at decoding time. This work introduces a low-complexity learned image compression framework based on spatial competition between multiple specialized neural codecs. For each image region, the encoder selects the codec that best matches the local content according to a rate-distortion cost. A mode map is transmitted as side information to indicate the per-region codec selection. At decoding time, this mode map-based selection guides reconstruction while preserving the complexity of a single codec. This design enables per-image adaptation with low decoding complexity and fast encoding. On the CLIC 2020 dataset, our method achieves up to -14.5% rate reduction compared to a single codec and reaches HEVC-level performance with a decoding complexity of 1433 MACs per pixel.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The mode-map approach to spatial codec selection looks workable for cutting decoding complexity, but the side-information cost needs explicit numbers to confirm any real rate win.

The main takeaway is a straightforward way to get some rate improvement in learned image compression without paying the full decode cost of multiple models. The encoder picks among several specialized codecs per region using rate-distortion cost, then sends a mode map so the decoder runs only the chosen one. This keeps decoding at single-codec complexity while allowing per-image adaptation, which directly tackles a practical limit in the field.

Referee Report

2 major / 2 minor

Summary. The paper proposes a low-complexity learned image compression method that employs spatial competition among multiple specialized autoencoder codecs. For each image region the encoder selects the codec minimizing a rate-distortion cost; a mode map is transmitted as side information to guide the decoder, which then runs only the selected codec per region. This yields per-image adaptation while keeping decoding complexity comparable to a single codec. On the CLIC 2020 dataset the method is reported to deliver up to 14.5 % rate reduction relative to a single learned codec and to reach HEVC-level rate-distortion performance at a decoding complexity of 1433 MACs per pixel.

Significance. If the mode-map overhead proves negligible and the per-region selection is robust, the framework offers a practical route to content-adaptive learned compression without decoder complexity inflation, potentially closing the complexity gap between learned codecs and HEVC while retaining rate-distortion gains. Concrete complexity figures and evaluation on a public dataset are positive attributes.

major comments (2)

[Abstract, §3] Abstract and §3 (mode-map transmission): the headline -14.5 % rate-reduction claim is load-bearing on the side-information rate of the mode map remaining small. The manuscript must report the average mode-map bitrate in bpp, its fraction of total rate, the region granularity (e.g., block size), the number of competing codecs, and the entropy coder used for the map; without these numbers it is impossible to verify that the reported savings are net of overhead.
[§4] §4 (encoding procedure): the claim of “fast encoding” requires explicit quantification of the extra cost incurred by evaluating the rate-distortion cost for every competing codec on every region. If all codecs must be run to compute the selection, the encoding complexity may exceed that of a single codec by a large factor; an approximation or early-termination strategy should be described and timed.

minor comments (2)

[Table 1] Table 1 or equivalent: add a column or footnote stating the exact region size and number of codecs used for the CLIC 2020 results.
[§2] §2: the relation to prior spatial-adaptation and mixture-of-experts codecs should be expanded with explicit citations and a short comparison table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to supply the requested quantitative details.

Referee: [Abstract, §3] Abstract and §3 (mode-map transmission): the headline -14.5 % rate-reduction claim is load-bearing on the side-information rate of the mode map remaining small. The manuscript must report the average mode-map bitrate in bpp, its fraction of total rate, the region granularity (e.g., block size), the number of competing codecs, and the entropy coder used for the map; without these numbers it is impossible to verify that the reported savings are net of overhead.

Authors: We agree that these figures are required to substantiate the net savings. The revised manuscript will add a concise paragraph in §3 (with a brief mention in the abstract) that states the average mode-map bitrate in bpp, its fraction of total rate, the region granularity, the number of competing codecs, and the entropy coder used for the map. revision: yes
Referee: [§4] §4 (encoding procedure): the claim of “fast encoding” requires explicit quantification of the extra cost incurred by evaluating the rate-distortion cost for every competing codec on every region. If all codecs must be run to compute the selection, the encoding complexity may exceed that of a single codec by a large factor; an approximation or early-termination strategy should be described and timed.

Authors: We acknowledge that the encoding overhead must be quantified to support the fast-encoding claim. The revised §4 will report measured encoding complexity (including the cost of RD evaluation over all codecs and regions) together with the early-termination heuristic used to limit the overhead. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with external validation

full rationale

The paper describes a spatial competition method that selects among multiple learned codecs per region using rate-distortion optimization and transmits a mode map for decoder guidance. Claims rest on direct empirical comparison to independent baselines (single codec, HEVC) on the public CLIC 2020 dataset. No equations reduce a prediction to a fitted input by construction, no self-citation chain supports a uniqueness theorem, and no ansatz is smuggled via prior work. The derivation is self-contained engineering design plus measurement; the reported rate and complexity numbers are not forced by redefinition of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard rate-distortion optimization and the assumption that multiple specialized autoencoders can be trained independently without prohibitive total training cost.

axioms (1)

domain assumption: Rate-distortion cost is an appropriate metric for selecting the best codec per region.
Standard practice in compression literature; invoked implicitly when describing encoder selection.

pith-pipeline@v0.9.0 · 5449 in / 1096 out tokens · 36095 ms · 2026-05-14T18:46:14.988566+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

$m_k = \arg\min_m L(x_k; \theta_m, \phi_m, \psi_m)$ … $R(m) = \frac{K \lceil \log_2 M \rceil}{HW}$ bpp
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

decoding complexity remains … $\kappa_{dec} = \kappa_{gs} + \kappa_p = 1433$ MAC/pixel

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Spatial Competition for Low-Complexity Learned Image Compression

INTRODUCTION AND RELATED WORKS Learned image codecs [1, 2] have recently surpassed conventional codecs (HEVC [3], VVC [4]) in rate-distortion (RD) performance. These methods typically train autoencoders end-to-end to minimize the average RD cost over a large dataset. Once trained, their parameters are fixed, and compression of unseen images relies so...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[2]

METHOD 2.1. Autoencoder-based learned image compression Learned image compression [16] is commonly formulated within the transform coding paradigm using autoencoder architectures optimized under a rate-distortion objective. It comprises an analysis transform $g_a(\cdot;\theta)$, a synthesis transform $g_s(\cdot;\phi)$ and an entropy model $p(\cdot;\psi)$, parameterized by $\theta$, $\phi$, and $\psi$, r...
[3]

Experimental setup Model architecture. All experiments are conducted using the same autoencoder architecture

EXPERIMENTS 3.1. Experimental setup Model architecture. All experiments are conducted using the same autoencoder architecture. To enable low-complexity decoding, the synthesis transform and entropy model are taken from Cool-chic 4.0 [14], with respective complexities of $\kappa_{gs} = 708$ and $\kappa_p = 725$ MAC/pixel. To form a complete autoencoder, a compatible analysis...

2020
[4]

By selecting, for each image region, the codec that minimizes the local rate-distortion cost, the proposed approach enables efficient per-image adaptation

CONCLUSION This work introduces a low-complexity learned image compression framework based on spatial competition between multiple specialized neural codecs. By selecting, for each image region, the codec that minimizes the local rate-distortion cost, the proposed approach enables efficient per-image adaptation. On the CLIC 2020 dataset, our method achi...

2020
[5]

EVC: Towards real-time neural image compression with mask decay,

Guo-Hua Wang, Jiahao Li, Bin Li, and Yan Lu, “EVC: Towards real-time neural image compression with mask decay,” in International Conference on Learning Representations, 2023

2023
[6]

Towards practical real-time neural video compression,

Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu, “Towards practical real-time neural video compression,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12543–12552

2025
[7]

Overview of the high efficiency video coding (HEVC) standard,

Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012

2012
[8]

Overview of the Versatile Video Coding (VVC) standard and its applications,

Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Jianle Chen, Gary J. Sullivan, and Jens-Rainer Ohm, “Overview of the Versatile Video Coding (VVC) standard and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021

2021
[9]

On efficient neural network architectures for image compression,

Yichi Zhang, Zhihao Duan, and Fengqing Zhu, “On efficient neural network architectures for image compression,” in 2024 IEEE International Conference on Image Processing (ICIP). IEEE, 2024, pp. 3674–3680

2024
[10]

Architecture optimizations for improving neural image compression compute complexity,

Matthew Muckley, Marton Havasi, Jakob Verbeek, and Karen Ullrich, “Architecture optimizations for improving neural image compression compute complexity,” in 2025 Data Compression Conference (DCC). IEEE, 2025, pp. 3–12

2025
[11]

Grouped transform for ultra-low-complexity learned image compression,

Wen Tan, Youneng Bao, Fanyang Meng, and Yongsheng Liang, “Grouped transform for ultra-low-complexity learned image compression,” in 2025 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2025, pp. 1–5

2025
[12]

Computationally-efficient neural image compression with shallow decoders,

Yibo Yang and Stephan Mandt, “Computationally-efficient neural image compression with shallow decoders,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 530–540

2023
[13]

Asymllic: Asymmetric lightweight learned image compression,

Shen Wang, Zhengxue Cheng, Donghui Feng, Guo Lu, Li Song, and Wenjun Zhang, “Asymllic: Asymmetric lightweight learned image compression,” in 2024 IEEE International Conference on Visual Communications and Image Processing (VCIP), 2024, pp. 1–5

2024
[14]

Computationally efficient neural image compression,

Nick Johnston, Elad Eban, Ariel Gordon, and Johannes Ballé, “Computationally efficient neural image compression,” arXiv preprint arXiv:1912.08771, 2019

work page arXiv 1912
[15]

Structured pruning and quantization for learned image compression,

Md Adnan Faisal Hossain and Fengqing Zhu, “Structured pruning and quantization for learned image compression,” in 2024 IEEE International Conference on Image Processing (ICIP). IEEE, 2024, pp. 3730–3736

2024
[16]

Knowledge distillation for learned image compression,

Yunuo Chen, Zezheng Lyu, Bing He, Ning Cao, Gang Chen, Guo Lu, and Wenjun Zhang, “Knowledge distillation for learned image compression,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4996–5006

2025
[17]

Cool-chic: Coordinate-based low complexity hierarchical image codec,

Théo Ladune, Pierrick Philippe, Félix Henry, Gordon Clare, and Thomas Leguay, “Cool-chic: Coordinate-based low complexity hierarchical image codec,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 13515–13522

2023
[18]

The Cool-chic image and video codec,

Orange Research, “The Cool-chic image and video codec,” https://github.com/Orange-OpenSource/Cool-Chic/releases/tag/v4.0.0, 2025, Open-source software, version 4.0

2025
[19]

Overfitted image coding at reduced complexity,

Théophile Blard, Théo Ladune, Pierrick Philippe, Gordon Clare, Xiaoran Jiang, and Olivier Déforges, “Overfitted image coding at reduced complexity,” in 2024 32nd European Signal Processing Conference (EUSIPCO). IEEE, 2024, pp. 927–931

2024
[20]

Variational image compression with a scale hyperprior,

Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston, “Variational image compression with a scale hyperprior,” in International Conference on Learning Representations, 2018

2018
[21]

Workshop and challenge on learned image compression (CLIC2020),

George Toderici, Wenzhe Shi, Radu Timofte, Lucas Theis, Johannes Ballé, Eirikur Agustsson, Nick Johnston, and Fabian Mentzer, “Workshop and challenge on learned image compression (CLIC2020),” 2020

2020
[22]

Vector quantization,

Robert Gray, “Vector quantization,” IEEE ASSP Magazine, vol. 1, no. 2, pp. 4–29, 1984

1984
[23]

Multiple Transforms for Video Coding,

Adrià Arrufat Batalla, Multiple Transforms for Video Coding, Ph.D. thesis, INSA de Rennes, 2015

2015
[24]

Unsplash dataset,

Luke Chesser, Timothy Carbone, and Ali Zahid, “Unsplash dataset,” https://unsplash.com/data, 2020

2020
[25]

Adam: A Method for Stochastic Optimization

Diederik P Kingma, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[26]

C3: High-performance and low-complexity neural compression from a single image or video,

Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, and Emilien Dupont, “C3: High-performance and low-complexity neural compression from a single image or video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9347–9358

2024
[27]

Kodak lossless true color image suite,

Eastman Kodak, “Kodak lossless true color image suite,” https://r0k.us/graphics/kodak/, 1993

1993
[28]

JPEG-AI test images,

JPEG-AI, “JPEG-AI test images,” https://jpegai.github.io/test_images, 2020

2020
[29]

Calculation of average PSNR differences between RD-curves,

Gisle Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-Telecommunications Standardization Document, 2001

2001