arxiv: 2604.05347 · v1 · submitted 2026-04-07 · 📡 eess.IV · cs.CV· cs.MM

Recognition: 2 theorem links

· Lean Theorem

CI-ICM: Channel Importance-driven Learned Image Coding for Machines

Yun Zhang , Junle Liu , Huan Zhang , Zhaoqing Pan , Gangyi Jiang , Weisi Lin

Authors on Pith no claims yet

Pith reviewed 2026-05-10 19:35 UTC · model grok-4.3

classification 📡 eess.IV cs.CVcs.MM

keywords learned image compressionmachine visionchannel importanceobject detectioninstance segmentationfeature channel groupingbitrate allocationtask adaptation

0 comments

The pith

A learned image codec for machines scores feature channel importance to allocate bits preferentially and raise task accuracy at fixed bitrates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional image codecs optimized for human eyes waste bits on details irrelevant to machines and discard features machines require. The paper introduces CI-ICM to generate importance scores for every feature channel, group and scale channels accordingly, and apply context modeling that protects high-value channels while adapting the output to multiple downstream tasks. Experiments on COCO2017 demonstrate clear gains in object detection and instance segmentation over a baseline learned codec. Readers should care because machine perception pipelines now process the majority of images; shifting compression toward task-critical features can cut transmission costs while preserving or improving AI accuracy.

Core claim

The authors propose Channel Importance-driven learned Image Coding for Machines (CI-ICM). A Channel Importance Generation module produces and ranks channel importance scores via a channel order loss. These scores feed a Feature Channel Grouping and Scaling module that non-uniformly groups channels and adjusts their dynamic ranges, plus a Channel Importance-based Context module that allocates bits to preserve fidelity in critical channels. A Task-Specific Channel Adaptation module further enhances features for multiple machine tasks. On COCO2017 the method delivers BD-mAP@50:95 gains of 16.25% in object detection and 13.72% in instance segmentation over the baseline codec.

What carries the argument

The Channel Importance Generation (CIG) module that quantifies and ranks feature-channel importance for machine tasks, enabling the Feature Channel Grouping and Scaling (FCGS) and Channel Importance-based Context (CI-CTX) modules to perform non-uniform bitrate allocation.

If this is right

Machine vision tasks obtain higher mean average precision at the same bitrate constraint.
Bitrate is allocated non-uniformly to preserve higher fidelity in channels ranked as task-critical.
A single codec supports multiple downstream tasks through the task-specific adaptation module.
Ablation studies confirm that each of the four proposed modules contributes to the measured gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same importance-driven grouping could be applied to compress video streams for surveillance or autonomous driving pipelines.
If the importance scores generalize beyond the tested models, pre-computed channel rankings might enable faster real-time encoding.
The work suggests compression loops that incorporate feedback from the downstream machine task could outperform purely reconstruction-focused codecs.

Load-bearing premise

The channel importance scores produced by the CIG module accurately reflect task-critical information across varied machine vision models and datasets.

What would settle it

Apply CI-ICM-compressed images to an object-detection or segmentation model whose architecture was not used when training the channel importance scores and check whether the BD-mAP gains disappear or reverse.

Figures

Figures reproduced from arXiv: 2604.05347 by Gangyi Jiang, Huan Zhang, Junle Liu, Weisi Lin, Yun Zhang, Zhaoqing Pan.

**Figure 2.** Figure 2: Framework of the proposed CI-ICM. (a) Network architecture of the proposed CI-ICM, (b) The reordering, grouping, and scaling feature flow of [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Relationship between the number of removed channels and the [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Decoding process of the CI-CTX module. the weights Wc, more features of higher importance can be reserved in the feature removal experiment, which validates the effectiveness of CIG and its generated importance weights. Based on these weights Wc, we propose a channel order loss LCO to guide the extraction process of the feature representation module ga. In this case, the output feature y from retrained ga … view at source ↗

**Figure 5.** Figure 5: MSE(Φi ch-org, Φi ch) with different si. Note that si, i ∈ [1, 2, 3] are plotted in 1/si, and s4 are plotted in log10(s4) for better observation. (a) s1, (b) s2, (c) s3, (d) s4. We established baseline priors Φi ch-org generated by the prior extraction network g i ch without any scaling. The scale table {si} is initialized as [1, 1, 2, 10, 1 × 104 ], complying with the principles si−1 ≤ si . Then, the scal… view at source ↗

**Figure 7.** Figure 7: Structure of the TSCA module. to 0.099, which are small. The Correlation Coefficients (CC) of the fitting curves are 0.955 to 0.995, which is high. The high CC and small RMSE show that the quadratic function is accurate. Therefore, based on these fitted curves, the optimal si is obtained to achieve the highest mAP@50:95, which is [1, 1.85, 2.27, 3.71, 1 × 104.38]. Note that these optimal values are obtaine… view at source ↗

**Figure 8.** Figure 8: Three stages of training for the proposed CI-ICM, where dark green [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: Trained Wc of an image in COCO2017 before and after training using channel order loss. (a) Before training, (b) After training. where Ryˆi denote the bitrate of the i-th feature subset, n the the total number of subsets, Φ0 z and Φi ch denotes the hyperprior parameters of i-th feature channels. To enforce ordered feature extraction and proper feature separation, channel order loss is added to formulate th… view at source ↗

**Figure 10.** Figure 10: Rate-accuracy curves of the proposed CI-ICM and benchmark schemes on object detection task, (a) mAP@50:95, (b) mAP@50, (c) mAP@75. [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: Visualization of object detection results using different codecs, where bpp values are also presented. (a) and (f): Ground truth, (b) and (g): ELIC, (c) [PITH_FULL_IMAGE:figures/full_fig_p010_11.png] view at source ↗

**Figure 12.** Figure 12: Rate-accuracy curves of the proposed CI-ICM and baseline schemes on instance segmentation task, (a) mAP@50:95, (b) mAP@50, (c) mAP@75. [PITH_FULL_IMAGE:figures/full_fig_p011_12.png] view at source ↗

**Figure 13.** Figure 13: Visualization of instance segmentation results from coded images, where bpp values also presented. (a) and (f): Ground truth, (b) and (g): ELIC, (c) [PITH_FULL_IMAGE:figures/full_fig_p011_13.png] view at source ↗

**Figure 14.** Figure 14: Rate-accuracy curves of ablation studies on object detection task and instance segmentation task, where the task accuracy is measured with [PITH_FULL_IMAGE:figures/full_fig_p011_14.png] view at source ↗

**Figure 15.** Figure 15: Rate-accuracy curves of generalization studies, where the task accuracy is measured with mAP@50:95. (a) Analysis on COCO 2017 dataset, Faster [PITH_FULL_IMAGE:figures/full_fig_p013_15.png] view at source ↗

read the original abstract

Traditional human vision-centric image compression methods are suboptimal for machine vision centric compression due to different visual properties and feature characteristics. To address this problem, we propose a Channel Importance-driven learned Image Coding for Machines (CI-ICM), aiming to maximize the performance of machine vision tasks at a given bitrate constraint. First, we propose a Channel Importance Generation (CIG) module to quantify channel importance in machine vision and develop a channel order loss to rank channels in descending order. Second, to properly allocate bitrate among feature channels, we propose a Feature Channel Grouping and Scaling (FCGS) module that non-uniformly groups the feature channels based on their importance and adjusts the dynamic range of each group. Based on FCGS, we further propose a Channel Importance-based Context (CI-CTX) module to allocate bits among feature groups and to preserve higher fidelity in critical channels. Third, to adapt to multiple machine tasks, we propose a Task-Specific Channel Adaptation (TSCA) module to adaptively enhance features for multiple downstream machine tasks. Experimental results on the COCO2017 dataset show that the proposed CI-ICM achieves BD-mAP@50:95 gains of 16.25$\%$ in object detection and 13.72$\%$ in instance segmentation over the established baseline codec. Ablation studies validate the effectiveness of each contribution, and computation complexity analysis reveals the practicability of the CI-ICM. This work establishes feature channel optimization for machine vision-centric compression, bridging the gap between image coding and machine perception.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CI-ICM adds four modules to rank and allocate bits by channel importance for machine tasks, with reported BD-mAP gains on COCO that look plausible but may be tied to the specific training setup.

read the letter

The paper's core move is to treat feature channels differently in a learned codec depending on how much they matter for downstream machine vision. It introduces CIG to score and order channels via a new loss, FCGS to group and scale them non-uniformly, CI-CTX to shape the context model around those groups, and TSCA to adapt the features for multiple tasks. That combination is new relative to standard learned compression pipelines that treat channels more uniformly or focus on human perception metrics.

Referee Report

3 major / 2 minor

Summary. The paper proposes CI-ICM, a learned image codec for machine vision tasks that introduces a Channel Importance Generation (CIG) module with channel order loss, a Feature Channel Grouping and Scaling (FCGS) module, a Channel Importance-based Context (CI-CTX) module, and a Task-Specific Channel Adaptation (TSCA) module. On COCO2017, it reports BD-mAP@50:95 gains of 16.25% for object detection and 13.72% for instance segmentation over a baseline codec, supported by ablations and complexity analysis.

Significance. If reproducible and generalizable, the work could advance machine-centric compression by demonstrating that non-uniform bit allocation based on learned channel importance improves downstream task performance at fixed rates. The explicit ablation studies and complexity analysis are strengths that support practical claims; however, the absence of baseline specifications and cross-task validation limits the assessed impact.

major comments (3)

[Abstract] Abstract: The central performance claim (BD-mAP gains of 16.25% detection / 13.72% segmentation) is presented without naming the baseline codec, its rate points, or any statistical significance tests, preventing verification of the reported improvements.
[Experimental Results] Experimental section (implied by results on COCO2017): The TSCA module is described as enabling adaptation to multiple tasks, yet only results for the two COCO tasks are shown; without cross-model or cross-task transfer experiments, it remains unclear whether the CIG-derived importance scores capture general machine-critical features or merely overfit to the specific detection/segmentation heads used in training.
[Method] Method description (CIG and FCGS modules): The channel order loss and subsequent non-uniform grouping/scaling assume that importance scores derived from gradients or activations generalize across varied machine vision models, but no evidence is provided that the added modules avoid introducing distribution shifts harmful to unseen downstream models.

minor comments (2)

[Abstract] The abstract states that 'computation complexity analysis reveals the practicability' but does not quantify the overhead of the CIG/FCGS/CI-CTX/TSCA modules relative to the baseline.
[Method] Notation for channel importance scores and grouping is introduced without an explicit equation or diagram reference in the provided summary, which could be clarified for reproducibility.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback and positive recognition of the ablation studies and complexity analysis. We address each major comment below with clarifications and proposed revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The central performance claim (BD-mAP gains of 16.25% detection / 13.72% segmentation) is presented without naming the baseline codec, its rate points, or any statistical significance tests, preventing verification of the reported improvements.

Authors: We agree that the abstract should enable immediate verification. The baseline is the standard learned image codec (without CIG, FCGS, CI-CTX, or TSCA modules) as defined in Section III and used for all rate-distortion curves in Section IV. The BD-mAP@50:95 values are computed over the same set of rate points shown in Figures 3 and 4 (approximately 0.1–0.8 bpp). While statistical significance tests are not standard in learned compression literature, we will add a sentence to the abstract naming the baseline explicitly and referencing the rate points and evaluation protocol used in the experimental section. revision: yes
Referee: [Experimental Results] Experimental section (implied by results on COCO2017): The TSCA module is described as enabling adaptation to multiple tasks, yet only results for the two COCO tasks are shown; without cross-model or cross-task transfer experiments, it remains unclear whether the CIG-derived importance scores capture general machine-critical features or merely overfit to the specific detection/segmentation heads used in training.

Authors: The TSCA module is trained jointly with the two COCO tasks (detection and instance segmentation) that employ distinct heads, and the reported gains demonstrate that the same channel importance scores can be adapted to both. We acknowledge that this does not constitute full cross-model transfer (e.g., to classification or different backbones). We will revise the experimental section to explicitly state the scope of the current validation, add a limitations paragraph discussing potential task-specific overfitting, and note that TSCA fine-tuning would be required for new heads. revision: partial
Referee: [Method] Method description (CIG and FCGS modules): The channel order loss and subsequent non-uniform grouping/scaling assume that importance scores derived from gradients or activations generalize across varied machine vision models, but no evidence is provided that the added modules avoid introducing distribution shifts harmful to unseen downstream models.

Authors: The channel importance is computed from task-specific gradients and activations, and the channel order loss enforces a stable ranking that prioritizes task-critical channels. Ablation results (Table II) show consistent gains when CIG/FCGS are included, indicating that the non-uniform allocation improves rather than harms the tested tasks. We do not claim zero distribution shift for arbitrary unseen models; TSCA is designed precisely to mitigate task-specific shifts via adaptation. We will add a short discussion in Section III clarifying this scope and the role of TSCA for new models. revision: partial

standing simulated objections not resolved

Comprehensive experiments on completely unseen downstream models (different architectures or tasks without any fine-tuning) to quantify potential distribution shifts introduced by CIG/FCGS.

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

full rationale

The paper proposes a set of architectural modules (CIG for channel importance, FCGS for grouping/scaling, CI-CTX for context allocation, and TSCA for task adaptation) within a learned image codec and reports empirical BD-mAP gains on COCO2017 for detection and segmentation. No mathematical derivation, first-principles prediction, or fitted parameter is presented as a 'result' that reduces to its own inputs by construction. The central claims are performance measurements from training and evaluation, not self-referential definitions or renamed known patterns. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz that would force the outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 4 invented entities

The central claim rests on the domain assumption that feature channels can be meaningfully ranked for machine tasks and that the proposed modules can be trained end-to-end without harming rate-distortion behavior. No numerical free parameters are stated. The four modules are new invented components whose independent evidence is limited to the reported experiments.

axioms (1)

domain assumption Feature channels in learned codecs carry unequal importance for downstream machine vision tasks.
Invoked to justify the CIG module and subsequent grouping.

invented entities (4)

Channel Importance Generation (CIG) module no independent evidence
purpose: Quantify and rank channel importance for machine vision
New component introduced to generate importance scores.
Feature Channel Grouping and Scaling (FCGS) module no independent evidence
purpose: Non-uniform grouping and dynamic-range adjustment of channels
New component for bitrate allocation.
Channel Importance-based Context (CI-CTX) module no independent evidence
purpose: Context modeling that preserves fidelity in critical channels
New component for entropy coding.
Task-Specific Channel Adaptation (TSCA) module no independent evidence
purpose: Adapt features for multiple downstream machine tasks
New component for multi-task support.

pith-pipeline@v0.9.0 · 5590 in / 1388 out tokens · 56155 ms · 2026-05-10T19:35:30.982826+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a CIG module to explicitly analyze the importance of feature channels. Complemented by a novel channel order loss, CI-ICM extracts the ordered feature representation...
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the features are divided into n uneven groups... high importance features are set to a small group

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

Towards efficient front-end visual sensing for digital retina: A model- centric paradigm,

Y . Lou, L.-Y . Duan, Y . Luo, Z. Chen, T. Liu, S. Wang, and W. Gao, “Towards efficient front-end visual sensing for digital retina: A model- centric paradigm,”IEEE Trans. Multimedia, vol. 22, no. 11, pp. 3002– 3013, Nov. 2020

work page 2020
[2]

Empowering things with intelligence: A survey of the progress, challenges, and opportunities in artificial intelligence of things,

J. Zhang and D. Tao, “Empowering things with intelligence: A survey of the progress, challenges, and opportunities in artificial intelligence of things,”IEEE Internet Things J., vol. 8, no. 10, pp. 7789–7817, May 2021

work page 2021
[3]

Device- edge-cloud collaborative acceleration method towards occluded face recognition in high-traffic areas,

P. Zhang, F. Huang, D. Wu, B. Yang, Z. Yang, and L. Tan, “Device- edge-cloud collaborative acceleration method towards occluded face recognition in high-traffic areas,”IEEE Trans. Multimedia, vol. 25, pp. 1513–1520, Mar. 2023

work page 2023
[4]

Overview of the versatile video coding (vvc) standard and its applications,

B. Bross, Y .-K. Wang, Y . Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,”IEEE Trans. Circuit Syst. Video Technol., vol. 31, no. 10, pp. 3736–3764, Oct. 2021

work page 2021
[5]

Hnr-isc: Hybrid neural representation for image set compression,

P. Zhang, S. Wang, M. Wang, P. Chen, W. Wu, X. Wang, and S. Kwong, “Hnr-isc: Hybrid neural representation for image set compression,”IEEE Trans. Multimedia, vol. 27, pp. 28–40, Dec. 2025

work page 2025
[6]

Recent advances in end-to-end learned image and video compression,

W.-H. Peng and H.-M. Hang, “Recent advances in end-to-end learned image and video compression,” inIEEE Int. Conf. Vis. Commun. Image Process., Macau, China, Dec. 2020, pp. 1–2

work page 2020
[7]

Video coding for machines: A paradigm of collaborative compression and intelligent analytics,

L. Duan, J. Liu, W. Yang, T. Huang, and W. Gao, “Video coding for machines: A paradigm of collaborative compression and intelligent analytics,”IEEE Trans. Image Process., vol. 29, pp. 8680–8695, Aug. 2020

work page 2020
[8]

Task-driven video compression for humans and machines: Framework design and opti- mization,

X. Yi, H. Wang, S. Kwong, and C.-C. Jay Kuo, “Task-driven video compression for humans and machines: Framework design and opti- mization,”IEEE Trans. Multimedia, vol. 25, pp. 8091–8102, Dec. 2023

work page 2023
[9]

Just noticeable difference for deep machine vision,

J. Jin, X. Zhang, X. Fu, H. Zhang, W. Lin, J. Lou, and Y . Zhao, “Just noticeable difference for deep machine vision,”IEEE Trans. Circuit Syst. Video Technol., vol. 32, no. 6, pp. 3452–3461, Jun. 2022

work page 2022
[10]

Task-aware quantization network for jpeg image compression,

J. Choi and B. Han, “Task-aware quantization network for jpeg image compression,” inEur. Conf. Comput. Vis., Nov. 2020, pp. 309–324

work page 2020
[11]

Visual analysis motivated rate- distortion model for image coding,

Z. Huang, C. Jia, S. Wang, and S. Ma, “Visual analysis motivated rate- distortion model for image coding,” inIEEE Int. Conf. Multimedia Expo, Shenzhen, China, Jun. 2021, pp. 1–6

work page 2021
[12]

Saliency segmentation oriented deep image compression with novel bit allocation,

Y . Li, W. Gao, G. Li, and S. Ma, “Saliency segmentation oriented deep image compression with novel bit allocation,”IEEE Trans. Image Process., vol. 34, pp. 16–29, Nov. 2025

work page 2025
[13]

Learning to predict object-wise just recognizable distortion for image and video compression,

Y . Zhang, H. Lin, J. Sun, L. Zhu, and S. Kwong, “Learning to predict object-wise just recognizable distortion for image and video compression,”IEEE Trans. Multimedia, vol. 26, pp. 5925–5938, Dec. 2024

work page 2024
[14]

Towards coding for human and machine vision: Scalable face image coding,

S. Yang, Y . Hu, W. Yang, L.-Y . Duan, and J. Liu, “Towards coding for human and machine vision: Scalable face image coding,”IEEE Trans. Multimedia, vol. 23, pp. 2957–2971, Mar. 2021

work page 2021
[15]

Image coding for machines with edge information learning using segment anything,

T. Shindo, K. Yamada, T. Watanabe, and H. Watanabe, “Image coding for machines with edge information learning using segment anything,” inIEEE Int. Conf. Image Process., Abu Dhabi, UAE, Oct. 2024, pp. 3702–3708

work page 2024
[16]

Rethink- ing semantic image compression: Scalable representation with cross- modality transfer,

P. Zhang, S. Wang, M. Wang, J. Li, X. Wang, and S. Kwong, “Rethink- ing semantic image compression: Scalable representation with cross- modality transfer,”IEEE Trans. Circuit Syst. Video Technol., vol. 33, no. 8, pp. 4441–4445, Aug. 2023

work page 2023
[17]

End-to-end compression towards machine vision: Network architecture design and optimization,

S. Wang, Z. Wang, S. Wang, and Y . Ye, “End-to-end compression towards machine vision: Network architecture design and optimization,” IEEE Open J. Circuits Syst., vol. 2, pp. 675–685, Nov. 2021

work page 2021
[18]

Learned image coding for machines: A content-adaptive approach,

N. Le, H. Zhang, F. Cricri, R. Ghaznavi-Youvalari, H. R. Tavakoli, and E. Rahtu, “Learned image coding for machines: A content-adaptive approach,” inIEEE Int. Conf. Multimedia Expo, Shenzhen, China, Nov. 2021, pp. 1–6

work page 2021
[19]

Image coding for machines: an end-to-end learned approach,

N. Le, H. Zhang, F. Cricri, R. Ghaznavi-Youvalari, and E. Rahtu, “Image coding for machines: an end-to-end learned approach,” inICASSP IEEE Int. Conf. Acoust. Speech Signal Process, Jun. 2021, pp. 1590–1594

work page 2021
[20]

End-to-end learning of compressible features,

S. Singh, S. Abu-El-Haija, N. Johnston, J. Ball ´e, A. Shrivastava, and G. Toderici, “End-to-end learning of compressible features,” inIEEE Int. Conf. Image Process., Abu Dhabi, UAE, Oct. 2020, pp. 3349–3353

work page 2020
[21]

SC2 benchmark: Supervised compression for split computing,

Y . Matsubara, R. Yang, M. Levorato, and S. Mandt, “SC2 benchmark: Supervised compression for split computing,”Trans. Mach. Learn. Res., pp. 1–20, Jun. 2023

work page 2023
[22]

Multiscale feature importance-based bit allocation for end-to-end feature coding for ma- chines,

J. Liu, Y . Zhang, Z. Guo, X. Huang, and G. Jiang, “Multiscale feature importance-based bit allocation for end-to-end feature coding for ma- chines,”ACM Trans. Multimed. Comput. Commun. Appl., vol. 21, no. 9, pp. 1–19, Sep. 2025

work page 2025
[23]

End-to-end optimized image compression for machines, a study,

L. D. Chamain, F. Racap ´e, J. B ´egaint, A. Pushparaja, and S. Feltman, “End-to-end optimized image compression for machines, a study,” in Data Compression Conf., Snowbird, UT, USA, Mar. 2021, pp. 163–172

work page 2021
[24]

Rate-distortion theory in coding for machines and its applications,

A. Harell, Y . Foroutan, N. Ahuja, P. Datta, B. Kanzariya, V . S. So- mayazulu, O. Tickoo, A. de Andrade, and I. V . Baji ´c, “Rate-distortion theory in coding for machines and its applications,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 47, no. 7, pp. 5501–5519, Jul. 2025

work page 2025
[25]

Improving multiple machine vision tasks in the compressed domain,

J. Liu, H. Sun, and J. Katto, “Improving multiple machine vision tasks in the compressed domain,” inInt. Conf. Pattern Recog., Montreal, QC, Canada, Aug. 2022, pp. 331–337

work page 2022
[26]

Scalable image coding for humans and machines,

H. Choi and I. V . Baji ´c, “Scalable image coding for humans and machines,”IEEE Trans. Image Process., vol. 31, pp. 2739–2754, Mar. 2022

work page 2022
[27]

Latent-space scalability for multi-task collabo- rative intelligence,

H. Choi and I. V . Baji ´c, “Latent-space scalability for multi-task collabo- rative intelligence,” inIEEE Int. Conf. Image Process., Anchorage, AK, USA, Sep. 2021, pp. 3562–3566

work page 2021
[28]

Unified and scalable deep image compression framework for human and machine,

G. Zhang, X. Zhang, and L. Tang, “Unified and scalable deep image compression framework for human and machine,”ACM Trans. Multi- media Comput. Commun. Appl., vol. 20, no. 10, pp. 1–22, Oct. 2024

work page 2024
[29]

Learned disentangled latent representations for scalable image coding for humans and ma- chines,

E. ¨Ozyılkan, M. Ulhaq, H. Choi, and F. Racap ´e, “Learned disentangled latent representations for scalable image coding for humans and ma- chines,” inData Compression Conf., Snowbird, UT, USA, Mar. 2023, pp. 42–51

work page 2023
[30]

Learnt mutual feature compression for machine vision,

T. Liu, M. Xu, S. Li, C. Chen, L. Yang, and Z. Lv, “Learnt mutual feature compression for machine vision,” inIEEE Int. Conf. Acoust. Speech Signal Process., Rhodes Island, Greece, Jun. 2023, pp. 1–5

work page 2023
[31]

Semantically scalable image coding with compression of feature maps,

N. Yan, D. Liu, H. Li, and F. Wu, “Semantically scalable image coding with compression of feature maps,” inIEEE Int. Conf. Image Process., Abu Dhabi, UAE, Oct. 2020, pp. 3114–3118

work page 2020
[32]

Semantic and saliency-aware scalable image coding towards human-machine collaboration,

T. Cui, Y . Wang, Y . Wang, and Z. Fang, “Semantic and saliency-aware scalable image coding towards human-machine collaboration,”IEEE Trans. Circuit Syst. Video Technol., pp. 1–1, May 2025

work page 2025
[33]

Transtic: Transferring transformer-based image compression from hu- man perception to machine perception,

Y .-H. Chen, Y . Weng, C.-H. Kao, C. Chien, W.-C. Chiu, and W. Peng, “Transtic: Transferring transformer-based image compression from hu- man perception to machine perception,” inIEEE/CVF Int. Conf. Comput. Vis., Paris, France, Oct. 2023, pp. 23 240–23 250

work page 2023
[34]

Im- age compression for machine and human vision with spatial-frequency adaptation,

H. Li, S. Li, S. Ding, W. Dai, M. Cao, C. Li, J. Zou, and H. Xiong, “Im- age compression for machine and human vision with spatial-frequency adaptation,” inEur. Conf. Comput. Vis., Milan, Italy, Oct. 2024, pp. 382–399

work page 2024
[35]

Faster r-cnn: Towards real-time object detection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017

work page 2017
[36]

Elic: Efficient learned image compression with unevenly grouped space- channel contextual adaptive coding,

D. He, Z. Yang, W. Peng, R. Ma, H. Qin, and Y . Wang, “Elic: Efficient learned image compression with unevenly grouped space- channel contextual adaptive coding,” inIEEE/CVF Conf. Comput. Vis. Pattern Recog., New Orleans, LA, USA, Jun. 2022, pp. 5708–5717

work page 2022
[37]

Squeeze-and-excitation networks,

J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in IEEE/CVF Conf. Comput. Vis. Pattern Recog., Salt Lake City, UT, USA, Jun. 2018, pp. 7132–7141

work page 2018
[38]

Mlic: Multi- reference entropy model for learned image compression,

W. Jiang, J. Yang, Y . Zhai, P. Ning, F. Gao, and R. Wang, “Mlic: Multi- reference entropy model for learned image compression,” inACM Int. Conf. Multimedia, New York, NY , USA, Oct. 2023, pp. 7618 – 7627

work page 2023
[39]

Real- time evaluation of object detection models across open world scenarios,

P. Goswami, L. Aggarwal, A. Kumar, R. Kanwar, and U. Vasisht, “Real- time evaluation of object detection models across open world scenarios,” Appl. Soft Comput., vol. 163, p. 111921, Sep. 2024

work page 2024
[40]

On biasing transformer attention towards monotonicity,

A. R. Gonzales, C. Amrhein, N. Aepli, and R. Sennrich, “On biasing transformer attention towards monotonicity,” inN. Am. Chapter Assoc. Comput. Linguist.Online: Association for Computational Linguistics, Jun. 2021, pp. 4474–4488

work page 2021
[41]

Mask r-cnn,

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick, “Mask r-cnn,” inIEEE Int. Conf. Comput. Vis., Venice, Italy, Oct. 2017, pp. 2980–2988

work page 2017
[42]

The jpeg 2000 still image compression standard,

A. Skodras, C. Christopoulos, and T. Ebrahimi, “The jpeg 2000 still image compression standard,”IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001

work page 2000
[43]

The pascal visual object classes (voc) challenge,

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes (voc) challenge,”Int. J. Comput. Vision, vol. 88, no. 2, pp. 303–338, Sep. 2010. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY , VOL.XX, NO.XX, 2025 15 Yun Zhang(Senior Member, IEEE) received the B.S. and M.S. degrees in electrical e...

work page 2010