Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

Junqi Liu; Long Xu; Weisi Lin; Xiaoxia Huang; Yun Zhang

arxiv: 2604.09421 · v1 · submitted 2026-04-10 · eess.IV · cs.CV · cs.MM

Junqi Liu, Yun Zhang, Xiaoxia Huang, Long Xu, Weisi Lin

Pith reviewed 2026-05-10 16:39 UTC · model grok-4.3

classification eess.IV · cs.CV · cs.MM

keywords just recognizable difference · video coding for machines · multi-task learning · object detection · instance segmentation · keypoint detection · perceptual modeling · attribute fusion

The pith

An attribute-assisted multi-task model predicts just recognizable differences across three machine vision tasks to support efficient video coding for machines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a dataset of 27,264 machine-generated JRD annotations for object detection, instance segmentation, and keypoint detection. It introduces the AMT-JRD model that combines generalized feature extraction, specialized task features, and fusion of object attributes such as size and location to predict visibility thresholds jointly. This approach addresses the single-task limitation of prior JRD methods by enabling shared learning that compensates for image-feature shortcomings. A reader would care because accurate multi-task JRD prediction lets video codecs drop imperceptible details for machines, cutting data rates while preserving task performance in areas like surveillance or autonomous systems. Experiments confirm lower prediction errors than single-task baselines and concrete bit-rate savings when the predictions guide coding.

Core claim

The authors show that the AMT-JRD model, trained on the new MT-JRD dataset, achieves a mean absolute error of 3.781 and error variance of 5.332 across the three tasks by integrating GFEM, SFEM, and AFFM modules, outperforming state-of-the-art single-task prediction by 6.7% and 6.3% respectively, and delivering average BD-mAP improvements of 3.861% over VVC and 7.886% over JPEG when applied to VCM.
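
The two prediction metrics quoted above can be made concrete with a short sketch. The exact definition of "error variance" is not given on this page, so the code assumes it is the population variance of the per-object absolute errors. The function name and the sample values are illustrative and not taken from the paper.

```python
from statistics import fmean, pvariance


def jrd_error_stats(predicted, target):
    """Return (MAE, variance of absolute errors) for paired JRD values.

    Assumption: "error variance" means the population variance of the
    per-object absolute errors; the paper may define it differently.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(p - t) for p, t in zip(predicted, target)]
    return fmean(abs_errors), pvariance(abs_errors)


# Made-up values for illustration only.
mae, var = jrd_error_stats([30, 34, 41, 27], [32, 34, 38, 31])
print(f"MAE={mae:.3f}, variance={var:.3f}")
```

Reporting both numbers together is useful because a low MAE with a high variance would indicate that a few objects are predicted very badly, which matters when the prediction drives compression.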

What carries the argument

The Attribute-assisted Multi-Task JRD (AMT-JRD) prediction model, which uses Generalized Feature Extraction Module (GFEM), Specialized Feature Extraction Module (SFEM), and Attribute Feature Fusion Module (AFFM) to jointly estimate object-wise JRDs by incorporating prior object size and location knowledge.
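
To illustrate what "prior object size and location knowledge" could look like as model input, here is a minimal sketch. The three-element encoding (relative area, centre x, centre y) and the concatenation-based `fuse` helper are hypothetical choices made for this example; the page does not describe the actual AFFM input format or fusion operator.

```python
def object_attributes(box, image_width, image_height):
    """Encode an object's size and location as values in [0, 1].

    box is (x_min, y_min, x_max, y_max) in pixels. The encoding below is
    a hypothetical choice, not the paper's documented AFFM input.
    """
    x_min, y_min, x_max, y_max = box
    inside_x = 0 <= x_min < x_max <= image_width
    inside_y = 0 <= y_min < y_max <= image_height
    if not (inside_x and inside_y):
        raise ValueError("box must lie inside the image and have positive area")
    rel_area = (x_max - x_min) * (y_max - y_min) / (image_width * image_height)
    centre_x = (x_min + x_max) / 2 / image_width
    centre_y = (y_min + y_max) / 2 / image_height
    return [rel_area, centre_x, centre_y]


def fuse(image_features, attributes):
    """Simplest possible fusion: concatenate the two vectors (an assumption)."""
    return list(image_features) + list(attributes)


attrs = object_attributes((160, 120, 480, 360), 640, 480)
fused = fuse([0.1, 0.7], attrs)  # placeholder image features
print(attrs, fused)
```

Normalizing by the image dimensions keeps the attributes comparable across resolutions, which is one reason such priors can add information that a cropped object patch alone does not carry.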

If this is right

The predicted JRDs can be used to reduce coding bit rates in VCM pipelines while keeping accuracy high across multiple tasks simultaneously.
Object attribute fusion compensates for limitations of image features alone, leading to more robust threshold estimates.
The same model architecture supports joint optimization for object detection, instance segmentation, and keypoint detection without separate predictors.
Integration with existing codecs like VVC yields measurable BD-mAP gains over both VVC and JPEG baselines.
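
One way to picture how per-task predictions could steer a codec is sketched below. It assumes each JRD is expressed as a distortion level where a larger value means coarser coding, and it picks the strictest task's threshold minus a safety margin. That convention, the margin, and the min-over-tasks rule are assumptions for illustration and not the paper's documented procedure.

```python
def safe_coding_level(task_jrds, margin=1, lowest_level=0):
    """Pick one distortion level that stays below every task's JRD.

    task_jrds maps a task name to the predicted JRD for an object,
    expressed as a distortion level where larger means coarser coding.
    The convention and the selection rule are illustrative assumptions.
    """
    if not task_jrds:
        raise ValueError("need at least one task prediction")
    strictest = min(task_jrds.values())
    return max(lowest_level, strictest - margin)


# Hypothetical predictions for one object across the three tasks.
level = safe_coding_level({"OD": 37, "IS": 33, "KPD": 35})
print(level)  # 32
```

Taking the minimum across tasks is the conservative choice: the object is coded only as coarsely as the most sensitive task tolerates.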

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The attribute-fusion idea could be tested on video sequences with motion to see if temporal object attributes further improve JRD accuracy.
The dataset construction method might scale to collect labels for additional machine tasks such as action recognition or tracking.
If the multi-task predictions generalize, they could serve as a starting point for standardized perceptual models in future machine-oriented compression standards.

Load-bearing premise

The 27,264 machine-generated JRD annotations collected for the three tasks represent the perceptual behavior of real-world machine vision systems on unseen content and additional tasks.
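
The premise is easier to judge with a picture of how a machine-generated label might arise. The sketch below assumes a label is found by sweeping distortion levels from mild to strong and recording the first level at which a fixed vision model's output on the object stops agreeing with its output on the original. The annotation protocol is not described on this page, so this is one plausible reading and not the authors' method.

```python
def jrd_from_sweep(levels, task_ok):
    """Return the first distortion level at which the machine task fails.

    levels: distortion levels sorted from mildest to strongest.
    task_ok: matching booleans, True when the task output on the
    distorted object still agrees with the output on the original.
    Returns None if the task never fails within the sweep.
    """
    if len(levels) != len(task_ok):
        raise ValueError("levels and task_ok must have equal length")
    for level, ok in zip(levels, task_ok):
        if not ok:
            return level
    return None


# Hypothetical sweep: the model first fails at level 37.
label = jrd_from_sweep([22, 27, 32, 37, 42], [True, True, True, False, False])
print(label)  # 37
```

Under this reading, `task_ok` depends entirely on which vision model is queried, so a different detector could yield a different label for the same object. That dependence is what the premise above puts at stake.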

What would settle it

A direct test applying the AMT-JRD predictions to new video sequences or a fourth machine task and finding no BD-mAP gain or a drop in task accuracy would show the claim does not hold.
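
For readers who want to run such a test, the sketch below computes a simplified Bjontegaard-style delta between two rate-accuracy curves. The standard Bjontegaard calculation fits a polynomial to each curve in the log-rate domain; this version swaps in piecewise-linear interpolation to stay dependency-free, so its output will not match the paper's reported figures exactly. Function names and sample points are illustrative, and bitrates within a curve are assumed to be distinct.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; xs strictly ascending."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the curve's range")


def _area(xs, ys, lo, hi):
    """Trapezoidal area under the piecewise-linear curve on [lo, hi]."""
    knots = sorted({lo, hi, *[x for x in xs if lo < x < hi]})
    vals = [_interp(x, xs, ys) for x in knots]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))


def bd_map_linear(anchor, test):
    """Average mAP gain of `test` over `anchor` across the shared rate range.

    Each curve is a list of (bitrate, mAP) points with distinct bitrates.
    Simplification: linear interpolation replaces the usual polynomial fit.
    """
    def prep(curve):
        pts = sorted((math.log10(rate), m) for rate, m in curve)
        return [p[0] for p in pts], [p[1] for p in pts]

    ax, ay = prep(anchor)
    tx, ty = prep(test)
    lo, hi = max(ax[0], tx[0]), min(ax[-1], tx[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in bitrate")
    return (_area(tx, ty, lo, hi) - _area(ax, ay, lo, hi)) / (hi - lo)


# Made-up curves: the test codec is 2 mAP points better at every rate.
delta = bd_map_linear([(100, 30.0), (1000, 40.0)], [(100, 32.0), (1000, 42.0)])
print(round(delta, 6))
```

A delta at or below zero on new sequences or a fourth task would be the kind of outcome described above.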

Figures

Figures reproduced from arXiv: 2604.09421 by Junqi Liu, Long Xu, Weisi Lin, Xiaoxia Huang, Yun Zhang.

**Figure 1.** Visualized examples of MT-JRD. (a) Performance degradation in multi-task ...

**Figure 3.** The pipeline for constructing the MT-JRD dataset.

**Figure 4.** MT-JRD quantity distribution under different ...

**Figure 5.** Distortion distribution of the MT-JRD dataset.

**Figure 6.** Relationship between object size and JRD.

**Figure 8.** Architectures of MT-JRD prediction models. (a) Independent single-task archi...

**Figure 9.** The proposed AMT-JRD model, which consists of GFEM, SFEM, AFFM, and multi-task classification heads.

**Figure 10.** The processing workflow for JRD-based VCM optimization. (a) JRD-based ...

**Figure 11.** Coding gain comparison among JRD models in VVC. (a) OD, (b) IS, (c) KPD.

**Figure 12.** Coding gain comparison among JRD models in JPEG. (a) OD, (b) IS, (c) KPD.

**Figure 13.** Complexity and accuracy of JRD models on the MT-JRD dataset.

**Figure 14.** Visualized machine analytical results of compressed images from different JRD-based coding methods. The first, second, and third lines correspond to OD, IS, and KPD, ...

Original abstract

Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to a single-task scenario. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, supporting three representative tasks including object detection, instance segmentation, and keypoint detection. Secondly, we propose the AMT-JRD prediction model, which integrates Generalized Feature Extraction Module (GFEM) and Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Thirdly, we innovatively incorporate object attribute information into object-wise JRD prediction through the Attribute Feature Fusion Module (AFFM), which introduces prior knowledge about object size and location. This design effectively compensates for the limitations of relying solely on image features and enhances the model's capacity to represent the perceptual mechanisms of machine vision. Finally, we apply the AMT-JRD model to VCM, where the accurately predicted JRDs are applied to reduce the coding bit rate while preserving accuracy across multiple machine vision tasks. Extensive experimental results demonstrate that AMT-JRD achieves precise and robust multi-task prediction with a mean absolute error of 3.781 and error variance of 5.332 across three tasks, outperforming the state-of-the-art single-task prediction model by 6.7% and 6.3%, respectively. Coding experiments further reveal that compared to the baseline VVC and JPEG, the AMT-JRD-based VCM improves an average of 3.861% and 7.886% Bjontegaard Delta-mean Average Precision (BD-mAP), respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a multi-task JRD dataset and an attribute-fused predictor that reports better accuracy and modest BD-mAP gains in VCM, but the machine-generated labels limit how far the results can be trusted.

The paper collects 27,264 machine-generated JRD annotations across object detection, instance segmentation, and keypoint detection, then trains an AMT-JRD model that combines generalized and specialized feature modules with an attribute fusion step for object size and location. This is new relative to the single-task JRD papers it cites, and the joint prediction setup plus the attribute prior are sensible ways to handle the multi-task case without running three separate predictors. The reported MAE of 3.781 and the 3.86% average BD-mAP improvement over VVC in the coding tests are the concrete numbers that would matter to someone building a bandwidth-limited pipeline with several vision tasks running at once. The dataset size is large enough to support the experiments they describe, and folding in object attributes is a clear attempt to compensate for what pure image features miss.

The main soft spot is that every label comes from running fixed machine models on the source videos. Those thresholds therefore reflect the quirks of the particular detectors and segmentors used to create the ground truth, not some intrinsic machine perceptual limit. If the video corpus is narrow or the annotation models are not representative, the learned predictor can latch onto dataset-specific patterns and the claimed coding savings will not carry over to other networks or new scenes. The abstract also gives error figures and BD-mAP deltas without showing data splits, baseline code details, or significance tests, so the full manuscript needs to supply those to make the gains convincing.

This work is for researchers focused on video coding for machines who already care about running multiple tasks on the same bitstream. It is worth sending to peer review because the dataset is fresh and the multi-task framing is practical, even if the generalization question will require extra experiments from the authors.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Multi-Task Just Recognizable Difference (MT-JRD) dataset containing 27,264 machine-generated annotations across object detection, instance segmentation, and keypoint detection tasks. It proposes the Attribute-assisted MT-JRD (AMT-JRD) model that combines Generalized Feature Extraction Module (GFEM), Specialized Feature Extraction Module (SFEM), and Attribute Feature Fusion Module (AFFM) to jointly predict JRD thresholds for multiple tasks by incorporating object size and location priors. The model is then integrated into a Video Coding for Machines (VCM) pipeline to reduce bitrate while preserving task accuracy, with reported results of MAE 3.781 and error variance 5.332, 6.7%/6.3% gains over single-task baselines, and average BD-mAP improvements of 3.861% over VVC and 7.886% over JPEG.

Significance. If the empirical results hold under proper validation, the work provides a concrete step toward multi-task perceptual modeling for VCM, extending single-task JRD approaches with a new dataset and an architecture that fuses task-specific and attribute-based features. The downstream coding gains demonstrate a practical application, and the explicit use of machine-generated annotations as training targets is a clear methodological choice that could be reproduced if the annotation protocol is fully documented.

major comments (3)

[Experimental Results / Dataset Construction] The central performance claims (MAE 3.781, 6.7%/6.3% gains, BD-mAP improvements) rest on the 27,264 annotations being faithful proxies for machine vision thresholds, yet the manuscript provides no information on the specific detectors/segmentors/keypoint models used to generate the labels, the video corpus selection criteria, or any cross-validation across different machine vision backbones. This directly affects whether the GFEM+SFEM+AFFM architecture learns intrinsic perceptual limits or dataset-specific correlations (see results tables and experimental setup).
[Experimental Results] No details are given on train/validation/test splits, whether the held-out evaluation uses the same machine vision models as annotation generation, or any statistical significance testing for the reported error metrics and BD-mAP deltas. Without these, it is impossible to assess whether the multi-task gains are robust or influenced by post-hoc choices (see all quantitative tables and the VCM coding experiments).
[Model Architecture / Ablation Studies] The AFFM module claims to compensate for limitations of image features by injecting object size and location priors, but the paper does not quantify the contribution of this module via ablation (e.g., AMT-JRD without AFFM) or show that the priors are not already implicitly captured by the feature extractors on the collected data.
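
The split question raised in the second major comment can be illustrated with a grouped split, sketched below. It keeps all annotations from one source image or video in the same partition so that test objects never share a source with training objects. The `source_id` field, the ratios, and the seed are hypothetical; the manuscript's actual split is not described on this page.

```python
import random


def split_by_source(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split annotation records into train/val/test by source.

    samples: list of dicts with a "source_id" key (hypothetical field).
    All objects from one source land in the same split, so the test
    set never shares a source with the training set.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    sources = sorted({s["source_id"] for s in samples})
    random.Random(seed).shuffle(sources)
    n_train = int(len(sources) * ratios[0])
    n_val = int(len(sources) * ratios[1])
    bucket = {}
    for i, src in enumerate(sources):
        if i < n_train:
            bucket[src] = "train"
        elif i < n_train + n_val:
            bucket[src] = "val"
        else:
            bucket[src] = "test"
    splits = {"train": [], "val": [], "test": []}
    for s in samples:
        splits[bucket[s["source_id"]]].append(s)
    return splits


# Toy data: 10 sources with 3 annotated objects each.
samples = [{"source_id": i // 3, "jrd": 30 + i % 5} for i in range(30)]
splits = split_by_source(samples)
print({name: len(items) for name, items in splits.items()})
```

Splitting per object instead of per source would let near-identical crops from one image appear on both sides of the split, which tends to flatter the reported error.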

minor comments (2)

[Abstract / Dataset] The abstract and results sections should explicitly state the number of videos/frames per task and the range of JRD values to allow readers to judge the scale of the reported MAE 3.781.
[Introduction / Method] Notation for the three tasks and the JRD definition should be introduced consistently in the introduction or method section rather than only in the abstract.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of the work. We agree that the manuscript requires additional details on dataset construction, experimental protocols, and ablation studies to strengthen the validation of the reported results. We will revise the paper accordingly.

Referee: [Experimental Results / Dataset Construction] The central performance claims (MAE 3.781, 6.7%/6.3% gains, BD-mAP improvements) rest on the 27,264 annotations being faithful proxies for machine vision thresholds, yet the manuscript provides no information on the specific detectors/segmentors/keypoint models used to generate the labels, the video corpus selection criteria, or any cross-validation across different machine vision backbones. This directly affects whether the GFEM+SFEM+AFFM architecture learns intrinsic perceptual limits or dataset-specific correlations (see results tables and experimental setup).

Authors: We acknowledge that the current manuscript does not provide sufficient detail on these aspects of dataset construction. In the revised version, we will expand the dataset section to explicitly document the machine vision models used to generate the annotations, the criteria and sources for selecting the video corpus, and any cross-validation experiments performed across alternative backbones. This will allow readers to better evaluate whether the model captures general perceptual thresholds.
Revision: yes
Referee: [Experimental Results] No details are given on train/validation/test splits, whether the held-out evaluation uses the same machine vision models as annotation generation, or any statistical significance testing for the reported error metrics and BD-mAP deltas. Without these, it is impossible to assess whether the multi-task gains are robust or influenced by post-hoc choices (see all quantitative tables and the VCM coding experiments).

Authors: We agree that these experimental details are necessary for assessing robustness. The revised manuscript will include the train/validation/test split information, confirmation that held-out evaluation employs the same models as annotation generation, and results from statistical significance testing (such as paired t-tests with reported p-values) on the MAE, variance, and BD-mAP metrics. Updated tables and text will incorporate these elements.
Revision: yes
Referee: [Model Architecture / Ablation Studies] The AFFM module claims to compensate for limitations of image features by injecting object size and location priors, but the paper does not quantify the contribution of this module via ablation (e.g., AMT-JRD without AFFM) or show that the priors are not already implicitly captured by the feature extractors on the collected data.

Authors: We recognize the importance of ablation studies to isolate the AFFM's contribution. In the revision, we will add a dedicated ablation analysis comparing the full AMT-JRD model to a variant without the AFFM (and without attribute priors). This will quantify the impact on prediction error and demonstrate whether the size and location priors provide benefits beyond what is implicitly learned by the GFEM and SFEM on the dataset.
Revision: yes
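
The paired t-test mentioned in the second response can be sketched as follows. The code computes only the t statistic and the degrees of freedom from per-sample errors of two models evaluated on the same test objects. Turning that into a p-value requires the Student t distribution, which the Python standard library does not provide, so that step is left out. The function name and sample values are illustrative.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired per-sample errors.

    errors_a and errors_b hold the errors of two models on the same
    test samples, in the same order.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return fmean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up absolute errors: baseline model vs. a multi-task model.
t, dof = paired_t_statistic([4.0, 5.0, 3.0, 6.0], [3.0, 3.0, 2.0, 3.0])
print(f"t={t:.3f}, dof={dof}")
```

Pairing matters here because both models are scored on identical objects, so the per-object differences remove variation caused by some objects being intrinsically harder than others.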

Circularity Check

0 steps flagged

No significant circularity; derivation rests on new dataset and held-out evaluation

The paper first constructs a fresh MT-JRD dataset of 27,264 machine-generated annotations for object detection, instance segmentation and keypoint detection. It then trains the AMT-JRD model (GFEM + SFEM + AFFM) on this data and reports MAE 3.781 plus BD-mAP gains on held-out test material and downstream VCM coding. No equation or claim reduces by construction to a fitted parameter, self-citation chain, or renamed input; the reported predictions are ordinary supervised outputs from the collected annotations rather than tautological restatements. The chain from data collection through multi-task learning to coding application is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on a newly collected machine-annotated dataset and a neural network whose weights are fitted to those annotations; no additional physical constants or closed-form derivations are invoked.

free parameters (1)

neural network weights
All parameters of the AMT-JRD model are learned from the 27,264 JRD annotations.

axioms (1)

domain assumption: Machine vision perceptual thresholds for detection, segmentation and keypoint tasks can be reliably captured by image features plus object size and location attributes.
This assumption underpins both the dataset collection and the design of the Attribute Feature Fusion Module.

pith-pipeline@v0.9.0 · 5656 in / 1406 out tokens · 38314 ms · 2026-05-10T16:39:19.117468+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Just Recognizable Difference (JRD) ... minimal perceptual threshold that significantly influences machine vision performance

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: AMT-JRD prediction model ... GFEM, SFEM, AFFM

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

Overview of the high efficiency video coding (hevc) standard,

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,”IEEE Trans. Circuit Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, 2012

work page 2012
[2]

Overview of the versatile video coding (vvc) standard and its applications,

B. Bross, Y .-K. Wang, Y . Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,”IEEE Trans. Circuit Syst. Video Technol., vol. 31, no. 10, pp. 3736–3764, 2021

work page 2021
[3]

Just noticeable visual redundancy forecasting: a deep multimodal-driven approach,

W. Xie, S. Wang, S. Tian, L. Huang, Y . Liu, and M. Wang, “Just noticeable visual redundancy forecasting: a deep multimodal-driven approach,” inAAAI Conf. Artif. Intell., vol. 37, no. 3, 2023, pp. 2965– 2973

work page 2023
[4]

Metajnd: A meta-learning approach for just noticeable difference estimation,

M. Wang, Y . Zhu, R. Zhang, and W. Xie, “Metajnd: A meta-learning approach for just noticeable difference estimation,” inInt. Joint Conf. Artif. Intell., 2024, pp. 3151–3159

work page 2024
[5]

Toward top-down just noticeable difference estimation of natural images,

Q. Jiang, Z. Liu, S. Wang, F. Shao, and W. Lin, “Toward top-down just noticeable difference estimation of natural images,”IEEE Trans. Image Process., vol. 31, pp. 3697–3712, 2022

work page 2022
[6]

Rethinking and con- ceptualizing just noticeable difference estimation by residual learning,

Q. Jiang, F. Liu, Z. Wang, S. Wang, and W. Lin, “Rethinking and con- ceptualizing just noticeable difference estimation by residual learning,” IEEE Trans. Circuit Syst. Video Technol., vol. 34, no. 10, pp. 9515–9527, 2024

work page 2024
[7]

Hierarchical predictive coding-based jnd estimation for image compression,

H. Wang, L. Yu, J. Liang, H. Yin, T. Li, and S. Wang, “Hierarchical predictive coding-based jnd estimation for image compression,”IEEE Trans. Image Process., vol. 30, pp. 487–500, 2021

work page 2021
[8]

A survey on perceptually optimized video coding,

Y . Zhang, L. Zhu, G. Jiang, S. Kwong, and C.-C. J. Kuo, “A survey on perceptually optimized video coding,”ACM Comput. Surveys, vol. 55, no. 12, pp. 1–37, 2023

work page 2023
[9]

Video coding for machines: A paradigm of collaborative compression and intelligent analytics,

L. Duan, J. Liu, W. Yang, T. Huang, and W. Gao, “Video coding for machines: A paradigm of collaborative compression and intelligent analytics,”IEEE Trans. Image Process., vol. 29, pp. 8680–8695, 2020

work page 2020
[10]

Progress and opportunities in modelling just- noticeable difference (jnd) for multimedia,

W. Lin and G. Ghinea, “Progress and opportunities in modelling just- noticeable difference (jnd) for multimedia,”IEEE Trans. Multimedia, vol. 24, pp. 3706–3721, 2022

work page 2022
[11]

Deep learning-based picture-wise just noticeable distortion prediction 12 model for image compression,

H. Liu, Y . Zhang, H. Zhang, C. Fan, S. Kwong, C.-C. J. Kuo, and X. Fan, “Deep learning-based picture-wise just noticeable distortion prediction 12 model for image compression,”IEEE Trans. Image Process., vol. 29, pp. 641–656, 2020

work page 2020
[12]

Deep learning based just noticeable difference and perceptual quality prediction models for compressed video,

Y . Zhang, H. Liu, Y . Yang, X. Fan, S. Kwong, and C. C. J. Kuo, “Deep learning based just noticeable difference and perceptual quality prediction models for compressed video,”IEEE Trans. Circuit Syst. Video Technol., vol. 32, no. 3, pp. 1197–1212, 2022

work page 2022
[13]

Vp-jnd:visual perception assisted deep picture-wise just noticeable difference predic- tion model for image compression,

Y . Zhang, S. Zhang, N. Li, C. Fan, and R. Hamzaoui, “Vp-jnd:visual perception assisted deep picture-wise just noticeable difference predic- tion model for image compression,”IEEE Trans. Circuit Syst. Video Technol., pp. 1–1, 2025

work page 2025
[14]

Mtjnd: Multi-task deep learning framework for improved jnd prediction,

S. Nami, F. Pakdaman, M. R. Hashemi, S. Shirmohammadi, and M. Gabbouj, “Mtjnd: Multi-task deep learning framework for improved jnd prediction,” inProc. IEEE Int. Conf. Image Process., 2023, pp. 1245–1249

work page 2023
[15]

Sg-jnd: Semantic-guided just noticeable distortion predictor for image compression,

L. Cao, W. Sun, X. Min, J. Jia, Z. Zhang, Z. Chen, Y . Zhu, L. Liu, Q. Chen, J. Chen, and G. Zhai, “Sg-jnd: Semantic-guided just noticeable distortion predictor for image compression,” inProc. IEEE Int. Conf. Image Process., 2024, pp. 1139–1145

work page 2024
[16]

Lightweight multitask learning for robust jnd prediction using latent space and reconstructed frames,

S. Nami, F. Pakdaman, M. R. Hashemi, S. Shirmohammadi, and M. Gabbouj, “Lightweight multitask learning for robust jnd prediction using latent space and reconstructed frames,”IEEE Trans. Circuit Syst. Video Technol., vol. 34, no. 9, pp. 8657–8671, 2024

work page 2024
[17]

Recent standard development activities on video coding for machines,

W. Gao, S. Liu, X. Xu, M. Rafie, Y . Zhang, and I. Curcio, “Recent standard development activities on video coding for machines,”arXiv preprint arXiv:2105.12653, 2021

work page arXiv 2021
[18]

Statistical study on perceived jpeg image quality via mcl-jci dataset construction and analysis,

L. Jin, J. Y . Lin, S. Hu, H. Wang, P. Wang, I. Katsavounidis, A. Aaron, and C.-C. J. Kuo, “Statistical study on perceived jpeg image quality via mcl-jci dataset construction and analysis,”Electronic Imaging, vol. 2016, no. 13, pp. 1–9, 2016

work page 2016
[19]

Large-scale crowdsourced subjective assessment of picturewise just noticeable difference,

H. Lin, G. Chen, M. Jenadeleh, V . Hosu, U.-D. Reips, R. Hamzaoui, and D. Saupe, “Large-scale crowdsourced subjective assessment of picturewise just noticeable difference,”IEEE Trans. Circuit Syst. Video Technol., vol. 32, no. 9, pp. 5859–5873, 2022

work page 2022
[20]

Mcl-jcv: A jnd-based h.264/avc video quality assessment dataset,

H. Wang, W. Gan, S. Hu, J. Y . Lin, L. Jin, L. Song, P. Wang, I. Katsavounidis, A. Aaron, and C.-C. J. Kuo, “Mcl-jcv: A jnd-based h.264/avc video quality assessment dataset,” inProc. IEEE Int. Conf. Image Process., 2016, pp. 1509–1513

work page 2016
[21]

Videoset: A large-scale compressed video quality dataset based on jnd measurement,

H. Wang, I. Katsavounidis, J. Zhou, J. Park, S. Lei, X. Zhou, M.-O. Pun, X. Jin, R. Wang, X. Wanget al., “Videoset: A large-scale compressed video quality dataset based on jnd measurement,”J. Vis. Commun. Image Represent., vol. 46, pp. 292–302, 2017

work page 2017
[22]

Transtic: Transferring transformer-based image compression from human perception to machine perception,

Y .-H. Chen, Y .-C. Weng, C.-H. Kao, C. Chien, W.-C. Chiu, and W.- H. Peng, “Transtic: Transferring transformer-based image compression from human perception to machine perception,” inProc. Int. Conf. Comput. Vis., 2023, pp. 23 240–23 250

work page 2023
[23]

Im- age compression for machine and human vision with spatial-frequency adaptation,

H. Li, S. Li, S. Ding, W. Dai, M. Cao, C. Li, J. Zou, and H. Xiong, “Im- age compression for machine and human vision with spatial-frequency adaptation,” inProc. Eur. Conf. Comput. Vis.Springer, 2024, pp. 382– 399

work page 2024
[24]

Boosting neural image compression for machines using latent space masking,

K. Fischer, F. Brand, and A. Kaup, “Boosting neural image compression for machines using latent space masking,”IEEE Trans. Circuit Syst. Video Technol., vol. 35, no. 4, pp. 3719–3731, 2025

work page 2025
[25]

Preprocessing enhanced image compression for machine vision,

G. Lu, X. Ge, T. Zhong, Q. Hu, and J. Geng, “Preprocessing enhanced image compression for machine vision,”IEEE Trans. Circuit Syst. Video Technol., vol. 34, no. 12, pp. 13 556–13 568, 2024

work page 2024
[26]

Task-switchable pre-processor for image compression for multiple machine vision tasks,

M. Yang, F. Yang, L. Murn, M. G. Blanch, J. Sock, S. Wan, F. Yang, and L. Herranz, “Task-switchable pre-processor for image compression for multiple machine vision tasks,”IEEE Trans. Circuit Syst. Video Technol., vol. 34, no. 7, pp. 6416–6429, 2024

work page 2024
[27]

Video coding for machines: Compact visual representation compression for intelligent collaborative analytics,

W. Yang, H. Huang, Y . Hu, L.-Y . Duan, and J. Liu, “Video coding for machines: Compact visual representation compression for intelligent collaborative analytics,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 7, pp. 5174–5191, 2024

work page 2024
[28]

All-in-one image coding for joint human-machine vision with multi-path aggregation,

X. Zhang, P. Guo, M. Lu, and Z. Ma, “All-in-one image coding for joint human-machine vision with multi-path aggregation,”Proc. Adv. Neural Inf. Process. Syst., vol. 37, pp. 71 465–71 503, 2024

work page 2024
[29]

Rate- distortion-cognition controllable versatile neural image compression,

J. Liu, R. Feng, Y . Qi, Q. Chen, Z. Chen, W. Zeng, and X. Jin, “Rate- distortion-cognition controllable versatile neural image compression,” in Proc. Eur. Conf. Comput. Vis.Springer, 2024, pp. 329–348

work page 2024
[30]

Just noticeable difference for deep machine vision,

J. Jin, X. Zhang, X. Fu, H. Zhang, W. Lin, J. Lou, and Y . Zhao, “Just noticeable difference for deep machine vision,”IEEE Trans. Circuit Syst. Video Technol., vol. 32, no. 6, pp. 3452–3461, 2022

work page 2022
[31]

Perceptual video coding for machines via satisfied machine ratio modeling,

Q. Zhang, S. Wang, X. Zhang, C. Jia, Z. Wang, S. Ma, and W. Gao, “Perceptual video coding for machines via satisfied machine ratio modeling,”IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–18, 2024

work page 2024
[32]

A non-reference just recognized distortion prediction framework for object detection task,

Y . Liu, H. Yin, H. Wang, X. Wang, and L. Yin, “A non-reference just recognized distortion prediction framework for object detection task,” in 2024 Data Compression Conference (DCC), 2024, pp. 570–570

work page 2024
[33]

Generative adversarial nets,

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio, “Generative adversarial nets,” Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014

work page 2014
[34]

Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,

C.-Y . Wang, A. Bochkovskiy, and H.-Y . M. Liao, “Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 7464–7475

work page 2023
[35]

Just recognizable distortion for machine vision oriented image and video coding,

Q. Zhang, S. Wang, X. Zhang, S. Ma, and W. Gao, “Just recognizable distortion for machine vision oriented image and video coding,”Int. J. Comput. Vis., vol. 129, no. 10, pp. 2889–2906, 2021

work page 2021
[36]

Learning to predict object-wise just recognizable distortion for image and video compression,

Y . Zhang, H. Lin, J. Sun, L. Zhu, and S. Kwong, “Learning to predict object-wise just recognizable distortion for image and video compression,”IEEE Trans. Multimedia, vol. 26, pp. 5925–5938, 2024

work page 2024
[37]

Dt-jrd: Deep transformer-based just recognizable difference prediction model for video coding for machines,

J. Liu, Y . Zhang, X. Wang, L. Xu, and S. Kwong, “Dt-jrd: Deep transformer-based just recognizable difference prediction model for video coding for machines,”IEEE Trans. Multimedia, vol. 28, pp. 114– 127, 2026

work page 2026
[38]

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2017

[39]

K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proc. Int. Conf. Comput. Vis., 2017, pp. 2980–2988

[40]

T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755

[41]

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5987–5995

[42]

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, 2010

[43]

Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004

[44]

R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586–595

[45]

W. Luo, Y. Li, R. Urtasun, and R. Zemel, “Understanding the effective receptive field in deep convolutional neural networks,” Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016

[46]

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. Int. Conf. Comput. Vis., 2021, pp. 10012–10022.

Junqi Liu received the B.E. degree in electronic information science and technology from Sun Yat-sen University, China, in 2024. He is currently pursu...
