LEIQ-Assessor: Multi-dimensional Quality Assessment of Low-light Enhanced Images via Multi-task Learning

Dandan Zhu; Guangtao Zhai; Jikai Xu; Jinqiu Sang; Wei Sun; Weixia Zhang; Yanwei Jiang

arxiv: 2606.29752 · v1 · pith:OF2THWJAnew · submitted 2026-06-29 · 💻 cs.CV · cs.MM

LEIQ-Assessor: Multi-dimensional Quality Assessment of Low-light Enhanced Images via Multi-task Learning

Wei Sun , Yanwei Jiang , Dandan Zhu , Jinqiu Sang , Jikai Xu , Weixia Zhang , Guangtao Zhai This is my paper

Pith reviewed 2026-06-30 06:36 UTC · model grok-4.3

classification 💻 cs.CV cs.MM

keywords low-light image enhancementimage quality assessmentmulti-task learningno-reference IQAperceptual sub-attributesvision transformerMOS predictionPLCC loss

0 comments

The pith

Joint optimization on overall and sub-attribute quality scores yields richer features for assessing low-light enhanced images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Low-light image enhancement can introduce artifacts like noise and color shifts that affect perceived quality, so reliable assessment metrics matter for algorithm development and use. The paper develops LEIQ-Assessor to evaluate these enhanced images by predicting both an overall quality score and six specific perceptual attributes at once. It uses a pre-trained vision transformer as backbone and trains the tasks together with PLCC loss to learn better shared features. This multi-task approach is shown to work better than single-task versions or other existing methods on a standard benchmark. Such a metric supports better development of enhancement techniques.

Core claim

LEIQ-Assessor is a multi-dimensional quality assessment model for low-light enhanced images that uses multi-task learning on a pre-trained SigLIP2 Vision Transformer backbone to jointly predict the overall Mean Opinion Score along with six perceptual sub-attributes by optimizing with the PLCC loss, thereby capturing richer quality-aware features in the shared representation than single-task training and outperforming existing no-reference IQA models on the MLE benchmark.

What carries the argument

The shared representation from the SigLIP2 Vision Transformer backbone, learned by jointly optimizing predictions of overall MOS and six correlated perceptual sub-attributes via the PLCC loss.

Load-bearing premise

The six perceptual sub-attributes are correlated with overall quality in a way that makes joint optimization produce better features than training on the overall score alone.

What would settle it

A direct comparison experiment where the identical backbone is trained only on overall MOS and evaluated on the MLE benchmark; equal or superior performance would indicate that multi-task learning does not provide the claimed benefit.

Figures

Figures reproduced from arXiv: 2606.29752 by Dandan Zhu, Guangtao Zhai, Jikai Xu, Jinqiu Sang, Wei Sun, Weixia Zhang, Yanwei Jiang.

read the original abstract

Low-light image enhancement algorithms (LIEAs) aim to improve the visibility of images captured under poor illumination. However, the enhancement process often introduces artifacts such as noise amplification, color shift, structural damage, and over-exposure, which degrade the perceptual quality of the enhanced images. Therefore, a reliable image quality assessment (IQA) metric for evaluating enhancement effects is of great importance for both the development of LIEAs and their practical applications. In this paper, we present \textbf{LEIQ-Assessor}, a multi-dimensional quality assessment model for low-light image enhancement based on multi-task learning, developed for the QoMEX 2026 Grand Challenge on Low-light Enhanced Image Quality Assessment. Specifically, our method leverages a pre-trained SigLIP2 Vision Transformer as the backbone and simultaneously predicts the overall Mean Opinion Score (MOS) together with six perceptual sub-attributes: lightness, color fidelity, noise level, exposure quality, naturalness, and content recovery. By jointly optimizing these correlated objectives via the PLCC loss, the shared representation captures richer quality-aware features than its single-task counterpart. Experiments on the MLE benchmark demonstrate that LEIQ-Assessor significantly outperforms existing no-reference IQA models and hand-crafted quality descriptors. Our method achieved second place in the QoMEX 2026 Grand Challenge on Low-light Enhanced Image Quality Assessment. The code is available at https://github.com/sunwei925/LEIQ-Assessor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A solid challenge entry that fine-tunes SigLIP2 for multi-task prediction of MOS plus six low-light attributes and took second place, with the usual thin experimental details in the abstract.

read the letter

The key takeaway is that LEIQ-Assessor applies a standard multi-task setup on top of SigLIP2 to assess low-light enhanced images across overall quality and six perceptual attributes, and it earned second place in the QoMEX 2026 Grand Challenge.

The paper does a few things right. It identifies a clear need for domain-specific IQA because general enhancement methods introduce artifacts like noise and color shifts. Using a pre-trained transformer backbone makes sense for feature extraction, and jointly optimizing the tasks with PLCC loss is a logical way to leverage correlations between attributes. Releasing the code is helpful for reproducibility. The challenge result provides some external check on the performance claim.

On the downside, the abstract is light on experimental specifics. It mentions outperformance but skips details like how baselines were implemented, what the training splits were, or whether the gains are statistically significant. Without those, it's difficult to assess if the multi-task advantage is real or if it's sensitive to hyperparameters. The assumption that the six attributes are the right ones and that the MLE benchmark represents practical use cases could be tested more explicitly, perhaps with cross-dataset evaluation. These are common issues in challenge papers but still worth noting.

Overall, this work targets researchers and practitioners in image enhancement who need a tailored quality metric. Someone developing low-light algorithms or building vision systems for poor lighting conditions could get value from the model and the benchmark results. It is not breaking new ground in IQA methods but fills a niche application.

I would recommend sending it for peer review. The combination of the challenge placement and the practical focus makes it worth a referee's time, even if revisions are needed to flesh out the experiments.

Referee Report

2 major / 1 minor

Summary. The paper presents LEIQ-Assessor, a multi-task model for no-reference quality assessment of low-light enhanced images. It uses a pre-trained SigLIP2 Vision Transformer backbone to jointly predict overall MOS and six perceptual sub-attributes (lightness, color fidelity, noise level, exposure quality, naturalness, content recovery) via PLCC loss, claiming that the shared representation yields richer quality-aware features than single-task training. Experiments on the MLE benchmark are reported to show significant outperformance over existing NR-IQA models and hand-crafted descriptors, with the method placing second in the QoMEX 2026 Grand Challenge; code is released.

Significance. If the performance gains hold under rigorous verification, the multi-task PLCC formulation offers a practical advance for IQA of enhanced images by exploiting attribute correlations, with direct utility for developing and deploying low-light enhancement algorithms. The challenge placement and public code release provide concrete external corroboration and reproducibility value.

major comments (2)

[Experiments] The manuscript supplies no experimental details on training procedure (e.g., learning-rate schedule, batch size, data splits, or loss-weighting scheme for the multi-task objective), baseline re-implementations, or statistical significance testing. This directly undermines verification of the central claim that joint optimization on the six sub-attributes produces richer features and superior benchmark performance.
[Experiments] No ablation is described that isolates the contribution of multi-task training versus single-task training on the same backbone and loss; without this comparison the assertion that the shared representation is richer remains untested.

minor comments (1)

The abstract states that the method 'significantly outperforms' existing models; quantitative tables or figures reporting PLCC/SROCC values, confidence intervals, and the precise MLE benchmark split used would strengthen the presentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We will strengthen the experimental section to improve reproducibility and directly address the concerns raised.

read point-by-point responses

Referee: [Experiments] The manuscript supplies no experimental details on training procedure (e.g., learning-rate schedule, batch size, data splits, or loss-weighting scheme for the multi-task objective), baseline re-implementations, or statistical significance testing. This directly undermines verification of the central claim that joint optimization on the six sub-attributes produces richer features and superior benchmark performance.

Authors: We agree that the current manuscript lacks sufficient experimental details. In the revised version we will add a dedicated subsection describing the full training procedure (learning-rate schedule, batch size, data splits, and loss-weighting scheme), the re-implementation protocol for all baselines, and the results of statistical significance testing. These additions will enable independent verification of the reported performance gains. revision: yes
Referee: [Experiments] No ablation is described that isolates the contribution of multi-task training versus single-task training on the same backbone and loss; without this comparison the assertion that the shared representation is richer remains untested.

Authors: We acknowledge that an explicit ablation isolating multi-task versus single-task training on the identical SigLIP2 backbone and PLCC loss is absent. We will perform and report this ablation study in the revision; the results will directly test whether the joint optimization yields richer quality-aware features, thereby supporting (or qualifying) the central claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical supervised model on external annotations

full rationale

The manuscript describes a standard multi-task regression network (SigLIP2 backbone + joint PLCC loss on MOS plus six sub-attributes) trained end-to-end on the MLE benchmark. No equations, uniqueness theorems, or ansatzes are introduced that reduce the output to a fitted parameter or self-citation by construction. Performance is reported against external baselines and a public challenge ranking, satisfying the criteria for an independent empirical result. No load-bearing self-citations or self-definitional steps appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach depends on a pre-trained backbone whose internal representations are treated as given, human-annotated benchmark data whose reliability is assumed, and the domain assumption that the chosen sub-attributes are mutually informative for overall quality prediction.

free parameters (1)

multi-task loss balancing weights
Multi-task learning requires choosing relative weights for each attribute loss; these are not specified in the abstract and are typically tuned on validation data.

axioms (1)

domain assumption Joint optimization of the six sub-attributes via PLCC loss yields richer quality-aware features than single-task training
The central performance claim rests on this assumption about correlated objectives.

pith-pipeline@v0.9.1-grok · 5812 in / 1255 out tokens · 43612 ms · 2026-06-30T06:36:47.734802+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 3 canonical work pages · 1 internal anchor

[1]

A dynamic histogram equalization for image contrast enhancement,

Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae, “A dynamic histogram equalization for image contrast enhancement,”IEEE Transactions on Consumer Electronics, vol. 53, no. 2, pp. 593–600, 2007

2007
[2]

A weighted variational model for simultaneous reflectance and illumination estimation,

Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2782–2790

2016
[3]

Enlightengan: Deep light enhancement without paired supervision,

Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang, “Enlightengan: Deep light enhancement without paired supervision,”IEEE Transactions on Image Processing, vol. 30, pp. 2340–2349, 2021

2021
[4]

Perceptual quality assessment of low-light image enhancement,

Guangtao Zhai, Wei Sun, Xiongkuo Min, and Jiantao Zhou, “Perceptual quality assessment of low-light image enhancement,”ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 4, pp. 1–24, 2021

2021
[5]

No- reference image quality assessment in the spatial domain,

Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik, “No- reference image quality assessment in the spatial domain,”IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012

2012
[6]

Making a “completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik, “Making a “completely blind” image quality analyzer,”IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2012

2012
[7]

Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,

Wei Sun, Xiongkuo Min, Danyang Tu, Siwei Ma, and Guangtao Zhai, “Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,”IEEE Journal of Selected Topics in Signal Processing, vol. 17, no. 6, pp. 1178–1192, 2023

2023
[8]

Blind image quality assessment via vision-language correspondence: A multitask learning perspective,

Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma, “Blind image quality assessment via vision-language correspondence: A multitask learning perspective,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14071–14081

2023
[9]

Exploring clip for assessing the look and feel of images,

Jianyi Wang, Kelvin CK Chan, and Chen Change Loy, “Exploring clip for assessing the look and feel of images,” inProceedings of the AAAI conference on Artificial Intelligence, 2023, vol. 37, pp. 2555–2563

2023
[10]

Maniqa: Multi-dimension attention network for no-reference image quality assessment,

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Ming- deng Cao, Jiahao Wang, and Yujiu Yang, “Maniqa: Multi-dimension attention network for no-reference image quality assessment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1191–1200

2022
[11]

A deep learning based no-reference quality assessment model for ugc videos,

Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai, “A deep learning based no-reference quality assessment model for ugc videos,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 856–865

2022
[12]

Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,

Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Siwei Ma, and Xiaokang Yang, “Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,”IEEE Transactions on Mobile Computing, vol. 21, no. 1, pp. 110–124, 2020

2020
[13]

Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,

Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Huiyu Duan, and Siwei Ma, “Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,”IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 1, pp. 64–77, 2019

2019
[14]

Deep neural network for blind visual quality assessment of 4k content,

Wei Lu, Wei Sun, Xiongkuo Min, Wenhan Zhu, Quan Zhou, Jun He, Qiyuan Wang, Zicheng Zhang, Tao Wang, and Guangtao Zhai, “Deep neural network for blind visual quality assessment of 4k content,”IEEE Transactions on Broadcasting, vol. 69, no. 2, pp. 406–421, 2022

2022
[15]

Analysis of video quality datasets via design of minimalistic video quality models,

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma, “Analysis of video quality datasets via design of minimalistic video quality models,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 11, pp. 7056–7071, 2024

2024
[16]

Dual-branch network for portrait image quality assessment,

Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, et al., “Dual-branch network for portrait image quality assessment,”arXiv preprint arXiv:2405.08555, 2024

work page arXiv 2024
[17]

Large multi-modality model assisted ai-generated image quality assessment,

Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, and Guangtao Zhai, “Large multi-modality model assisted ai-generated image quality assessment,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 7803–7812

2024
[18]

Lmm-vqa: Advancing video quality assessment with large multimodal models,

Qihang Ge, Wei Sun, Yu Zhang, Yunhao Li, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, and Guangtao Zhai, “Lmm-vqa: Advancing video quality assessment with large multimodal models,” IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025
[19]

Enhancing blind video quality assessment with rich quality-aware features,

Wei Sun, Linhan Cao, Jun Jia, Zhichao Zhang, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Enhancing blind video quality assessment with rich quality-aware features,”Expert Systems with Applications, p. 130452, 2025

2025
[20]

An empirical study for efficient video quality assessment,

Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, Zicheng Zhang, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “An empirical study for efficient video quality assessment,” inProceed- ings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 1403–1413

2025
[21]

Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,

Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,” in2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2025, pp. 1–6

2025
[22]

Assessing uhd image quality from aesthetics, distortions, and saliency,

Wei Sun, Weixia Zhang, Yuqin Cao, Linhan Cao, Jun Jia, Zijian Chen, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Assessing uhd image quality from aesthetics, distortions, and saliency,” inEuropean Conference on Computer Vision, 2024, pp. 109–126

2024
[23]

Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,

Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Jun Jia, Kaiwei Zhang, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, 2026, vol. 40, pp. 2607–2615

2026
[24]

Qualirag: Retrieval-augmented generation for visual quality understanding,

Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Qualirag: Retrieval-augmented generation for visual quality understanding,”arXiv preprint arXiv:2601.18195, 2026

work page arXiv 2026
[25]

Generalizable video quality assessment via weak-to-strong learning,

Linhan Cao, Wei Sun, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Yicong Peng, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Generalizable video quality assessment via weak-to-strong learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

2026
[26]

Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,

Zhichao Zhang, Wei Sun, Li Xinyue, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Wang Puyi, Sun Fengyu, et al., “Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,”ACM Transactions on Multimedia Computing, Communications and Applications, vol. 21, no. 9, pp. 1–24, 2025

2025
[27]

Efficient face image quality assessment via self-training and knowledge distillation,

Wei Sun, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu, Xiongkuo Min, and Guangtao Zhai, “Efficient face image quality assessment via self-training and knowledge distillation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 3363–3371

2025
[28]

Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,

Dandan Zhu, Kaiwei Zhang, Xiongkuo Min, Guangtao Zhai, and Xi- aokang Yang, “Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,”IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025
[29]

From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,

Dandan Zhu, Kaiwei Zhang, Kun Zhu, Nana Zhang, Weiping Ding, Guangtao Zhai, and Xiaokang Yang, “From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 6, pp. 4059–4074, 2024

2024
[30]

Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,

Dandan Zhu, Kun Zhu, Weiping Ding, Nana Zhang, Xiongkuo Min, Guangtao Zhai, and Xiaokang Yang, “Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 2, pp. 1756–1771, 2024

2024
[31]

A no-reference evaluation metric for low- light image enhancement,

Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, and Guangtao Zhai, “A no-reference evaluation metric for low- light image enhancement,” in2021 IEEE International Conference on Multimedia and Expo, 2021, pp. 1–6

2021
[32]

Blind multimodal quality assessment of low-light images,

Miaohui Wang, Zhuowei Xu, Mai Xu, and Weisi Lin, “Blind multimodal quality assessment of low-light images,”International Journal of Computer Vision, vol. 133, no. 4, pp. 1665–1688, 2025

2025
[33]

Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,

Bo Hu, Haitian Zhao, Yuanyuan Hu, and Xinbo Gao, “Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,” https://github.com/CQUPT-HuBo90/MLEDataset, 2026, QoMEX 2026 Low-light Enhanced Image Quality Assessment Challenge

2026
[34]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al., “Siglip 2: Multilingual vision- language encoders with improved semantic understanding, localization, and dense features,”arXiv preprint arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

A dynamic histogram equalization for image contrast enhancement,

Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae, “A dynamic histogram equalization for image contrast enhancement,”IEEE Transactions on Consumer Electronics, vol. 53, no. 2, pp. 593–600, 2007

2007

[2] [2]

A weighted variational model for simultaneous reflectance and illumination estimation,

Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2782–2790

2016

[3] [3]

Enlightengan: Deep light enhancement without paired supervision,

Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang, “Enlightengan: Deep light enhancement without paired supervision,”IEEE Transactions on Image Processing, vol. 30, pp. 2340–2349, 2021

2021

[4] [4]

Perceptual quality assessment of low-light image enhancement,

Guangtao Zhai, Wei Sun, Xiongkuo Min, and Jiantao Zhou, “Perceptual quality assessment of low-light image enhancement,”ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 4, pp. 1–24, 2021

2021

[5] [5]

No- reference image quality assessment in the spatial domain,

Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik, “No- reference image quality assessment in the spatial domain,”IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012

2012

[6] [6]

Making a “completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik, “Making a “completely blind” image quality analyzer,”IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2012

2012

[7] [7]

Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,

Wei Sun, Xiongkuo Min, Danyang Tu, Siwei Ma, and Guangtao Zhai, “Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,”IEEE Journal of Selected Topics in Signal Processing, vol. 17, no. 6, pp. 1178–1192, 2023

2023

[8] [8]

Blind image quality assessment via vision-language correspondence: A multitask learning perspective,

Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma, “Blind image quality assessment via vision-language correspondence: A multitask learning perspective,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14071–14081

2023

[9] [9]

Exploring clip for assessing the look and feel of images,

Jianyi Wang, Kelvin CK Chan, and Chen Change Loy, “Exploring clip for assessing the look and feel of images,” inProceedings of the AAAI conference on Artificial Intelligence, 2023, vol. 37, pp. 2555–2563

2023

[10] [10]

Maniqa: Multi-dimension attention network for no-reference image quality assessment,

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Ming- deng Cao, Jiahao Wang, and Yujiu Yang, “Maniqa: Multi-dimension attention network for no-reference image quality assessment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1191–1200

2022

[11] [11]

A deep learning based no-reference quality assessment model for ugc videos,

Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai, “A deep learning based no-reference quality assessment model for ugc videos,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 856–865

2022

[12] [12]

Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,

Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Siwei Ma, and Xiaokang Yang, “Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,”IEEE Transactions on Mobile Computing, vol. 21, no. 1, pp. 110–124, 2020

2020

[13] [13]

Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,

Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Huiyu Duan, and Siwei Ma, “Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,”IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 1, pp. 64–77, 2019

2019

[14] [14]

Deep neural network for blind visual quality assessment of 4k content,

Wei Lu, Wei Sun, Xiongkuo Min, Wenhan Zhu, Quan Zhou, Jun He, Qiyuan Wang, Zicheng Zhang, Tao Wang, and Guangtao Zhai, “Deep neural network for blind visual quality assessment of 4k content,”IEEE Transactions on Broadcasting, vol. 69, no. 2, pp. 406–421, 2022

2022

[15] [15]

Analysis of video quality datasets via design of minimalistic video quality models,

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma, “Analysis of video quality datasets via design of minimalistic video quality models,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 11, pp. 7056–7071, 2024

2024

[16] [16]

Dual-branch network for portrait image quality assessment,

Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, et al., “Dual-branch network for portrait image quality assessment,”arXiv preprint arXiv:2405.08555, 2024

work page arXiv 2024

[17] [17]

Large multi-modality model assisted ai-generated image quality assessment,

Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, and Guangtao Zhai, “Large multi-modality model assisted ai-generated image quality assessment,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 7803–7812

2024

[18] [18]

Lmm-vqa: Advancing video quality assessment with large multimodal models,

Qihang Ge, Wei Sun, Yu Zhang, Yunhao Li, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, and Guangtao Zhai, “Lmm-vqa: Advancing video quality assessment with large multimodal models,” IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025

[19] [19]

Enhancing blind video quality assessment with rich quality-aware features,

Wei Sun, Linhan Cao, Jun Jia, Zhichao Zhang, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Enhancing blind video quality assessment with rich quality-aware features,”Expert Systems with Applications, p. 130452, 2025

2025

[20] [20]

An empirical study for efficient video quality assessment,

Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, Zicheng Zhang, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “An empirical study for efficient video quality assessment,” inProceed- ings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 1403–1413

2025

[21] [21]

Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,

Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,” in2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2025, pp. 1–6

2025

[22] [22]

Assessing uhd image quality from aesthetics, distortions, and saliency,

Wei Sun, Weixia Zhang, Yuqin Cao, Linhan Cao, Jun Jia, Zijian Chen, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Assessing uhd image quality from aesthetics, distortions, and saliency,” inEuropean Conference on Computer Vision, 2024, pp. 109–126

2024

[23] [23]

Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,

Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Jun Jia, Kaiwei Zhang, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, 2026, vol. 40, pp. 2607–2615

2026

[24] [24]

Qualirag: Retrieval-augmented generation for visual quality understanding,

Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Qualirag: Retrieval-augmented generation for visual quality understanding,”arXiv preprint arXiv:2601.18195, 2026

work page arXiv 2026

[25] [25]

Generalizable video quality assessment via weak-to-strong learning,

Linhan Cao, Wei Sun, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Yicong Peng, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Generalizable video quality assessment via weak-to-strong learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

2026

[26] [26]

Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,

Zhichao Zhang, Wei Sun, Li Xinyue, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Wang Puyi, Sun Fengyu, et al., “Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,”ACM Transactions on Multimedia Computing, Communications and Applications, vol. 21, no. 9, pp. 1–24, 2025

2025

[27] [27]

Efficient face image quality assessment via self-training and knowledge distillation,

Wei Sun, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu, Xiongkuo Min, and Guangtao Zhai, “Efficient face image quality assessment via self-training and knowledge distillation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 3363–3371

2025

[28] [28]

Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,

Dandan Zhu, Kaiwei Zhang, Xiongkuo Min, Guangtao Zhai, and Xi- aokang Yang, “Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,”IEEE Transactions on Circuits and Systems for Video Technology, 2025

2025

[29] [29]

From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,

Dandan Zhu, Kaiwei Zhang, Kun Zhu, Nana Zhang, Weiping Ding, Guangtao Zhai, and Xiaokang Yang, “From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 6, pp. 4059–4074, 2024

2024

[30] [30]

Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,

Dandan Zhu, Kun Zhu, Weiping Ding, Nana Zhang, Xiongkuo Min, Guangtao Zhai, and Xiaokang Yang, “Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 2, pp. 1756–1771, 2024

2024

[31] [31]

A no-reference evaluation metric for low- light image enhancement,

Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, and Guangtao Zhai, “A no-reference evaluation metric for low- light image enhancement,” in2021 IEEE International Conference on Multimedia and Expo, 2021, pp. 1–6

2021

[32] [32]

Blind multimodal quality assessment of low-light images,

Miaohui Wang, Zhuowei Xu, Mai Xu, and Weisi Lin, “Blind multimodal quality assessment of low-light images,”International Journal of Computer Vision, vol. 133, no. 4, pp. 1665–1688, 2025

2025

[33] [33]

Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,

Bo Hu, Haitian Zhao, Yuanyuan Hu, and Xinbo Gao, “Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,” https://github.com/CQUPT-HuBo90/MLEDataset, 2026, QoMEX 2026 Low-light Enhanced Image Quality Assessment Challenge

2026

[34] [34]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al., “Siglip 2: Multilingual vision- language encoders with improved semantic understanding, localization, and dense features,”arXiv preprint arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025