pith. sign in

arxiv: 2606.29752 · v1 · pith:OF2THWJAnew · submitted 2026-06-29 · 💻 cs.CV · cs.MM

LEIQ-Assessor: Multi-dimensional Quality Assessment of Low-light Enhanced Images via Multi-task Learning

Pith reviewed 2026-06-30 06:36 UTC · model grok-4.3

classification 💻 cs.CV cs.MM
keywords low-light image enhancementimage quality assessmentmulti-task learningno-reference IQAperceptual sub-attributesvision transformerMOS predictionPLCC loss
0
0 comments X

The pith

Joint optimization on overall and sub-attribute quality scores yields richer features for assessing low-light enhanced images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Low-light image enhancement can introduce artifacts like noise and color shifts that affect perceived quality, so reliable assessment metrics matter for algorithm development and use. The paper develops LEIQ-Assessor to evaluate these enhanced images by predicting both an overall quality score and six specific perceptual attributes at once. It uses a pre-trained vision transformer as backbone and trains the tasks together with PLCC loss to learn better shared features. This multi-task approach is shown to work better than single-task versions or other existing methods on a standard benchmark. Such a metric supports better development of enhancement techniques.

Core claim

LEIQ-Assessor is a multi-dimensional quality assessment model for low-light enhanced images that uses multi-task learning on a pre-trained SigLIP2 Vision Transformer backbone to jointly predict the overall Mean Opinion Score along with six perceptual sub-attributes by optimizing with the PLCC loss, thereby capturing richer quality-aware features in the shared representation than single-task training and outperforming existing no-reference IQA models on the MLE benchmark.

What carries the argument

The shared representation from the SigLIP2 Vision Transformer backbone, learned by jointly optimizing predictions of overall MOS and six correlated perceptual sub-attributes via the PLCC loss.

Load-bearing premise

The six perceptual sub-attributes are correlated with overall quality in a way that makes joint optimization produce better features than training on the overall score alone.

What would settle it

A direct comparison experiment where the identical backbone is trained only on overall MOS and evaluated on the MLE benchmark; equal or superior performance would indicate that multi-task learning does not provide the claimed benefit.

Figures

Figures reproduced from arXiv: 2606.29752 by Dandan Zhu, Guangtao Zhai, Jikai Xu, Jinqiu Sang, Wei Sun, Weixia Zhang, Yanwei Jiang.

Figure 1
Figure 1. Figure 1: Overall architecture of the proposed LEIQ-Assessor. A low-light [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
read the original abstract

Low-light image enhancement algorithms (LIEAs) aim to improve the visibility of images captured under poor illumination. However, the enhancement process often introduces artifacts such as noise amplification, color shift, structural damage, and over-exposure, which degrade the perceptual quality of the enhanced images. Therefore, a reliable image quality assessment (IQA) metric for evaluating enhancement effects is of great importance for both the development of LIEAs and their practical applications. In this paper, we present \textbf{LEIQ-Assessor}, a multi-dimensional quality assessment model for low-light image enhancement based on multi-task learning, developed for the QoMEX 2026 Grand Challenge on Low-light Enhanced Image Quality Assessment. Specifically, our method leverages a pre-trained SigLIP2 Vision Transformer as the backbone and simultaneously predicts the overall Mean Opinion Score (MOS) together with six perceptual sub-attributes: lightness, color fidelity, noise level, exposure quality, naturalness, and content recovery. By jointly optimizing these correlated objectives via the PLCC loss, the shared representation captures richer quality-aware features than its single-task counterpart. Experiments on the MLE benchmark demonstrate that LEIQ-Assessor significantly outperforms existing no-reference IQA models and hand-crafted quality descriptors. Our method achieved second place in the QoMEX 2026 Grand Challenge on Low-light Enhanced Image Quality Assessment. The code is available at https://github.com/sunwei925/LEIQ-Assessor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents LEIQ-Assessor, a multi-task model for no-reference quality assessment of low-light enhanced images. It uses a pre-trained SigLIP2 Vision Transformer backbone to jointly predict overall MOS and six perceptual sub-attributes (lightness, color fidelity, noise level, exposure quality, naturalness, content recovery) via PLCC loss, claiming that the shared representation yields richer quality-aware features than single-task training. Experiments on the MLE benchmark are reported to show significant outperformance over existing NR-IQA models and hand-crafted descriptors, with the method placing second in the QoMEX 2026 Grand Challenge; code is released.

Significance. If the performance gains hold under rigorous verification, the multi-task PLCC formulation offers a practical advance for IQA of enhanced images by exploiting attribute correlations, with direct utility for developing and deploying low-light enhancement algorithms. The challenge placement and public code release provide concrete external corroboration and reproducibility value.

major comments (2)
  1. [Experiments] The manuscript supplies no experimental details on training procedure (e.g., learning-rate schedule, batch size, data splits, or loss-weighting scheme for the multi-task objective), baseline re-implementations, or statistical significance testing. This directly undermines verification of the central claim that joint optimization on the six sub-attributes produces richer features and superior benchmark performance.
  2. [Experiments] No ablation is described that isolates the contribution of multi-task training versus single-task training on the same backbone and loss; without this comparison the assertion that the shared representation is richer remains untested.
minor comments (1)
  1. The abstract states that the method 'significantly outperforms' existing models; quantitative tables or figures reporting PLCC/SROCC values, confidence intervals, and the precise MLE benchmark split used would strengthen the presentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We will strengthen the experimental section to improve reproducibility and directly address the concerns raised.

read point-by-point responses
  1. Referee: [Experiments] The manuscript supplies no experimental details on training procedure (e.g., learning-rate schedule, batch size, data splits, or loss-weighting scheme for the multi-task objective), baseline re-implementations, or statistical significance testing. This directly undermines verification of the central claim that joint optimization on the six sub-attributes produces richer features and superior benchmark performance.

    Authors: We agree that the current manuscript lacks sufficient experimental details. In the revised version we will add a dedicated subsection describing the full training procedure (learning-rate schedule, batch size, data splits, and loss-weighting scheme), the re-implementation protocol for all baselines, and the results of statistical significance testing. These additions will enable independent verification of the reported performance gains. revision: yes

  2. Referee: [Experiments] No ablation is described that isolates the contribution of multi-task training versus single-task training on the same backbone and loss; without this comparison the assertion that the shared representation is richer remains untested.

    Authors: We acknowledge that an explicit ablation isolating multi-task versus single-task training on the identical SigLIP2 backbone and PLCC loss is absent. We will perform and report this ablation study in the revision; the results will directly test whether the joint optimization yields richer quality-aware features, thereby supporting (or qualifying) the central claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical supervised model on external annotations

full rationale

The manuscript describes a standard multi-task regression network (SigLIP2 backbone + joint PLCC loss on MOS plus six sub-attributes) trained end-to-end on the MLE benchmark. No equations, uniqueness theorems, or ansatzes are introduced that reduce the output to a fitted parameter or self-citation by construction. Performance is reported against external baselines and a public challenge ranking, satisfying the criteria for an independent empirical result. No load-bearing self-citations or self-definitional steps appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach depends on a pre-trained backbone whose internal representations are treated as given, human-annotated benchmark data whose reliability is assumed, and the domain assumption that the chosen sub-attributes are mutually informative for overall quality prediction.

free parameters (1)
  • multi-task loss balancing weights
    Multi-task learning requires choosing relative weights for each attribute loss; these are not specified in the abstract and are typically tuned on validation data.
axioms (1)
  • domain assumption Joint optimization of the six sub-attributes via PLCC loss yields richer quality-aware features than single-task training
    The central performance claim rests on this assumption about correlated objectives.

pith-pipeline@v0.9.1-grok · 5812 in / 1255 out tokens · 43612 ms · 2026-06-30T06:36:47.734802+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    A dynamic histogram equalization for image contrast enhancement,

    Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae, “A dynamic histogram equalization for image contrast enhancement,”IEEE Transactions on Consumer Electronics, vol. 53, no. 2, pp. 593–600, 2007

  2. [2]

    A weighted variational model for simultaneous reflectance and illumination estimation,

    Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2782–2790

  3. [3]

    Enlightengan: Deep light enhancement without paired supervision,

    Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang, “Enlightengan: Deep light enhancement without paired supervision,”IEEE Transactions on Image Processing, vol. 30, pp. 2340–2349, 2021

  4. [4]

    Perceptual quality assessment of low-light image enhancement,

    Guangtao Zhai, Wei Sun, Xiongkuo Min, and Jiantao Zhou, “Perceptual quality assessment of low-light image enhancement,”ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 4, pp. 1–24, 2021

  5. [5]

    No- reference image quality assessment in the spatial domain,

    Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik, “No- reference image quality assessment in the spatial domain,”IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, 2012

  6. [6]

    Making a “completely blind

    Anish Mittal, Rajiv Soundararajan, and Alan C Bovik, “Making a “completely blind” image quality analyzer,”IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2012

  7. [7]

    Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,

    Wei Sun, Xiongkuo Min, Danyang Tu, Siwei Ma, and Guangtao Zhai, “Blind quality assessment for in-the-wild images via hierarchical feature fusion and iterative mixed database training,”IEEE Journal of Selected Topics in Signal Processing, vol. 17, no. 6, pp. 1178–1192, 2023

  8. [8]

    Blind image quality assessment via vision-language correspondence: A multitask learning perspective,

    Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma, “Blind image quality assessment via vision-language correspondence: A multitask learning perspective,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14071–14081

  9. [9]

    Exploring clip for assessing the look and feel of images,

    Jianyi Wang, Kelvin CK Chan, and Chen Change Loy, “Exploring clip for assessing the look and feel of images,” inProceedings of the AAAI conference on Artificial Intelligence, 2023, vol. 37, pp. 2555–2563

  10. [10]

    Maniqa: Multi-dimension attention network for no-reference image quality assessment,

    Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Ming- deng Cao, Jiahao Wang, and Yujiu Yang, “Maniqa: Multi-dimension attention network for no-reference image quality assessment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1191–1200

  11. [11]

    A deep learning based no-reference quality assessment model for ugc videos,

    Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai, “A deep learning based no-reference quality assessment model for ugc videos,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 856–865

  12. [12]

    Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,

    Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Siwei Ma, and Xiaokang Yang, “Dynamic backlight scaling considering ambient luminance for mobile videos on lcd displays,”IEEE Transactions on Mobile Computing, vol. 21, no. 1, pp. 110–124, 2020

  13. [13]

    Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,

    Wei Sun, Xiongkuo Min, Guangtao Zhai, Ke Gu, Huiyu Duan, and Siwei Ma, “Mc360iqa: A multi-channel cnn for blind 360-degree image quality assessment,”IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 1, pp. 64–77, 2019

  14. [14]

    Deep neural network for blind visual quality assessment of 4k content,

    Wei Lu, Wei Sun, Xiongkuo Min, Wenhan Zhu, Quan Zhou, Jun He, Qiyuan Wang, Zicheng Zhang, Tao Wang, and Guangtao Zhai, “Deep neural network for blind visual quality assessment of 4k content,”IEEE Transactions on Broadcasting, vol. 69, no. 2, pp. 406–421, 2022

  15. [15]

    Analysis of video quality datasets via design of minimalistic video quality models,

    Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma, “Analysis of video quality datasets via design of minimalistic video quality models,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 11, pp. 7056–7071, 2024

  16. [16]

    Dual-branch network for portrait image quality assessment,

    Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, et al., “Dual-branch network for portrait image quality assessment,”arXiv preprint arXiv:2405.08555, 2024

  17. [17]

    Large multi-modality model assisted ai-generated image quality assessment,

    Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, and Guangtao Zhai, “Large multi-modality model assisted ai-generated image quality assessment,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 7803–7812

  18. [18]

    Lmm-vqa: Advancing video quality assessment with large multimodal models,

    Qihang Ge, Wei Sun, Yu Zhang, Yunhao Li, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, and Guangtao Zhai, “Lmm-vqa: Advancing video quality assessment with large multimodal models,” IEEE Transactions on Circuits and Systems for Video Technology, 2025

  19. [19]

    Enhancing blind video quality assessment with rich quality-aware features,

    Wei Sun, Linhan Cao, Jun Jia, Zhichao Zhang, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Enhancing blind video quality assessment with rich quality-aware features,”Expert Systems with Applications, p. 130452, 2025

  20. [20]

    An empirical study for efficient video quality assessment,

    Wei Sun, Kang Fu, Linhan Cao, Dandan Zhu, Kaiwei Zhang, Yucheng Zhu, Zicheng Zhang, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “An empirical study for efficient video quality assessment,” inProceed- ings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 1403–1413

  21. [21]

    Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,

    Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, and Guangtao Zhai, “Compressedvqa-hdr: Generalized full-reference and no-reference quality assessment models for com- pressed high dynamic range videos,” in2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2025, pp. 1–6

  22. [22]

    Assessing uhd image quality from aesthetics, distortions, and saliency,

    Wei Sun, Weixia Zhang, Yuqin Cao, Linhan Cao, Jun Jia, Zijian Chen, Zicheng Zhang, Xiongkuo Min, and Guangtao Zhai, “Assessing uhd image quality from aesthetics, distortions, and saliency,” inEuropean Conference on Computer Vision, 2024, pp. 109–126

  23. [23]

    Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,

    Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Jun Jia, Kaiwei Zhang, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, 2026, vol. 40, pp. 2607–2615

  24. [24]

    Qualirag: Retrieval-augmented generation for visual quality understanding,

    Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Qualirag: Retrieval-augmented generation for visual quality understanding,”arXiv preprint arXiv:2601.18195, 2026

  25. [25]

    Generalizable video quality assessment via weak-to-strong learning,

    Linhan Cao, Wei Sun, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Yicong Peng, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min, “Generalizable video quality assessment via weak-to-strong learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

  26. [26]

    Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,

    Zhichao Zhang, Wei Sun, Li Xinyue, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Wang Puyi, Sun Fengyu, et al., “Bench- marking multi-dimensional aigc video quality assessment: A dataset and unified model,”ACM Transactions on Multimedia Computing, Communications and Applications, vol. 21, no. 9, pp. 1–24, 2025

  27. [27]

    Efficient face image quality assessment via self-training and knowledge distillation,

    Wei Sun, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu, Xiongkuo Min, and Guangtao Zhai, “Efficient face image quality assessment via self-training and knowledge distillation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 3363–3371

  28. [28]

    Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,

    Dandan Zhu, Kaiwei Zhang, Xiongkuo Min, Guangtao Zhai, and Xi- aokang Yang, “Scandtm: A novel dual-temporal modulation scanpath prediction model for omnidirectional images,”IEEE Transactions on Circuits and Systems for Video Technology, 2025

  29. [29]

    From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,

    Dandan Zhu, Kaiwei Zhang, Kun Zhu, Nana Zhang, Weiping Ding, Guangtao Zhai, and Xiaokang Yang, “From discrete representation to continuous modeling: A novel audio-visual saliency prediction model with implicit neural representations,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 6, pp. 4059–4074, 2024

  30. [30]

    Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,

    Dandan Zhu, Kun Zhu, Weiping Ding, Nana Zhang, Xiongkuo Min, Guangtao Zhai, and Xiaokang Yang, “Mtcam: A novel weakly- supervised audio-visual saliency prediction model with multi-modal transformer,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 2, pp. 1756–1771, 2024

  31. [31]

    A no-reference evaluation metric for low- light image enhancement,

    Zicheng Zhang, Wei Sun, Xiongkuo Min, Wenhan Zhu, Tao Wang, Wei Lu, and Guangtao Zhai, “A no-reference evaluation metric for low- light image enhancement,” in2021 IEEE International Conference on Multimedia and Expo, 2021, pp. 1–6

  32. [32]

    Blind multimodal quality assessment of low-light images,

    Miaohui Wang, Zhuowei Xu, Mai Xu, and Weisi Lin, “Blind multimodal quality assessment of low-light images,”International Journal of Computer Vision, vol. 133, no. 4, pp. 1665–1688, 2025

  33. [33]

    Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,

    Bo Hu, Haitian Zhao, Yuanyuan Hu, and Xinbo Gao, “Mledataset: Multi-annotated and multimodal low-light image enhancement dataset,” https://github.com/CQUPT-HuBo90/MLEDataset, 2026, QoMEX 2026 Low-light Enhanced Image Quality Assessment Challenge

  34. [34]

    SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

    Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al., “Siglip 2: Multilingual vision- language encoders with improved semantic understanding, localization, and dense features,”arXiv preprint arXiv:2502.14786, 2025