Recognition: no theorem link
TIQA: Human-Aligned Perceptual Text Quality Assessment in Generated Images
Pith reviewed 2026-05-15 14:29 UTC · model grok-4.3
The pith
Perceptual quality of text in AI-generated images can be scored separately from semantics using a dedicated no-reference model that reaches 0.94 correlation with human judgments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Perceptual text quality in generated images forms a distinct, measurable dimension that can be predicted without reference images or semantic understanding; ANTIQA achieves PLCC/SROCC of 0.942/0.935 on labeled text crops and 0.842/0.837 on full images from unseen generators, and raises text-quality MOS by 0.36 points when selecting among five outputs.
What carries the argument
ANTIQA, a lightweight neural predictor that applies text-specific inductive biases to detected text regions to produce a perceptual quality score independent of semantic content.
If this is right
- Enables automatic ranking and filtering of generated images by text rendering quality without new human labels.
- Supports generation-time selection that improves selected image text quality by 14 percent on the human MOS scale.
- Provides a reproducible benchmark for comparing text rendering performance across current and future text-to-image models.
- Separates visual typography defects from semantic errors so each can be optimized independently.
Where Pith is reading between the lines
- The metric could be inserted into training objectives so that generators directly optimize for readable text rather than only global realism.
- Similar region-specific perceptual predictors might be developed for other localized failure modes such as fine details in faces or hands.
- Large-scale use of the scorer would let researchers track whether progress in overall image quality is accompanied by commensurate gains in text legibility.
Load-bearing premise
The mean-opinion scores from the 10k labeled crops and 1,500 full images remain stable and representative of human perception for text quality across future generators.
What would settle it
Test ANTIQA on a new text-to-image generator released after the datasets were collected and measure whether its PLCC on text-quality MOS falls below 0.75.
Figures
read the original abstract
Recent text-to-image models have improved global realism, but text rendering remains a persistent failure mode: images may look convincing overall, yet local typography often contains malformed glyphs, broken strokes, irregular spacing, and other artifacts that humans heavily penalize. We formulate Text-in-Image Quality Assessment (TIQA), a no-reference task that estimates a human-aligned perceptual quality score for detected text regions while disentangling visual text quality from semantic correctness. To support this setting, we introduce two datasets. TIQA-Crops contains 120k text crops from 36k AI-generated images produced by 12 generators, with 10k mean-opinion-score (MOS) labels and 110k proxy labels for pretraining. TIQA-Images contains 1,500 text-heavy images from 10 recent generators, including proprietary systems, with paired overall-quality and text-quality subjective scores. We also propose ANTIQA, a lightweight predictor with text-specific inductive biases. Across crop-level and image-level evaluations, ANTIQA achieves the best alignment with human judgments, reaching PLCC/SROCC of 0.942/0.935 on TIQA-Crops and 0.842/0.837 for text-quality MOS on unseen generators in TIQA-Images. In best-of-5 AI-generated image ranking, ANTIQA improves the text quality of the selected image by 0.36 MOS (14%), demonstrating utility for benchmarking, filtering, and generation-time selection. Together, these findings establish perceptual text quality as a distinct evaluation target for modern text-to-image generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the TIQA no-reference task for estimating human-aligned perceptual quality scores of text regions in AI-generated images, disentangling visual quality from semantics. It releases two datasets—TIQA-Crops (120k crops from 36k images by 12 generators, with 10k MOS labels) and TIQA-Images (1,500 text-heavy images from 10 generators with paired scores)—and proposes the lightweight ANTIQA predictor with text-specific biases. ANTIQA reports PLCC/SROCC of 0.942/0.935 on TIQA-Crops and 0.842/0.837 on text-quality MOS for unseen generators in TIQA-Images, plus a 0.36 MOS (14%) improvement in best-of-5 ranking.
Significance. If the correlations and ranking gains hold under verification, the work supplies a practical perceptual metric for a known failure mode in text-to-image models. The datasets and ANTIQA could support benchmarking, filtering, and generation-time selection; the independent MOS collection avoids obvious circularity and the unseen-generator split provides a basic test of transfer.
major comments (3)
- [Abstract] Abstract: the headline PLCC/SROCC figures (0.942/0.935 and 0.842/0.837) are given without error bars, standard deviations across folds, or statistical significance tests against baselines, so it is impossible to judge whether the claimed superiority is reliable or within noise.
- [Abstract] Abstract: the central generalization claim—that ANTIQA works “across future generators”—rests on the untested assumption that the 10k MOS labels from only 12+10 generators already cover the space of human-perceptible text artifacts; no experiments probe novel glyph statistics, post-2024 diffusion artifacts, or new architectures that could shift the perceptual mapping.
- [Abstract] Abstract: no ablation results are reported for the text-specific inductive biases in ANTIQA, so the contribution of each design choice to the reported correlations cannot be isolated and the model’s claimed lightness and specificity remain unverified.
minor comments (1)
- [Abstract] The abstract should explicitly state whether the datasets will be publicly released, as this directly affects reproducibility of the 10k MOS labels and the reported numbers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We will revise the manuscript to strengthen the statistical reporting, clarify the generalization claims, and add ablation studies as detailed below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the headline PLCC/SROCC figures (0.942/0.935 and 0.842/0.837) are given without error bars, standard deviations across folds, or statistical significance tests against baselines, so it is impossible to judge whether the claimed superiority is reliable or within noise.
Authors: We agree that error bars, standard deviations across folds, and statistical significance tests are necessary to substantiate the superiority claims. In the revised version, we will report mean PLCC/SROCC with standard deviations computed over multiple cross-validation folds and include pairwise statistical significance tests (e.g., Steiger's test or bootstrap confidence intervals) against the baselines. revision: yes
-
Referee: [Abstract] Abstract: the central generalization claim—that ANTIQA works “across future generators”—rests on the untested assumption that the 10k MOS labels from only 12+10 generators already cover the space of human-perceptible text artifacts; no experiments probe novel glyph statistics, post-2024 diffusion artifacts, or new architectures that could shift the perceptual mapping.
Authors: The reported generalization is supported by the explicit unseen-generator split in TIQA-Images (10 generators held out from training), which demonstrates transfer to new generator families. We acknowledge that exhaustive coverage of all possible future artifacts is impossible and will revise the abstract wording to specify 'unseen generators' rather than 'future generators' while adding a limitations paragraph discussing potential shifts in perceptual mappings from novel architectures. revision: partial
-
Referee: [Abstract] Abstract: no ablation results are reported for the text-specific inductive biases in ANTIQA, so the contribution of each design choice to the reported correlations cannot be isolated and the model’s claimed lightness and specificity remain unverified.
Authors: We agree that ablations are required to isolate the impact of the text-specific inductive biases. The revised manuscript will include a dedicated ablation study removing or replacing each bias component (e.g., glyph-aware convolutions, spacing priors) and reporting the resulting drops in PLCC/SROCC on both TIQA-Crops and TIQA-Images. revision: yes
Circularity Check
No circularity in derivation chain
full rationale
The paper collects independent human MOS labels on TIQA-Crops (10k labels) and TIQA-Images (1,500 images) from separate generators, then trains ANTIQA on those labels and reports correlation on held-out crops and unseen-generator images. No equations, self-citations, or fitted parameters are presented as independent predictions; the reported PLCC/SROCC values and MOS improvement are direct empirical measurements against external human judgments. The derivation remains self-contained against the collected benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Human mean-opinion scores collected on the 10k labeled crops and 1,500 images are stable and representative of general perceptual text quality.
Reference graph
Works this paper leans on
-
[1]
Stability AI. 2022. DeepFloyd IF (IF-I-M and related checkpoints). https:// stability.ai/news/deepfloyd-if-text-to-image-model Model card
work page 2022
-
[2]
Stability AI. 2022. Stable Diffusion v2.1. https://huggingface.co/qualcomm/ Stable-Diffusion-v2.1 Model card. Accessed: 2026-01-29
work page 2022
-
[3]
Stability AI. 2024. Stable Diffusion 3 Medium. https://huggingface.co/stabilityai/ stable-diffusion-3-medium Model card / release. Accessed: 2026-01-29
work page 2024
-
[4]
Stability AI. 2024. Stable Diffusion 3 Medium (announcement). https://stability. ai/news/stable-diffusion-3-medium Blog / release. Accessed: 2026-01-29
work page 2024
-
[5]
Stability AI. 2024. Stable Diffusion 3.5 Large. https://huggingface.co/stabilityai/ stable-diffusion-3.5-large Model card. Accessed: 2026-01-29
work page 2024
-
[6]
Stability AI. 2024. Stable Diffusion 3.5 Large Turbo. https://huggingface.co/ stabilityai/stable-diffusion-3.5-large-turbo Model card / release. Accessed: 2026- 01-29
work page 2024
-
[7]
Qwen Team / Alibaba. 2025. Qwen-Image (model repo / release). https://github. com/QwenLM/Qwen-Image Model repo / release. Accessed: 2026-01-29
work page 2025
-
[8]
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, et al. 2025. Qwen3-VL Technical Report. arXiv:2...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[9]
Zenab Bosheah and Vilmos Bilicki. 2025. Challenges in Generating Accurate Text in Images: A Benchmark for Text-to-Image Models on Specialized Content. Applied Sciences15, 5 (2025), 2274
work page 2025
-
[10]
Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. 2024. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. InForty-first International Conference on Machine Learning
work page 2024
-
[11]
Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. 2024. TOPIQ: A Top-Down Approach From Seman- tics to Distortions for Image Quality Assessment.IEEE Transactions on Image Processing33 (2024), 2404–2418. doi:10.1109/TIP.2024.3378466
-
[12]
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhong- dao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. 2024. PixArt -𝜎: Weak- to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation. https://arxiv.org/abs/2403.04692 arXiv preprint / project page. Accessed: 2026- 01-29
-
[13]
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei
-
[14]
Textdiffuser: Diffusion models as text painters.Advances in Neural Infor- mation Processing Systems36 (2023), 9353–9387
work page 2023
-
[15]
Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhong- dao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. 2023. PixArt-𝛼: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthe- sis. https://arxiv.org/abs/2310.00426 arXiv preprint. Accessed: 2026-01-29
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[16]
EM Colombo, CF Kirschbaum, and M Raitelli. 1987. Legibility of texts: The influence of blur.Lighting Research & Technology19, 3 (1987), 61–71
work page 1987
-
[17]
Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, and Yanjun Ma. 2025. PaddleOCR 3.0 Technical Report. arXiv:2507.05595 [cs.CV] https://arxiv.org/abs/2507.05595
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[18]
Google DeepMind. 2025. Imagen 4 (Imagen 4 Fast) — Google / DeepMind. https://deepmind.google/models/imagen/ Model page / announcement. Accessed: 2026-01-29
work page 2025
-
[19]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xi- aohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929(2020)
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[20]
Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. 2024. Reno: Enhancing one-step text-to-image models through reward- based noise optimization.Advances in Neural Information Processing Systems37 (2024), 125487–125519
work page 2024
-
[21]
Forouzan Fallah, Maitreya Patel, Agneet Chatterjee, Vlad Morariu, Chitta Baral, and Yezhou Yang. 2025. Textinvision: Text and prompt complexity driven vi- sual text generation benchmark. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 525–534
work page 2025
-
[22]
Black Forest Labs (FLUX). 2026. FLUX.1 [dev] (model card). https://huggingface. co/black-forest-labs/FLUX.1-dev Model card. Accessed: 2026-01-29
work page 2026
-
[23]
Google. 2025. Nano Banana Pro (Gemini 3 Pro Image). https://blog.google/ innovation-and-ai/products/nano-banana-pro/ Product blog. Accessed: 2026-01- 29
work page 2025
-
[24]
Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Yuanzhuo Wang, and Jian Guo. 2024. A Survey on LLM-as-a-Judge.arXiv preprint arXiv: 2411.15594(2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[25]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition. 770–778
work page 2016
-
[26]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems30 (2017)
work page 2017
-
[27]
Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-Excitation Networks. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7132–7141. doi:10.1109/CVPR.2018.00745
-
[28]
Ideogram. 2025. Ideogram 3 (Ideogram v3 / v3 turbo). https://ideogram.ai/ features/3.0 Product / model page. Accessed: 2026-01-29
work page 2025
-
[29]
JaidedAI. [n. d.]. EasyOCR. https://github.com/JaidedAI/EasyOCR. Accessed: 2026-03-12
work page 2026
-
[30]
Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. 2023. Pick-a-pic: An open dataset of user preferences for text-to- image generation.Advances in neural information processing systems36 (2023), 36652–36663
work page 2023
- [31]
-
[32]
Black Forest Labs. 2024. FLUX1.1 Pro (product / model page). https://bfl.ai/ models/flux-pro Vendor model page. Accessed: 2026-01-29
work page 2024
-
[33]
Black Forest Labs. 2025. FLUX.2 [max] (model / product page). https://replicate. com/black-forest-labs/flux-2-max Model / API page. Accessed: 2026-01-29
work page 2025
-
[34]
Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, and Shixiang Shane Gu. 2023. Aligning text-to-image models using human feedback.arXiv preprint arXiv:2302.12192 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[35]
Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, Zicheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, Haoning Wu, et al. 2024. Aigiqa- 20k: A large database for ai-generated image quality assessment. InIn Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 6327–6336
work page 2024
-
[36]
Hui Li, Peng Wang, Chunhua Shen, and Guyu Zhang. 2019. Show, attend and read: a simple and strong baseline for irregular text recognition. InPro- ceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artifici...
-
[37]
Xiongkuo Min, Ke Gu, Guangtao Zhai, Xiaokang Yang, Wenjun Zhang, Patrick Le Callet, and Chang Wen Chen. 2021. Screen content quality assessment: Overview, benchmark, and beyond.ACM Computing Surveys (CSUR)54, 9 (2021), 1–36
work page 2021
-
[38]
Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. 2012. No- reference image quality assessment in the spatial domain.IEEE Transactions on image processing21, 12 (2012), 4695–4708
work page 2012
-
[39]
Novita. 2024. Novita AI. https://novita.ai. Accessed: 2024-03-01
work page 2024
- [40]
-
[41]
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. 2023. SDXL: Improving Latent Diffusion Models for High-Fidelity Image Generation. https://arxiv.org/abs/ 2307.01952 arXiv preprint. Accessed: 2026-01-29
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[42]
Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhip- kin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, and Denis Dimitrov. 2023. Kandinsky: An Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion (Kandinsky 2). https://arxiv. org/abs/2310.03502 arXiv preprint. Accessed: 2026-01-29
-
[43]
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans.Advances in neural information processing systems29 (2016)
work page 2016
-
[44]
Georgia Gabriela Sampaio, Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Josh Susskind, Navdeep Jaitly, and Yizhe Zhang. 2024. Typescore: A text fidelity metric for text-to-image generative models.arXiv preprint arXiv:2411.02437 9 Kirill Koltsov, Aleksandr Gushchin, Anastasia Antsiferova, and Dmitriy Vatolin (2024)
-
[45]
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models.Advances in neural information processing systems 35 (2022), 25278–25294
work page 2022
-
[46]
ByteDance / Seedream. 2025. Seedream 4.5 (ByteDance / Seedream). https: //byteplus.com/en/product/Seedream Product / API page. Accessed: 2026-01-29
work page 2025
-
[47]
Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, and Kota Yamaguchi. 2025. Type-r: Automatically retouching typos for text-to- image generation. InProceedings of the Computer Vision and Pattern Recognition Conference. 2745–2754
work page 2025
-
[48]
Shaolin Su, Qingsen Yan, Yu Zhu, Cheng Zhang, Xin Ge, Jinqiu Sun, and Yanning Zhang. 2020. Blindly assess image quality in the wild guided by a self-adaptive hyper network. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 3667–3676
work page 2020
-
[49]
Hossein Talebi and Peyman Milanfar. 2018. NIMA: Neural image assessment. IEEE transactions on image processing27, 8 (2018), 3998–4011
work page 2018
-
[50]
RapidAI Team. 2021. Rapid OCR: OCR Toolbox. https://github.com/RapidAI/ RapidOCR
work page 2021
-
[51]
V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Haochen Li, Jiale Zhu, Jiali Chen, Ji...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[52]
Tongyi-MAI. 2025. Z-Image-Turbo (Tongyi-MAI / Alibaba). https://huggingface. co/Tongyi-MAI/Z-Image-Turbo Model card / repo. Accessed: 2026-01-29
work page 2025
-
[55]
Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, and Xuansong Xie
- [56]
-
[57]
UC Berkeley. n.d.. LMArena. https://lmarena.ai/leaderboard/text-to-image
- [58]
-
[59]
Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. 2023. Imagereward: Learning and evaluating human prefer- ences for text-to-image generation.Advances in Neural Information Processing Systems36 (2023), 15903–15935
work page 2023
-
[60]
Yandex. n.d.. Yandex.Tasks. https://tasks.yandex.com. Accessed 20 December 2025
work page 2025
-
[61]
Peng Ye and David Doermann. 2013. Document image quality assessment: A brief survey. In2013 12th International Conference on Document Analysis and Recognition. IEEE, 723–727
work page 2013
-
[62]
Xingsong Ye, Yongkun Du, Yunbo Tao, and Zhineng Chen. 2025. Textssr: Diffusion-based data synthesis for scene text recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision. 17464–17473
work page 2025
-
[63]
Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. 2025. Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution. InIEEE/CVF Conference on Computer Vision and Pattern Recognition. 14483–14494
work page 2025
-
[64]
Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, and Chao Dong
-
[65]
InEuropean Conference on Computer Vision
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models. InEuropean Conference on Computer Vision. 259– 276
-
[66]
Zai-Org. 2025. CogView4 (repo / model). https://github.com/zai-org/CogView4 Project / model release. Accessed: 2026-01-29
work page 2025
-
[67]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang
-
[68]
InProceedings of the IEEE conference on computer vision and pattern recognition
The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition. 586–595
- [69]
-
[70]
Zicheng Zhang, Tengchuan Kou, Shushi Wang, Chunyi Li, Wei Sun, Wei Wang, Xiaoyu Li, Zongyu Wang, Xuezhi Cao, Xiongkuo Min, et al. 2025. Q-eval-100k: Evaluating visual quality and alignment level for text-to-vision content. InPro- ceedings of the Computer Vision and Pattern Recognition Conference. 10621–10631
work page 2025
-
[71]
Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, and Shiqi Wang. 2024. Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=mHtOyh5taj 10 TIQA: Human...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.