pith:MUDDTA3E
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
LMMs achieve better visual scoring by predicting discrete text-defined rating levels instead of numerical scores.
arxiv:2312.17090 v1 · 2023-12-28 · cs.CV · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MUDDTA3EGAO3LWODE3IBSEHJDO}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) tasks under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed the OneAlign.
Training LMMs with discrete text-defined levels emulates human subjective judgment more effectively than direct numerical score regression, which improves performance without architectural changes or extra data.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
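The claims above describe scoring through discrete text-defined levels rather than direct regression. A minimal sketch of that idea follows, under stated assumptions: the five level names, the equal-width binning, and the 1-to-5 weights are illustrative choices and are not confirmed by this record, and `score_to_level` and `levels_to_score` are hypothetical helper names.

```python
import math

# Assumed level vocabulary, ordered from worst to best (illustrative only).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def score_to_level(score: float, lo: float, hi: float) -> str:
    """Map a numeric score in [lo, hi] to one of five equal-width level bins."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (score - lo) / (hi - lo)
    # Clamp so that score == hi falls in the top bin and out-of-range
    # scores fall in the nearest bin.
    idx = max(0, min(int(frac * len(LEVELS)), len(LEVELS) - 1))
    return LEVELS[idx]


def levels_to_score(logits: dict) -> float:
    """Softmax over per-level logits, then the expected value with weights 1..5."""
    peak = max(logits[name] for name in LEVELS)
    exps = {name: math.exp(logits[name] - peak) for name in LEVELS}
    total = sum(exps.values())
    return sum((i + 1) * exps[name] / total for i, name in enumerate(LEVELS))
```

With these assumptions, training targets would be level words produced by `score_to_level`, and a continuous score would be recovered at inference time by `levels_to_score` from the model's logits for the level tokens.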
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:50.870033Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
6506398364301db5d9c326d01910e91b8e6f1cf78cfa923d524c2e204befdf59
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MUDDTA3EGAO3LWODE3IBSEHJDO \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6506398364301db5d9c326d01910e91b8e6f1cf78cfa923d524c2e204befdf59
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2d275bc65e5e2f8fe7a4b0e5f050b1ca381a4fb351da30dfaac05792e27d2383",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-12-28T16:10:25Z",
    "title_canon_sha256": "c390943399b25c80248b0569a0af41814140229c9ae0d4d47a6ae22dfe712147"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2312.17090",
    "kind": "arxiv",
    "version": 1
  }
}
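The shell one-liner in the verification step serializes the record with sorted keys and compact separators before hashing. The same canonicalization can be sketched as a small Python module, which may be easier to reuse offline. The function names `canonical_hash` and `verify` are hypothetical, and whether the result matches the published hash depends on the record being serialized exactly as the registry does it, which this sketch assumes rather than guarantees.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the computed hash with an expected hex digest, ignoring case."""
    return canonical_hash(record) == expected_hex.lower()
```

To check a record, load the canonical record JSON with `json.loads` and pass the resulting dict to `verify` together with the canonical hash shown on this page.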