Pith Number

pith:EKKVNZMN

pith:2026:EKKVNZMNS5DASX2FSGHUW54TDN
not attested · not anchored · not stored · refs resolved

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Basel Alomair, Bhaskar Ramasubramanian, Chunjiang Liu, Fengqing Jiang, Hang Hua, Kaiyuan Zheng, Luyao Niu, Misha Sra, Radha Poovendran, Xiangliang Zhang, Yichen Feng, Yuanyuan Chen, Yue Huang, Yuetai Li, Zhangchen Xu, Zhengqing Yuan, Zichen Chen

Frontier multimodal models correctly pick both the best and worst image in only 26.5 percent of controlled aesthetic tasks, while human experts reach 68.9 percent.

arxiv:2605.12684 v1 · 2026-05-12 · cs.CV · cs.AI · cs.HC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EKKVNZMNS5DASX2FSGHUW54TDN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

the strongest system identifies both the best and the worst image correctly across three random permutations of the candidate order in only 26.5% of tasks, far below the 68.9% achieved by human experts.

C2 weakest assumption

That expert consensus on comparative preference over matched-subject sets constitutes a stable and generalizable ground truth for aesthetic quality, and that the 400 tasks sufficiently represent the space of visual aesthetic judgment.

C3 one line summary

The Visual Aesthetic Benchmark shows frontier MLLMs achieve 26.5% accuracy on comparative best/worst image selection versus 68.9% for human experts, with fine-tuning closing some of the gap.
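Claim C1 describes a strict criterion: a task counts only when both the best and the worst image are identified correctly across all three permutations of the candidate order. The sketch below shows one plausible reading of that rule. The data layout, the function names `task_correct` and `strict_accuracy`, and the toy image IDs are assumptions for illustration, not the paper's evaluation code or data.

```python
from typing import List, Tuple

# Hypothetical layout: one (best_pick, worst_pick) pair per permutation.
Pick = Tuple[str, str]
Task = Tuple[List[Pick], str, str]  # (picks, expert_best, expert_worst)


def task_correct(picks: List[Pick], best: str, worst: str) -> bool:
    """True only if every permutation yields the expert best AND worst image."""
    return len(picks) > 0 and all(b == best and w == worst for b, w in picks)


def strict_accuracy(tasks: List[Task]) -> float:
    """Fraction of tasks that satisfy task_correct."""
    if not tasks:
        return 0.0
    return sum(task_correct(p, b, w) for p, b, w in tasks) / len(tasks)


# Toy example with made-up image IDs (not benchmark data).
toy_tasks: List[Task] = [
    ([("a", "d"), ("a", "d"), ("a", "d")], "a", "d"),  # right in all three orders
    ([("a", "d"), ("b", "d"), ("a", "d")], "a", "d"),  # best pick flips once
]
print(strict_accuracy(toy_tasks))  # 0.5
```

Under this reading, a model that is right in two of three orderings earns no credit for the task, which is one reason a strict score can sit well below per-query accuracy.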

Receipt and verification
First computed: 2026-05-18T03:09:49.953603Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

229556e58d9746095f45918f4b77931b5a73fce8b6b8bc53e195b2871542fd43

Aliases

arxiv: 2605.12684 · arxiv_version: 2605.12684v1 · doi: 10.48550/arxiv.2605.12684 · pith_short_12: EKKVNZMNS5DA · pith_short_16: EKKVNZMNS5DASX2F · pith_short_8: EKKVNZMN
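The short aliases listed above are prefixes of the full 26-character number, and that number coincides with the Base32 (RFC 4648) encoding of the canonical hash, truncated to 26 characters. This is an observation drawn from the values on this page, not a documented derivation rule. The helper names `pith_number_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "229556e58d9746095f45918f4b77931b5a73fce8b6b8bc53e195b2871542fd43"
FULL_NUMBER = "EKKVNZMNS5DASX2FSGHUW54TDN"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: inferred from the values on this page, not from a published spec.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_alias(number: str, n: int) -> str:
    """Return the first n characters, matching the pith_short_<n> aliases here."""
    return number[:n]


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived == FULL_NUMBER)  # True for the values on this page
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(FULL_NUMBER, n)}")
```

If the relationship holds in general, a mirror could check that a number and its canonical hash belong together without contacting the service. Treat that as a hypothesis until it is confirmed against the schema.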
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EKKVNZMNS5DASX2FSGHUW54TDN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 229556e58d9746095f45918f4b77931b5a73fce8b6b8bc53e195b2871542fd43
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4596c4afdac08db2cc45041be0ab0a872a85d4a94f3f47bab1190d76c43d8153",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.HC"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-12T19:33:28Z",
    "title_canon_sha256": "72ad442ed7f2ab054cf1953df25d10a234f1634f7f5550ae5d1a254a1bc03be4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12684",
    "kind": "arxiv",
    "version": 1
  }
}
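The shell one-liner above serializes the canonical record with sorted keys, compact separators, and unescaped non-ASCII characters, then hashes the UTF-8 bytes with SHA-256. The sketch below performs the same steps offline on a copy of the record shown on this page. The function name `canonical_sha256` is hypothetical. The result is meant to be compared against the canonical hash listed above; no output is asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with the same json.dumps options as the one-liner, then hash."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copy of the canonical record shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4596c4afdac08db2cc45041be0ab0a872a85d4a94f3f47bab1190d76c43d8153",
        "cross_cats_sorted": ["cs.AI", "cs.HC"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-12T19:33:28Z",
        "title_canon_sha256": "72ad442ed7f2ab054cf1953df25d10a234f1634f7f5550ae5d1a254a1bc03be4",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12684", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the fields.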