pith. machine review for the scientific record.
Pith Number

pith:SNHEDWXO

pith:2026:SNHEDWXO7SQJMA6PWZ7WSUEOCR
not attested · not anchored · not stored · refs resolved

MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models

Mo Fan, Ryotaro Shimizu, Takashi Wada, Takuya Furusawa, Tianwei Chen, Yuki Hirakawa

A multi-label benchmark with aggregated annotator votes shows recent MLLMs have advanced on visual emotion prediction but still leave substantial room for improvement.

arxiv:2605.14635 v1 · 2026-05-14 · cs.CV · cs.AI

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Recent MLLMs show measurable progress on visual emotion prediction as measured by the new multi-label benchmark, yet substantial room for improvement remains, and LLM-as-a-judge does not consistently improve performance.

C2 weakest assumption

Aggregating independent selections from twenty annotators per image produces a reliable and representative distribution of the emotions actually evoked by each image.

C3 one line summary

MultiEmo-Bench supplies 10,344 images with aggregated multi-label emotion votes from 20 annotators each to evaluate MLLMs on dominant emotion and full distribution prediction.
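
To make the summary above concrete, here is a minimal sketch of how independent multi-label selections could be aggregated into a vote distribution and a dominant emotion. The emotion labels, the sample selections, the normalization by total votes, and the function name `aggregate_votes` are all illustrative assumptions. This page does not state the benchmark's actual label set or aggregation rule.

```python
from collections import Counter


def aggregate_votes(selections):
    """Aggregate per-annotator label sets into counts, a distribution, and a dominant label.

    selections: list of sets, one set of chosen labels per annotator (hypothetical format).
    """
    # Count how many annotators selected each label.
    counts = Counter(label for chosen in selections for label in chosen)
    total = sum(counts.values())
    # Assumption: normalize by the total number of votes so the shares sum to 1.
    distribution = {label: n / total for label, n in counts.items()} if total else {}
    # Ties resolve to the label encountered first; a real protocol may differ.
    dominant = max(counts, key=counts.get) if counts else None
    return counts, distribution, dominant


# Hypothetical image with 20 annotators; the labels are placeholders.
selections = [{"joy"}] * 12 + [{"joy", "awe"}] * 5 + [{"sadness"}] * 3
counts, distribution, dominant = aggregate_votes(selections)
print(dict(counts))   # {'joy': 17, 'awe': 5, 'sadness': 3}
print(dominant)       # joy
```

Under this assumed rule, a model's dominant-emotion prediction would be compared against `dominant` and its predicted distribution against `distribution`.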

References

34 extracted · 34 resolved · 5 Pith anchors

[1] Achlioptas, P., Ovsjanikov, M., Haydarov, K., Elhoseiny, M., Guibas, L.J.: Artemis: Affective language for visual art. In: CVPR. pp. 11569–11579 (2021) 2021
[2] Anthropic: System card: Claude opus 4 & claude sonnet 4. Tech. rep., Anthropic (May 2025) 2025
[3] Qwen3-VL Technical Report 2025 · arXiv:2511.21631
[4] Bhattacharyya, S., Wang, J.Z.: Evaluating vision-language models for emotion recognition. In: Chiruzzo, L., Ritter, A., Wang, L. (eds.) NAACL Findings. pp. 1798–1820. Association for Computational Lin 2025
[5] Chen, L., Li, J., Dong, X., Zhang, P., He, C., Wang, J., Zhao, F., Lin, D.: Sharegpt4v: Improving large multi-modal models with better captions. In: ECCV. vol. 15075, pp. 370–387 (2024) 2024

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-17T23:39:03.934180Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45

Aliases

arxiv: 2605.14635 · arxiv_version: 2605.14635v1 · doi: 10.48550/arxiv.2605.14635 · pith_short_12: SNHEDWXO7SQJ · pith_short_16: SNHEDWXO7SQJMA6P · pith_short_8: SNHEDWXO
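
Comparing the values on this page suggests that the 26-character identifier is the RFC 4648 base32 encoding of the canonical hash, truncated to 26 characters, and that the `pith_short_*` aliases are prefixes of it. The sketch below reproduces that relationship for this record only. It is an observation from the values shown here, not a documented rule, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as published on this page.
CANONICAL_HASH = "934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45"


def pith_id_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the identifier is derived this way; it holds for this record.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)
print(full_id)  # SNHEDWXO7SQJMA6PWZ7WSUEOCR

# The short aliases listed above are prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}:", full_id[:n])
```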
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SNHEDWXO7SQJMA6PWZ7WSUEOCR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "628715a9745f18954b667c7cf58ab3600d2f1fc7a7cabb81666699325a2f17ed",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T09:49:32Z",
    "title_canon_sha256": "137561c728ed7e7f958ed5bf73182b4dff0136a9a5fb4f7abcdfafa92a51e473"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14635",
    "kind": "arxiv",
    "version": 1
  }
}
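
The same check can be run offline against the canonical record shown above. This is a minimal sketch that mirrors the serialization in the one-liner under "Verify this Pith Number yourself" (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding, SHA-256). The helper names `canonical_bytes` and `canonical_hash` are illustrative, and the script reports whether the digest matches the published hash rather than assuming it does.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "628715a9745f18954b667c7cf58ab3600d2f1fc7a7cabb81666699325a2f17ed",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T09:49:32Z",
        "title_canon_sha256": "137561c728ed7e7f958ed5bf73182b4dff0136a9a5fb4f7abcdfafa92a51e473",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14635", "kind": "arxiv", "version": 1},
}

# Canonical hash as published on this page.
EXPECTED = "934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45"


def canonical_bytes(record):
    """Serialize with sorted keys and compact separators, as the one-liner does."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record):
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_hash(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which a mirror stores the fields does not affect the digest.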