Pith Number

pith:SNHEDWXO

pith:2026:SNHEDWXO7SQJMA6PWZ7WSUEOCR

not attested · not anchored · not stored · refs resolved

MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models

Mo Fan, Ryotaro Shimizu, Takashi Wada, Takuya Furusawa, Tianwei Chen, Yuki Hirakawa

A multi-label benchmark with aggregated annotator votes shows recent MLLMs have advanced on visual emotion prediction but still leave substantial room for improvement.

arxiv:2605.14635 v1 · 2026-05-14 · cs.CV · cs.AI

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
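The page does not spell out the merge algorithm, so the following is only a minimal sketch of how a deterministic merge can work in general: deduplicate events, sort them by a total order, and fold them into a copy of the canonical record. The event fields (`id`, `at`, `field`, `value`) and the function name `merge_state` are hypothetical, and signature checking is left out entirely.

```python
from typing import Any


def merge_state(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed order.

    The event shape used here is an assumption for illustration,
    not the actual Pith bundle schema.
    """
    # Drop duplicate deliveries of the same event (same hypothetical "id").
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id) so every mirror applies events in the same
    # order, regardless of the order in which they were received.
    ordered = sorted(unique.values(), key=lambda e: (e["at"], e["id"]))
    state = dict(canonical)
    for event in ordered:
        state[event["field"]] = event["value"]
    return state


canonical = {"schema_version": "1.0"}
events = [
    {"id": "e2", "at": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "at": "2026-05-17T00:00:00Z", "field": "citations", "value": "pending"},
]
# Two mirrors that received the events in different orders agree on the result.
assert merge_state(canonical, events) == merge_state(canonical, list(reversed(events)))
print(merge_state(canonical, events))
```

Because the fold order depends only on the events themselves, any mirror holding the same bundle arrives at the same state.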

Claims

C1 · strongest claim

Recent MLLMs show measurable progress on visual emotion prediction with the new multi-label benchmark, yet substantial room for improvement remains and LLM-as-a-judge does not consistently improve performance.

C2 · weakest assumption

Aggregating independent selections from twenty annotators per image produces a reliable and representative distribution of the emotions actually evoked by each image.

C3 · one line summary

MultiEmo-Bench supplies 10,344 images with aggregated multi-label emotion votes from 20 annotators each to evaluate MLLMs on dominant emotion and full distribution prediction.
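To make the two evaluation targets concrete, here is a small sketch of how multi-label selections from several annotators could be aggregated into a vote distribution and a dominant emotion. The emotion names, the normalization by total selections, and the alphabetical tie-break are all assumptions for illustration; the summary above does not state the paper's exact aggregation scheme.

```python
from collections import Counter


def aggregate_votes(selections: list[set[str]]) -> tuple[dict[str, float], str]:
    """Turn per-annotator label sets into (distribution, dominant label)."""
    # Each annotator may select several emotions; count every selection once.
    counts = Counter(label for chosen in selections for label in chosen)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no emotion labels were selected")
    # Assumption: normalize by the total number of selections so the
    # distribution sums to 1. Dividing by the number of annotators is
    # another plausible choice.
    distribution = {label: n / total for label, n in sorted(counts.items())}
    # Highest count wins; ties are broken alphabetically so the result
    # is deterministic.
    dominant = min(counts, key=lambda label: (-counts[label], label))
    return distribution, dominant


# Four hypothetical annotators (the benchmark uses 20 per image).
votes = [{"joy", "awe"}, {"joy"}, {"awe", "sadness"}, {"joy"}]
distribution, dominant = aggregate_votes(votes)
print(dominant, distribution)
```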

References

34 extracted · 34 resolved · 5 Pith anchors

[1] Achlioptas, P., Ovsjanikov, M., Haydarov, K., Elhoseiny, M., Guibas, L.J.: Artemis: Affective language for visual art. In: CVPR. pp. 11569–11579 (2021)

[2] Anthropic: System card: Claude opus 4 & claude sonnet 4. Tech. rep., Anthropic (May 2025)

[3] Qwen3-VL Technical Report 2025 · arXiv:2511.21631

[4] Bhattacharyya, S., Wang, J.Z.: Evaluating vision-language models for emotion recognition. In: Chiruzzo, L., Ritter, A., Wang, L. (eds.) NAACL Findings. pp. 1798–1820. Association for Computational Lin 2025

[5] Chen, L., Li, J., Dong, X., Zhang, P., He, C., Wang, J., Zhao, F., Lin, D.: Sharegpt4v: Improving large multi-modal models with better captions. In: ECCV. vol. 15075, pp. 370–387 (2024)

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-17T23:39:03.934180Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45

Aliases

arxiv: 2605.14635 · arxiv_version: 2605.14635v1 · doi: 10.48550/arxiv.2605.14635 · pith_short_12: SNHEDWXO7SQJ · pith_short_16: SNHEDWXO7SQJMA6P · pith_short_8: SNHEDWXO
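In this record, the 26-character identifier matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of that identifier. This is an observation about the values shown on this page, not a documented rule of the scheme; the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash and identifier as shown on this page.
CANONICAL_HASH = "934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45"
PITH_ID = "SNHEDWXO7SQJMA6PWZ7WSUEOCR"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: inferred from this single record, not from a specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)
assert full_id == PITH_ID

# The short aliases listed above are plain prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full_id[:n]}")
```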

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/SNHEDWXO7SQJMA6PWZ7WSUEOCR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 934e41daeefca09603cfb67f69508e1470e285b921bda65b6789932ca7343b45
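The same check can be run offline. The sketch below mirrors the serialization used in the shell pipeline above (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) and applies it to the canonical record printed on this page. The function name `canonical_sha256` is hypothetical; compare the printed digest with the `# expect:` value.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON canonicalization as the shell one-liner."""
    canonical_bytes = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical_bytes).hexdigest()


# Canonical record copied from this page. Key order does not matter here
# because sort_keys=True normalizes it before hashing.
record = {
    "source": {"kind": "arxiv", "id": "2605.14635", "version": 1},
    "schema_version": "1.0",
    "metadata": {
        "title_canon_sha256": "137561c728ed7e7f958ed5bf73182b4dff0136a9a5fb4f7abcdfafa92a51e473",
        "abstract_canon_sha256": "628715a9745f18954b667c7cf58ab3600d2f1fc7a7cabb81666699325a2f17ed",
        "submitted_at": "2026-05-14T09:49:32Z",
        "primary_cat": "cs.CV",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    },
}

print(canonical_sha256(record))
```

Sorting keys and fixing the separators is what makes the hash reproducible: two JSON documents with the same content but different whitespace or key order serialize to identical bytes.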

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "628715a9745f18954b667c7cf58ab3600d2f1fc7a7cabb81666699325a2f17ed",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T09:49:32Z",
    "title_canon_sha256": "137561c728ed7e7f958ed5bf73182b4dff0136a9a5fb4f7abcdfafa92a51e473"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14635",
    "kind": "arxiv",
    "version": 1
  }
}