Pith Number

pith:2CC62ETS

pith:2023:2CC62ETSO5IOGBPTRWZ4EQRIIQ
not attested · not anchored · not stored · refs resolved

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Dinesh Manocha, Furong Huang, Fuxiao Liu, Lichang Chen, Ruiqi Xian, Tianrui Guan, Tianyi Zhou, Xiaoyu Liu, Xijun Wang, Xiyang Wu, Yaser Yacoob, Zongxia Li

HallusionBench shows even GPT-4V reaches only 31.42 percent accuracy on paired questions that expose language hallucination and visual illusion in vision-language models.

arxiv:2310.14566 v5 · 2023-10-23 · cs.CV · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2CC62ETSO5IOGBPTRWZ4EQRIIQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

In our evaluation on HallusionBench, we benchmarked 15 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%.

C2 · weakest assumption

The benchmark assumes that human-expert-crafted questions, arranged in the novel control-group structure, accurately isolate and measure entangled language hallucination and visual illusion, without introducing confounding biases or subjective interpretation in scoring.

C3 · one-line summary

HallusionBench shows GPT-4V reaches only 31.42% accuracy on paired questions testing language hallucination and visual illusion in LVLMs, with other models below 16%.

References

63 extracted · 63 resolved · 22 Pith anchors

[1] GPT-4V(ision) system card. 2023
[2] nocaps: Novel object captioning at scale. 2019
[3] Flamingo: A visual language model for few-shot learning
[4] VQA: Visual question answering. 2015
[5] Anas Awadalla, Irena Gao, Joshua Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell 2023

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:45.995639Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d085ed12727750e305f38db3c2422844117bbc33a4caee2b605a4ef71814d040

Aliases

arxiv: 2310.14566 · arxiv_version: 2310.14566v5 · doi: 10.48550/arxiv.2310.14566 · pith_short_12: 2CC62ETSO5IO · pith_short_16: 2CC62ETSO5IOGBPT · pith_short_8: 2CC62ETS
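The identifiers on this page appear to be related in a simple way: the full Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each pith_short_N alias is the first N characters of that number. This relationship is inferred from the values shown here, not taken from a published specification, so the sketch below should be read as an observation about this record only. The function names are hypothetical.

```python
import base64

# Canonical hash copied from this page.
CANONICAL_HASH = "d085ed12727750e305f38db3c2422844117bbc33a4caee2b605a4ef71814d040"


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: the 16-byte truncation and Base32 alphabet are inferred from
    this one record; they are not a documented rule.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_8/12/16 aliases as plain prefixes of the number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


full = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(full)
print(full)     # 2CC62ETSO5IOGBPTRWZ4EQRIIQ
print(aliases)
```

For this record the derived values agree with the aliases listed above, which gives a quick offline consistency check between the hash and the identifier.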
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2CC62ETSO5IOGBPTRWZ4EQRIIQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d085ed12727750e305f38db3c2422844117bbc33a4caee2b605a4ef71814d040
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "75c5e52bb5f0a30f0f41eb1817f924d839d8a4359caf019c8aa79d5814b85e4d",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-10-23T04:49:09Z",
    "title_canon_sha256": "209a053a5e9fb6f85c51f383c22f0271269e5bf11b4b94c005535fb95ebeeca0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.14566",
    "kind": "arxiv",
    "version": 5
  }
}
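The same check as the curl one-liner can be run offline against the record printed above. The sketch below is a minimal version that reuses the serialization settings from that one-liner (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes); the function name canonical_sha256 is hypothetical, and whether the result matches depends on the displayed JSON being exactly the stored canonical record.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the page's one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "75c5e52bb5f0a30f0f41eb1817f924d839d8a4359caf019c8aa79d5814b85e4d",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-10-23T04:49:09Z",
        "title_canon_sha256": "209a053a5e9fb6f85c51f383c22f0271269e5bf11b4b94c005535fb95ebeeca0",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.14566", "kind": "arxiv", "version": 5},
}

# Canonical hash copied from this page.
EXPECTED = "d085ed12727750e305f38db3c2422844117bbc33a4caee2b605a4ef71814d040"

digest = canonical_sha256(record)
print(digest)
print("matches page:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields were typed or received, which is what lets a mirror recompute the same value from its own copy.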