Pith Number

pith:4QMBPWEV

pith:2023:4QMBPWEV753MSZFH4YSJGD4HVR

not attested · not anchored · not stored · refs resolved

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

Guohai Xu, Haitao Jia, Haiyang Xu, Jiaqi Wang, Jing Zhang, Jitao Sang, Ji Zhang, Junyang Wang, Ming Yan, Yuhang Wang, Yukai Gu

AMBER provides an LLM-free benchmark to evaluate hallucinations in multi-modal models across existence, attribute and relation dimensions for generative and discriminative tasks.

arxiv:2311.07397 v2 · 2023-11-13 · cs.CL · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{4QMBPWEV753MSZFH4YSJGD4HVR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We propose AMBER, an LLM-free multi-dimensional benchmark that can be used to evaluate both generative and discriminative tasks, covering existence, attribute, and relation hallucination.

C2 · weakest assumption

That the proposed low-cost evaluation pipeline can accurately detect and categorize hallucinations without introducing new biases or missing important cases that would require LLM or human judgment.

C3 · one-line summary

AMBER is an LLM-free multi-dimensional benchmark for evaluating hallucinations in MLLMs across generative and discriminative tasks.

References

15 extracted · 15 resolved · 8 Pith anchors

[1] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966

[2] MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning · arXiv:2310.09478

[3] Holistic analysis of hallucination in GPT-4V(ision): Bias and interference challenges · arXiv:2311.03287

[4] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · arXiv:2305.06500

[5] Detecting and preventing hallucinations in large vision language models

Formal links

1 machine-checked theorem link

Cited by

31 papers in Pith

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?

How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

Receipt and verification

First computed	2026-05-17T23:38:48.794685Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941

Aliases

arxiv: 2311.07397 · arxiv_version: 2311.07397v2 · doi: 10.48550/arxiv.2311.07397 · pith_short_12: 4QMBPWEV753M · pith_short_16: 4QMBPWEV753MSZFH · pith_short_8: 4QMBPWEV
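
The values on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, with the short aliases being plain prefixes of it. That is an observation from this one record, not a documented derivation rule. The following is a minimal Python sketch that checks it; the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941"
PITH_NUMBER = "4QMBPWEV753MSZFH4YSJGD4HVR"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its hash;
    the actual builder may work differently.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))
print(derived == PITH_NUMBER)  # True

# The short aliases listed above are prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {PITH_NUMBER[:n]}")
```

If the relationship holds for other records as well, a mirror could sanity-check an identifier against its canonical hash without contacting the resolver.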


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/4QMBPWEV753MSZFH4YSJGD4HVR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d3b64a4984882eb45d3a0c8ed5040f1b03a382944439c0dc3cc149470cabe60e",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-13T15:25:42Z",
    "title_canon_sha256": "86a0484be9d98694bb255b98a4a8cc2ddd51a86237d4d5c0e8d1ab7780b830ad"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.07397",
    "kind": "arxiv",
    "version": 2
  }
}
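
The same check can be run offline against the record printed above. The sketch below repeats the canonicalization used in the shell one-liner (sorted keys, compact separators, UTF-8, no ASCII escaping) in Python; the function name `canonical_sha256` is hypothetical, and the record literal is copied from this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes.

    Assumption: this matches the canonicalization in the curl/jq/python one-liner above.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d3b64a4984882eb45d3a0c8ed5040f1b03a382944439c0dc3cc149470cabe60e",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-11-13T15:25:42Z",
        "title_canon_sha256": "86a0484be9d98694bb255b98a4a8cc2ddd51a86237d4d5c0e8d1ab7780b830ad",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.07397", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed under "Canonical hash"
```

Because keys are sorted before hashing, the order in which the fields appear in the JSON does not affect the digest; a mismatch with the published hash would indicate that the record content differs.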