Pith Number

pith:4QMBPWEV

pith:2023:4QMBPWEV753MSZFH4YSJGD4HVR
not attested · not anchored · not stored · refs resolved

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

Guohai Xu, Haitao Jia, Haiyang Xu, Jiaqi Wang, Jing Zhang, Jitao Sang, Ji Zhang, Junyang Wang, Ming Yan, Yuhang Wang, Yukai Gu

AMBER provides an LLM-free benchmark to evaluate hallucinations in multi-modal models across existence, attribute and relation dimensions for generative and discriminative tasks.

arxiv:2311.07397 v2 · 2023-11-13 · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4QMBPWEV753MSZFH4YSJGD4HVR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

we propose an LLM-free multi-dimensional benchmark AMBER, which can be used to evaluate both generative task and discriminative task including existence, attribute and relation hallucination.

C2 · weakest assumption

That the proposed low-cost evaluation pipeline can accurately detect and categorize hallucinations without introducing new biases or missing important cases that would require LLM or human judgment.

C3 · one-line summary

AMBER is an LLM-free multi-dimensional benchmark for evaluating hallucinations in MLLMs across generative and discriminative tasks.

References

15 extracted · 15 resolved · 8 Pith anchors

[1] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
[2] MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning · arXiv:2310.09478
[3] Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges · arXiv:2311.03287
[4] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · arXiv:2305.06500
[5] Detecting and preventing hallucinations in large vision language models

Formal links

1 machine-checked theorem link

Cited by

31 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.794685Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941

Aliases

arxiv: 2311.07397 · arxiv_version: 2311.07397v2 · doi: 10.48550/arxiv.2311.07397 · pith_short_12: 4QMBPWEV753M · pith_short_16: 4QMBPWEV753MSZFH · pith_short_8: 4QMBPWEV
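The short aliases above are plain prefixes of the full Pith number. The number itself appears to be derived from the canonical hash: on this record, the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest match the Pith number exactly. That derivation is inferred from the values on this page, not stated by the record, so the sketch below treats it as an assumption. The function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941"
PITH_NUMBER = "4QMBPWEV753MSZFH4YSJGD4HVR"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the raw digest bytes and truncate.

    This matches the values on this record but is not a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Return a pith_short_N alias, which is a prefix of the full number."""
    return pith_number[:length]


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH))
    for n in (8, 12, 16):
        print(f"pith_short_{n}:", short_alias(PITH_NUMBER, n))
```

If the assumption holds for other records, a client could check that a Pith number and a canonical hash belong together without contacting the server.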
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4QMBPWEV753MSZFH4YSJGD4HVR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e41817d895ff76c964a7e624930f87ac49acc6a189d2e4522a183784060a9941
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d3b64a4984882eb45d3a0c8ed5040f1b03a382944439c0dc3cc149470cabe60e",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-13T15:25:42Z",
    "title_canon_sha256": "86a0484be9d98694bb255b98a4a8cc2ddd51a86237d4d5c0e8d1ab7780b830ad"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.07397",
    "kind": "arxiv",
    "version": 2
  }
}
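The shell check above can also be run offline against the canonical record JSON shown on this page. The sketch below uses the same serialization rules as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result with SHA-256. The helper names are hypothetical, and the record literal is a hand copy of the JSON above, so any transcription slip will change the digest.

```python
import hashlib
import json

# Canonical record copied from this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d3b64a4984882eb45d3a0c8ed5040f1b03a382944439c0dc3cc149470cabe60e",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-11-13T15:25:42Z",
        "title_canon_sha256": "86a0484be9d98694bb255b98a4a8cc2ddd51a86237d4d5c0e8d1ab7780b830ad",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.07397", "kind": "arxiv", "version": 2},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest with the canonical hash listed on this page.
    print(canonical_hash(CANONICAL_RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or transmits the record's fields.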