Pith Number

pith:WYVVK7UF

pith:2023:WYVVK7UFB4B2WG5HOU57Q6FZLE
not attested · not anchored · not stored · refs resolved

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Fuxiao Liu, Jianfeng Wang, Kevin Lin, Lijuan Wang, Linjie Li, Yaser Yacoob

Finetuning on a dataset with both positive and negative visual instructions reduces hallucinations in large multi-modal models.

arxiv:2306.14565 v4 · 2023-06-26 · cs.CV · cs.AI · cs.CE · cs.CL · cs.MM

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WYVVK7UFB4B2WG5HOU57Q6FZLE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

We successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods.

C2: weakest assumption

That GPT-4-generated negative instructions at the three semantic levels accurately capture the hallucination behaviors that matter in real deployments and that the GAVIE GPT-4 judge produces scores that align with human judgment.

C3: one-line summary

Fine-tuning models such as MiniGPT-4 on a new dataset of 400k visual instructions, which includes negative examples at three semantic levels, reduces hallucinations while improving benchmark performance.

References

34 extracted · 34 resolved · 15 Pith anchors

[1] Spice: Semantic propositional image caption evaluation 2016
[2] Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, et al · doi:10.5281/zenodo.7733589
[3] Language models are few-shot learners 2020
[4] MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning · arXiv:2310.09478
[5] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · arXiv:2305.06500

Formal links

2 machine-checked theorem links

Cited by

38 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:22.305051Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

b62b557e850f03ab1ba7753bf878b959185b9fecd2660f4f01f58fff3d4ad2e3

Aliases

arxiv: 2306.14565 · arxiv_version: 2306.14565v4 · doi: 10.48550/arxiv.2306.14565 · pith_short_12: WYVVK7UFB4B2 · pith_short_16: WYVVK7UFB4B2WG5H · pith_short_8: WYVVK7UF
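
The short aliases above are prefixes of the full 26-character Pith Number. On this record, the full number also coincides with the unpadded Base32 (RFC 4648) encoding of the first 16 bytes of the canonical hash. The Python sketch below checks both relationships. That the builder actually derives identifiers this way is an inference from this single record, not something the page states, and the helper names `pith_number_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b62b557e850f03ab1ba7753bf878b959185b9fecd2660f4f01f58fff3d4ad2e3"
PITH_NUMBER = "WYVVK7UFB4B2WG5HOU57Q6FZLE"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: Base32 of the first 16 digest bytes, padding stripped."""
    head = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(head).decode("ascii").rstrip("=")


def short_alias(number: str, length: int) -> str:
    """Return the leading `length` characters (pith_short_8, _12, _16)."""
    return number[:length]


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = [short_alias(PITH_NUMBER, n) for n in (8, 12, 16)]

print(derived == PITH_NUMBER)  # True
print(aliases)  # ['WYVVK7UF', 'WYVVK7UFB4B2', 'WYVVK7UFB4B2WG5H']
```

If the relationship holds in general, any mirror could recompute the identifier from the canonical hash alone, without contacting the registry.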

Agent API

Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WYVVK7UFB4B2WG5HOU57Q6FZLE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b62b557e850f03ab1ba7753bf878b959185b9fecd2660f4f01f58fff3d4ad2e3

Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f37358e2dda34cc97dd49e498d272bd67d03fb70082bf90c64217715d5ba88bd",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CE",
      "cs.CL",
      "cs.MM"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-06-26T10:26:33Z",
    "title_canon_sha256": "9b26ffcce0d957b6538a8e449e36f9f06417f87d058c00bbf90192962d56c9d3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.14565",
    "kind": "arxiv",
    "version": 4
  }
}
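
The verification one-liner in the Agent API section serializes the record with sorted keys and compact separators before hashing it. A standalone Python sketch of the same canonicalization follows, applied to the JSON as displayed on this page. Whether the displayed JSON is byte-for-byte what the API serves is an assumption, so the printed digest should be compared against the Canonical hash rather than taken as given. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over JSON with sorted keys, compact separators, raw UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record exactly as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f37358e2dda34cc97dd49e498d272bd67d03fb70082bf90c64217715d5ba88bd",
        "cross_cats_sorted": ["cs.AI", "cs.CE", "cs.CL", "cs.MM"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-06-26T10:26:33Z",
        "title_canon_sha256": "9b26ffcce0d957b6538a8e449e36f9f06417f87d058c00bbf90192962d56c9d3",
    },
    "schema_version": "1.0",
    "source": {"id": "2306.14565", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)  # compare with the Canonical hash listed on this record

# Key order in the input does not matter, because keys are sorted on output.
reordered = {key: record[key] for key in ("source", "schema_version", "metadata")}
print(canonical_sha256(reordered) == digest)  # True
```

Sorting keys and fixing the separators is what makes the hash reproducible: two parties holding the same logical record produce the same bytes regardless of how their JSON library orders or spaces the output.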