Pith Number

pith:LE7ARFCN

pith:2024:LE7ARFCNHEWEZEENCZ3Z4QA3V5

not attested · not anchored · not stored · refs resolved

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Bokai Xu, Chaoyue Tang, Junbo Cui, Junhao Ran, Maosong Sun, Shi Yu, Shuo Wang, Xu Han, Yukun Yan, Zhenghao Liu, Zhiyuan Liu

VisRAG retrieves and generates from multi-modal documents by embedding them directly as images rather than parsing to text.

arxiv:2410.10594 v2 · 2024-10-14 · cs.IR · cs.AI · cs.CL · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{LE7ARFCNHEWEZEENCZ3Z4QA3V5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
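Because a mirror must reach the same current state no matter where it hosts the bundle, the merge has to be independent of the order in which events arrive. The sketch below illustrates that property only. It is hypothetical: this page does not publish the event schema or the merge rule, so the `Event` fields, the total-order sort key, and the last-write-wins rule are all assumptions, and signature verification is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; real Pith events are signed and may differ.
    event_id: str
    timestamp: str  # ISO 8601 in UTC, same format throughout, so string order == time order
    field_name: str
    value: str


def merge(canonical: dict, events: list[Event]) -> dict:
    """Replay events onto a copy of the canonical record in a fixed total order."""
    state = dict(canonical)  # never mutate the canonical record itself
    # set() drops exact duplicates (e.g. the same event fetched from two mirrors);
    # sorting on every field gives a total order, so arrival order cannot matter.
    ordered = sorted(
        set(events),
        key=lambda e: (e.timestamp, e.event_id, e.field_name, e.value),
    )
    for event in ordered:
        state[event.field_name] = event.value  # assumed rule: last write wins
    return state


record = {"title": "VisRAG"}
events = [
    Event("e2", "2026-05-18T00:00:00Z", "citations", "open"),
    Event("e1", "2026-05-17T00:00:00Z", "citations", "pending"),
    Event("e2", "2026-05-18T00:00:00Z", "citations", "open"),  # duplicate copy
]
print(merge(record, events))  # {'title': 'VisRAG', 'citations': 'open'}
```

Shuffling or duplicating `events` leaves the result unchanged, which is the property a deterministic merge needs for independent mirrors to agree.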

Claims

C1 · strongest claim

Experiments demonstrate that VisRAG outperforms traditional RAG in both the retrieval and generation stages, achieving a 20–40% end-to-end performance gain over the traditional text-based RAG pipeline.

C2 · weakest assumption

That vision-language models can reliably embed and retrieve relevant information directly from document images without text parsing, and that the training data (collected open-source data plus synthetic data) generalizes to unseen real-world multi-modality documents.

C3 · one-line summary

VisRAG achieves 20–40% better end-to-end performance than text-based RAG by directly embedding and retrieving document images with VLMs.
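The retrieval half of this idea can be pictured as ordinary dense retrieval in which each document page is embedded as an image rather than as parsed text. The sketch below shows only the ranking step. The hand-written vectors stand in for VLM embeddings of a query and three page images. The scoring function (cosine similarity), the `retrieve` helper, and the file names are assumptions for illustration, not VisRAG's published encoder or API. In the full pipeline, the top-ranked page images would then be passed to a VLM for generation.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], page_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k pages whose embeddings are most similar to the query."""
    ranked = sorted(
        page_vecs,
        key=lambda pid: cosine(query_vec, page_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for VLM embeddings of a query and three page images.
pages = {
    "page_1.png": [0.9, 0.1, 0.0],
    "page_2.png": [0.0, 1.0, 0.0],
    "page_3.png": [0.5, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]
print(retrieve(query, pages))  # ['page_1.png', 'page_3.png']
```

Because the pages are never converted to text, any information lost by OCR or layout parsing in a text-based pipeline is never discarded at this stage. Whether the embeddings preserve that information is exactly the assumption flagged in C2.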

References

43 extracted · 43 resolved · 11 Pith anchors

[1] GPT-4 Technical Report · arXiv:2303.08774

[2] A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity · 2023

[3] ALLaVA: Harnessing GPT4V-Synthesized Data for a Lite Vision-Language Model

[4] PP-OCR: A Practical Ultra Lightweight OCR System · arXiv:2009.09941

[5] ColPali: Efficient Document Retrieval with Vision Language Models · arXiv:2407.01449

Formal links

1 machine-checked theorem link

Cited by

25 papers in Pith

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

MMSearch-R1: Incentivizing LMMs to Search

Receipt and verification

First computed	2026-05-17T23:38:47.418247Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

593e08944d392c4c908d16779e401baf6845fa73cb450646cc58fec8f40735bd
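For this record, the 26-character Pith Number `LE7ARFCNHEWEZEENCZ3Z4QA3V5` matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That covers the first 130 of the hash's 256 bits. The page does not state that this is the derivation rule, so treat it as an observation that holds for this record, not as a specification. The helper name `pith_number_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "593e08944d392c4c908d16779e401baf6845fa73cb450646cc58fec8f40735bd"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes (alphabet A-Z, 2-7) and keep the first `length` chars."""
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # LE7ARFCNHEWEZEENCZ3Z4QA3V5
```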

Aliases

arxiv: 2410.10594 · arxiv_version: 2410.10594v2 · doi: 10.48550/arxiv.2410.10594 · pith_short_12: LE7ARFCNHEWE · pith_short_16: LE7ARFCNHEWEZEEN · pith_short_8: LE7ARFCN
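The `pith_short_N` aliases are N-character prefixes of the full Pith Number. The 8-character form is the one shown at the top of the page as `pith:LE7ARFCN`. The sketch below rebuilds them. It assumes 8, 12, and 16 are the only prefix lengths in use, and the helper name `short_aliases` is hypothetical.

```python
PITH_NUMBER = "LE7ARFCNHEWEZEENCZ3Z4QA3V5"


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as N-character prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


aliases = short_aliases(PITH_NUMBER)
print(aliases)
# {'pith_short_8': 'LE7ARFCN', 'pith_short_12': 'LE7ARFCNHEWE', 'pith_short_16': 'LE7ARFCNHEWEZEEN'}
print(f"pith:{aliases['pith_short_8']}")  # pith:LE7ARFCN
```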

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LE7ARFCNHEWEZEENCZ3Z4QA3V5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 593e08944d392c4c908d16779e401baf6845fa73cb450646cc58fec8f40735bd

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ef855c401c9db5f58828228443d2d54b7befe49e7a2d658a3c722ed3ecc37174",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2024-10-14T15:04:18Z",
    "title_canon_sha256": "06d798e0973d1a421d2517422dcf3d932d2229c42b4b5b9dc66fd712adbdc73e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.10594",
    "kind": "arxiv",
    "version": 2
  }
}
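The canonicalization used by the `curl | jq | python3` check can also be applied offline to the record printed above. This sketch assumes the printed record is byte-for-byte what the resolver returns under `canonical_record`. If that holds, the output should equal the published canonical hash, but that has not been confirmed here. Because `sort_keys=True` is used, the key order of the input dictionary does not affect the digest.

```python
import hashlib
import json

# The canonical record as printed on this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ef855c401c9db5f58828228443d2d54b7befe49e7a2d658a3c722ed3ecc37174",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2024-10-14T15:04:18Z",
        "title_canon_sha256": "06d798e0973d1a421d2517422dcf3d932d2229c42b4b5b9dc66fd712adbdc73e",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.10594", "kind": "arxiv", "version": 2},
}


def canonical_hash(record: dict) -> str:
    """SHA-256 over sorted-key, whitespace-free, UTF-8 JSON (same settings as the one-liner)."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


print(canonical_hash(CANONICAL_RECORD))  # compare with the published canonical hash
```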