VisRAG achieves 20-40% better end-to-end performance than text-based RAG by directly embedding and retrieving document images with VLMs.
We report offline latencies per document, including document parsing and encoding latencies, as well as online latencies per query, including query encoding and search latencies
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.IR 1years
2024 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
VisRAG achieves 20-40% better end-to-end performance than text-based RAG by directly embedding and retrieving document images with VLMs.