ShredBench shows state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning for VRDU.
Jacob Cohen
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
unclear 1representative citing papers
Introduces ProcedureVQA benchmark and Chain-of-Procedure framework that improves VLM next-step prediction in procedures by up to 13% over baselines.
DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.
Unveil proposes a visual-textual embedding model for multi-modal documents that is distilled into an efficient visual-only retriever.
citing papers explorer
No citing papers match the current filters.