Jacob Cohen

Yew Ken Chia, Liying Cheng, Hou Pong Chan, Chaoqun Liu, Maojia Song, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing · 2024 · arXiv 2411.06176

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

ShredBench shows state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning for VRDU.

Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

Introduces ProcedureVQA benchmark and Chain-of-Procedure framework that improves VLM next-step prediction in procedures by up to 13% over baselines.

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.

Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval

cs.CL · 2026-05-23 · unverdicted · novelty 4.0

Unveil proposes a visual-textual embedding model for multi-modal documents that is distilled into an efficient visual-only retriever.

citing papers explorer

Showing 4 of 4 citing papers.

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
cs.CV · 2026-04-26 · unverdicted · none · ref 4

Chain-of-Procedure: Hierarchical Visual-Language Reasoning for Procedural QA
cs.CL · 2026-05-14 · unverdicted · none · ref 10

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark
cs.CV · 2026-05-28 · unverdicted · none · ref 12

Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
cs.CL · 2026-05-23 · unverdicted · none · ref 2
