hub

Nitesh Methani, Pritha Ganguly, Mitesh M Khapra, and Pratyush Kumar

Zheng Yuan, Haotian Liu, Chunyuan Li, et al · 2025 · arXiv 2504.08748

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

M³Att poisons medical multimodal RAG by pairing covert textual misinformation with query-agnostic visual perturbations that increase retrieval of the bad content, causing LLMs to generate clinically plausible but incorrect responses.

Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Evidence utility is defined as information gain on the model's output distribution, with ranking by gain on a latent helpfulness variable shown equivalent to answer-space utility under mild assumptions, enabling a training-free surrogate framework that outperforms baselines.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

cs.CL · 2026-02-23 · unverdicted · novelty 7.0

Prune-then-Merge combines adaptive pruning of low-signal patches with hierarchical merging to achieve higher compression rates and better performance than prior single-stage methods in visual document retrieval.

Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

HyperEmo-RAG uses hierarchical hyperbolic embeddings and graph-based evidence injection to outperform prior methods in multimodal emotion recognition.

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

cs.CV · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

MEG-RAG defines a new MEG metric based on Semantic Certainty Anchoring and trains a multimodal reranker to select evidence aligned with ground-truth semantic anchors, yielding higher accuracy and consistency on the M²RAG benchmark.

Towards Long-horizon Agentic Multimodal Search

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.

Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation

cs.CL · 2025-05-28 · unverdicted · novelty 6.0

MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition

cs.CV · 2026-04-14 · unverdicted · novelty 5.0

AffectAgent deploys a query planner, evidence filter, and emotion generator as collaborative agents trained via MAPPO with shared reward, plus MB-MoE and RAAF modules, to achieve superior multimodal emotion recognition on MER-UniBench.

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

cs.CL · 2026-01-08 · unverdicted · novelty 4.0

Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding

cs.CL · 2025-10-17 · accept · novelty 4.0

A systematic survey of Multimodal RAG for document understanding proposing a taxonomy based on domain, retrieval modality, and granularity while reviewing graph structures, agentic frameworks, datasets, benchmarks, applications, and open challenges.

Automotive Engineering-Centric Agentic AI Workflow Framework

cs.AI · 2026-04-09 · unverdicted · novelty 3.0

The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.

citing papers explorer

Showing 15 of 15 citing papers.

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation cs.CR · 2026-05-11 · unverdicted · none · ref 9
M³Att poisons medical multimodal RAG by pairing covert textual misinformation with query-agnostic visual perturbations that increase retrieval of the bad content, causing LLMs to generate clinically plausible but incorrect responses.
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation cs.CL · 2026-05-13 · unverdicted · none · ref 38
Evidence utility is defined as information gain on the model's output distribution, with ranking by gain on a latent helpfulness variable shown equivalent to answer-space utility under mild assumptions, enabling a training-free surrogate framework that outperforms baselines.
Very Efficient Listwise Multimodal Reranking for Long Documents cs.IR · 2026-05-12 · unverdicted · none · ref 22
ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.
Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval cs.CV · 2026-05-08 · unverdicted · none · ref 23
A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG cs.IR · 2026-04-30 · unverdicted · none · ref 37
FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework cs.CL · 2026-02-23 · unverdicted · none · ref 8
Prune-then-Merge combines adaptive pruning of low-signal patches with hierarchical merging to achieve higher compression rates and better performance than prior single-stage methods in visual document retrieval.
Navigating the Emotion Tree: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition cs.LG · 2026-05-16 · unverdicted · none · ref 55
HyperEmo-RAG uses hierarchical hyperbolic embeddings and graph-based evidence injection to outperform prior methods in multimodal emotion recognition.
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs cs.CV · 2026-05-01 · unverdicted · none · ref 55 · 2 links
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG cs.CL · 2026-04-27 · unverdicted · none · ref 16
MEG-RAG defines a new MEG metric based on Semantic Certainty Anchoring and trains a multimodal reranker to select evidence aligned with ground-truth semantic anchors, yielding higher accuracy and consistency on the M²RAG benchmark.
Towards Long-horizon Agentic Multimodal Search cs.CV · 2026-04-14 · unverdicted · none · ref 9
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation cs.CL · 2025-05-28 · unverdicted · none · ref 26
MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.
AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition cs.CV · 2026-04-14 · unverdicted · none · ref 61
AffectAgent deploys a query planner, evidence filter, and emotion generator as collaborative agents trained via MAPPO with shared reward, plus MB-MoE and RAAF modules, to achieve superior multimodal emotion recognition on MER-UniBench.
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking cs.CL · 2026-01-08 · unverdicted · none · ref 17
Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding cs.CL · 2025-10-17 · accept · none · ref 5
A systematic survey of Multimodal RAG for document understanding proposing a taxonomy based on domain, retrieval modality, and granularity while reviewing graph structures, agentic frameworks, datasets, benchmarks, applications, and open challenges.
Automotive Engineering-Centric Agentic AI Workflow Framework cs.AI · 2026-04-09 · unverdicted · none · ref 8
The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.

Nitesh Methani, Pritha Ganguly, Mitesh M Khapra, and Pratyush Kumar

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer