ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- ImageAuditor: Membership Inference Attack against Image-based Retrieval-Augmented Generation
  ImageAuditor is the first MIA for IRAG that achieves over 80% AUROC with four queries by using reward-guided policy optimization for cross-modal retrieval and task-specific prompting for signal extraction.
- ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding
  ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
- SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation
  SAR-RAG augments an MLLM baseline with semantic retrieval of similar known SAR target images, yielding measurable gains in classification accuracy and dimension regression.
- RAVA: Retrieval-Augmented Viewpoint Alignment for Subject-Driven Image Generation
  RAVA retrieves view-consistent target-subject images via a learned cross-instance embedding and LogDet subset selection, then uses them in a multi-reference generator to improve cross-subject viewpoint alignment.