Histopathology Multi-modal Embedding for Pathology Composed Retrieval

Hehuan Ma; Junzhou Huang; Qifeng Zhou; Saiyang Na; Thao M. Dang; Wenliang Zhong; Yuzhi Guo

arxiv: 2502.07221 · v4 · pith:BCLF7BBEnew · submitted 2025-02-11 · 💻 cs.CV

Qifeng Zhou, Wenliang Zhong, Thao M. Dang, Hehuan Ma, Saiyang Na, Yuzhi Guo, Junzhou Huang

classification 💻 cs.CV

keywords pathology · retrieval · composed · HOMIE · mismatch · models · clinical

To overcome the black-box nature of predictive AI and the hallucination risks of generative models, retrieval-based models offer an interpretable, evidence-based paradigm for pathology clinical workflow. However, real-world clinical queries are inherently interleaved (e.g., pathology images and text). Current dual-encoders suffer from an Architectural Mismatch, lacking the mechanism to fuse such composed queries. To address this, we formalize the task of Pathology Composed Retrieval (PCR). While Multimodal Large Language Models (MLLMs) offer deep-fusion capabilities, directly applying them exposes a Task Mismatch and a Domain Mismatch. To resolve these challenges, we propose HOMIE, a model-agnostic adaptation framework that transforms any generative MLLM into a specialized pathology retrieval expert. Evaluated on our newly introduced PCR Benchmark, a lightweight 2B-parameter HOMIE variant substantially outperforms existing paradigms, surpassing specialized 7B pathology MLLMs and dual-encoders by large margins on composed retrieval, while maintaining strong performance on traditional simple retrieval. The project page is available at https://qfchou.github.io/HOMIE_page/.
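
As a rough illustration of the retrieval step the abstract refers to, the sketch below ranks candidates by cosine similarity against a single embedding of a composed (image plus text) query. It is not HOMIE's method: the abstract does not describe how the fused embedding is produced, so the query vector, the candidate vectors, and the names `cosine`, `rank_candidates`, and `case_a` to `case_c` are all hypothetical placeholders chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate ids sorted from most to least similar to the query."""
    scores = {
        cid: cosine(query_embedding, emb)
        for cid, emb in candidate_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Assumption: a fusion model has already mapped the whole image+text query
# to ONE vector. The numbers here are made up for illustration only.
composed_query = [0.9, 0.1, 0.4]

# Assumption: candidates were embedded into the same space (toy values).
candidates = {
    "case_a": [0.8, 0.2, 0.5],
    "case_b": [0.1, 0.9, 0.0],
    "case_c": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(composed_query, candidates)
print(ranking)
```

The point of the sketch is only that, once a query is represented by one fused vector, retrieval reduces to a nearest-neighbor ranking; a dual-encoder that embeds image and text separately has no such single query vector without an extra fusion step.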

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mitigating Batch Effects in Histopathology via Language-Mediated Robust Embedding Generation
cs.CV 2026-06 unverdicted novelty 5.0

GLMP generates robust pathology embeddings by routing histology images through an intermediate textual representation produced by general-purpose MLLMs to mitigate batch effects.

PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow
cs.AI 2026-05 unverdicted novelty 5.0

PathoSage is a three-stage framework using Structured Evidence Deliberation and a Beta-Bernoulli experience system to improve patch-level pathology reasoning by mitigating hallucinations and tool conflicts.