Guiding medical vision-language models with diverse visual prompts: Framework design and comprehensive exploration of prompt variations
1 Pith paper cites this work.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
RoiMAM: Region-of-Interest Medical Attention Model for Efficient Vision-Language Understanding
RoiMAM integrates a training-free ROI Generation Module with Semantic Selective Suppression and a Text Prompt Enhancer to produce a compact VLM that reports accuracy gains of 2 percent on SLAKE and 4.6 percent on PMC-VQA at less than 20 percent of the size of MedVInT-TD.