CT-SpatialVQA benchmark shows 3D medical VLMs achieve only 34% average accuracy on semantic-spatial reasoning tasks in CT volumes, often below random chance.
arXiv preprint arXiv:2501.14548 (2025)
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
JANUS conditions Vision Transformer embeddings on macro-radiomic priors via anatomically guided gating, reaching macro-AUROC 0.88 on an internal test set of 5082 cases and 0.87 on an external set of 2000 cases while improving calibration and reducing high-confidence false positives under domain shift.
CA-GCL adds global contrastive separation and clinical text augmentation to fine-grained vision-language pretraining, reducing textual embedding collapse and prompt variance in 3D medical image tasks.
DCP-PD improves macro F1 scores on CT report generation benchmarks and introduces a hierarchical location-aware evaluation protocol that reveals ongoing challenges in pathology spatial grounding.
citing papers explorer
- Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models
CT-SpatialVQA benchmark shows 3D medical VLMs achieve only 34% average accuracy on semantic-spatial reasoning tasks in CT volumes, often below random chance.
- JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift
JANUS conditions Vision Transformer embeddings on macro-radiomic priors via anatomically guided gating, reaching macro-AUROC 0.88 on an internal test set of 5082 cases and 0.87 on an external set of 2000 cases while improving calibration and reducing high-confidence false positives under domain shift.
- CA-GCL: Cross-Anatomy Global-Local Contrastive Learning for Robust 3D Medical Image Understanding
CA-GCL adds global contrastive separation and clinical text augmentation to fine-grained vision-language pretraining, reducing textual embedding collapse and prompt variance in 3D medical image tasks.
- Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
DCP-PD improves macro F1 scores on CT report generation benchmarks and introduces a hierarchical location-aware evaluation protocol that reveals ongoing challenges in pathology spatial grounding.