hub Canonical reference

arXiv preprint arXiv:2404.00578 (2024)


Canonical reference. 80% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 80% of classified citations

citation-role summary

background 4 baseline 1

citation-polarity summary

background 4 baseline 1

representative citing papers

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

cs.CV · 2026-05-19 · accept · novelty 8.0

NeuroQA is a large-scale 3D brain MRI visual question answering benchmark with verified image-grounded QA pairs, multi-domain coverage, and baseline evaluations showing current models lag behind text-only performance.

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

cs.CV · 2026-05-10 · accept · novelty 8.0

DeepTumorVQA is a new stage-wise 3D CT VQA benchmark showing that quantitative measurement is the main failure point for current medical VLMs and that tool augmentation substantially improves later reasoning stages.

Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

CT-SpatialVQA benchmark shows 3D medical VLMs achieve only 34% average accuracy on semantic-spatial reasoning tasks in CT volumes, often below random chance.

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

cs.CV · 2026-05-07 · conditional · novelty 7.0

Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.

Agentic Large Language Models for Training-Free Neuro-Radiological Image Analysis

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

Agentic LLMs autonomously execute complex neuro-radiological workflows like glioma segmentation and multi-timepoint response assessment by directing off-the-shelf tools, without any model training.

IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation

cs.CV · 2026-01-06 · conditional · novelty 7.0

IBISAgent enables MLLMs to perform iterative pixel-level visual reasoning for biomedical object referring and segmentation via text-based clicks and agentic RL, outperforming prior SOTA methods without model modifications.

Segmentation, Detection and Explanation: A Unified Framework for CT Appearance Reasoning

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

A unified autoregressive vision-language framework integrates segmentation, detection, and appearance reasoning for CT images via task-routing tokens and progressive refinement, with gains on public benchmarks.

CA-GCL: Cross-Anatomy Global-Local Contrastive Learning for Robust 3D Medical Image Understanding

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

CA-GCL adds global contrastive separation and clinical text augmentation to fine-grained vision-language pretraining, reducing textual embedding collapse and prompt variance in 3D medical image tasks.

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

RadThinking releases a large longitudinal CT VQA dataset stratified into foundation perception questions, single-rule reasoning questions, and compositional multi-step chains grounded in clinical reporting standards for cancer screening.

MedScribe: Clinically Grounded CT Reporting through Agentic Workflows

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

MedScribe reformulates CT radiology reporting as an agentic evidence-acquisition workflow using LLM-invoked diagnostic tools and pathology-aligned retrieval, yielding higher clinical accuracy and consistency than standard VLMs on CT-RATE and RadChestCT.

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

cs.AI · 2026-04-16 · unverdicted · novelty 6.0

RadAgent generates stepwise, tool-augmented chest CT reports with traceable decisions, improving accuracy, robustness, and adding a 37% faithfulness score absent in standard 3D VLMs.

Representation geometry shapes task performance in vision-language modeling for CT enterography

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Mean pooling and multi-window RGB encoding optimize vision-language performance on CT enterography, with retrieval-augmented generation substantially improving automated report severity accuracy over fine-tuning alone.

Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis

cs.CV · 2026-04-11 · unverdicted · novelty 6.0

Transferring a 2D MLLM to 3D CT inputs via parameter reuse, a Text-Guided Hierarchical MoE framework, and two-stage training yields better performance than prior 3D medical MLLMs on medical report generation and visual question answering.

Learning Robust Visual Features in Computed Tomography Enables Efficient Transfer Learning for Clinical Tasks

cs.CV · 2026-04-05 · conditional · novelty 6.0

VoxelFM learns robust 3D CT visual features via DINO self-distillation that transfer effectively to seven clinical task categories using frozen backbones and lightweight heads, outperforming prior CT foundation models even on report generation.

Visual Instruction-Finetuned Language Model for Versatile Brain MR Image Tasks

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

LLaBIT is a single instruction-finetuned LLM that performs report generation, VQA, segmentation, and translation on brain MRI images while outperforming task-specific models.

An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

eess.IV · 2026-05-20 · unverdicted · novelty 5.0

A multi-center whole-body FDG PET/CT foundation model with early fusion and masked autoencoding pretraining achieves label-efficient tumor segmentation on downstream tasks.

Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

TIF-GRPO uses integral feedback on pseudo-temporal trajectories to regulate anatomy-aware rewards in RL for clinical faithfulness in volumetric CT analysis.

M3Net: A Macro-to-Meso-to-Micro Clinical-inspired Hierarchical 3D Network for Pulmonary Nodule Classification

cs.CV · 2026-05-12 · conditional · novelty 5.0

M3Net achieves state-of-the-art accuracies of 86.96% on LIDC-IDRI and 84.24% on USTC-FHLN for pulmonary nodule classification using a hierarchical multi-scale 3D network with cross-scale consistency.

citing papers explorer

Showing 18 of 18 citing papers.

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
cs.CV · 2026-05-19 · accept · none · ref 73
DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents
cs.CV · 2026-05-10 · accept · none · ref 7
Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models
cs.CV · 2026-05-09 · unverdicted · none · ref 2
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
cs.CV · 2026-05-07 · conditional · none · ref 2
Agentic Large Language Models for Training-Free Neuro-Radiological Image Analysis
cs.CV · 2026-04-17 · unverdicted · none · ref 3
IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation
cs.CV · 2026-01-06 · conditional · none · ref 1
Segmentation, Detection and Explanation: A Unified Framework for CT Appearance Reasoning
cs.CV · 2026-05-15 · unverdicted · none · ref 37
CA-GCL: Cross-Anatomy Global-Local Contrastive Learning for Robust 3D Medical Image Understanding
cs.CV · 2026-05-13 · unverdicted · none · ref 1
RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
cs.CV · 2026-05-11 · unverdicted · none · ref 10
MedScribe: Clinically Grounded CT Reporting through Agentic Workflows
cs.CV · 2026-05-03 · unverdicted · none · ref 15
RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography
cs.AI · 2026-04-16 · unverdicted · none · ref 1
Representation geometry shapes task performance in vision-language modeling for CT enterography
cs.CV · 2026-04-14 · unverdicted · none · ref 13
Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis
cs.CV · 2026-04-11 · unverdicted · none · ref 11
Learning Robust Visual Features in Computed Tomography Enables Efficient Transfer Learning for Clinical Tasks
cs.CV · 2026-04-05 · conditional · none · ref 10
Visual Instruction-Finetuned Language Model for Versatile Brain MR Image Tasks
cs.CV · 2026-04-03 · unverdicted · none · ref 3
An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation
eess.IV · 2026-05-20 · unverdicted · none · ref 26
Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis
cs.CV · 2026-05-19 · unverdicted · none · ref 10
M3Net: A Macro-to-Meso-to-Micro Clinical-inspired Hierarchical 3D Network for Pulmonary Nodule Classification
cs.CV · 2026-05-12 · conditional · none · ref 7
