MONAI: An open-source framework for deep learning in healthcare
Mixed citation behavior. Most common role is background (40%).
abstract
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible, and robust, and the underlying software framework must be aware of the particularities (e.g., geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations, and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical, and industrial teams around the world, who are pursuing applications spanning nearly every aspect of healthcare.
citing papers explorer
-
MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
-
VISTA: Variance-Gated Inter-Sequence Test-Time Adaptation for Multi-Sequence MRI Segmentation
VISTA is a source-free TTA framework for multi-sequence MRI segmentation that uses inter-sequence spectral/patch interventions and cross-view variance gating to handle modality-interaction shifts, reporting Dice gains of 1.89% and 2.82% on SSA and PED cohorts.
-
SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation
SegWithU treats uncertainty as perturbation energy via rank-1 probes in a post-hoc head for efficient single-pass risk-aware medical image segmentation, outperforming other single-forward-pass methods on ACDC, BraTS2024, and LiTS.
-
AbdomenGen: Sequential Volume-Conditioned Diffusion Framework for Abdominal Anatomy Generation
A sequential diffusion framework generates controllable abdominal anatomies with a Volume Control Scalar that decouples organ size from body habitus, achieving Dice scores around 0.83 and reducing distributional mismatch by 73.6% in a hepatomegaly example.
-
Camyla: Scaling Autonomous Research in Medical Image Segmentation
Camyla autonomously generates research proposals, experiments, and manuscripts in medical image segmentation, outperforming baselines on 24 of 31 recent datasets while producing 40 human-reviewed papers.
-
Multi-Plane Vision Transformer for Hemorrhage Classification Using Axial and Sagittal MRI Data
MP-ViT uses dual transformers and cross-attention on axial and sagittal MRI to classify hemorrhages, reporting 5.5% higher AUC than standard ViT and 1.8% higher than CNNs on a dataset of 12,869 subjects.
-
Tables Guide Vision: Learning to See the Heart through Tabular Data
Tabular clinical data guides contrastive learning on cardiac MR images to build better visual representations by identifying patient similarities, outperforming image-only augmentation on downstream disease prediction tasks.
-
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
-
Cross Modality Image Translation In Medical Imaging Using Generative Frameworks
A uniform benchmark across 77 experiments finds SRGAN superior to latent diffusion models for 3D medical image translation, with synthetic volumes indistinguishable from real ones in a 17-physician Turing test.
-
Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images
Tumor-aware augmentation and anisotropic cropping improve CT-to-MRI transfer for rectal cancer segmentation in hierarchical transformers by reducing attention dilution from padding and enhancing feature adaptation.
-
SIAM: Head and Brain MRI Segmentation from Few High-Quality Templates via Synthetic Training
SIAM achieves state-of-the-art whole-head MRI segmentation of 16 structures including extra-cerebral tissues by training on synthetic data from just six manual templates, matching or exceeding prior methods on 301 scans across eight heterogeneous datasets.
-
GeoSAE: Geometric Prior-Guided Layer-Wise Sparse Autoencoder Annotation of Brain MRI Foundation Models
GeoSAE extracts a compact, interpretable feature set from frozen brain MRI foundation models that predicts MCI-to-AD conversion (AUC 0.746) with age-deconfounded annotations and replicates across cohorts.
-
ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation
ESICA delivers state-of-the-art accuracy on a five-modality 3D medical segmentation benchmark while offering a compact variant with far fewer parameters.
-
Generative Modeling of Neurodegenerative Brain Anatomy with 4D Longitudinal Diffusion Model
A 4D diffusion generative model learns topology-preserving spatiotemporal deformations to synthesize realistic longitudinal brain anatomy trajectories in neurodegenerative diseases from sparse follow-up scans.
-
Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images
DAGMaN uses co-distilled attention-guided masked image modeling with a noisy teacher to enable effective self-supervised pretraining on medical images by selective masking of co-occurring patches and maintenance of attention head diversity, with demonstrations on nodule classification, immunotherapy
-
Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis
Neuro-Oracle distills longitudinal MRI changes into trajectory vectors via a 3D Siamese encoder, retrieves similar cases, and generates LLM-based prognoses, achieving AUCs of 0.834-0.905 on a resection-type proxy task versus 0.793 for a single-timepoint baseline.
-
Distilling Photon-Counting CT into Routine Chest CT through Clinically Validated Degradation Modeling
SUMI distills photon-counting CT quality into routine chest CT by learning to reverse clinically validated acquisition degradations, yielding 15-20% gains in image metrics, better radiologist utility, and up to 15% higher lesion detection sensitivity.
-
SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms
The paper reports a new annotated 7T ToF MRA dataset for small vessel segmentation and shows that top deep learning methods reach Dice scores of 0.838 on internal test data and 0.716 on an external secret dataset.
-
Universal CT Representations from Anatomy to Disease Phenotype through Agglomerative Pretraining
FlexiCT provides CT foundation models via agglomerative pretraining on 266,227 volumes from 56 datasets that match or exceed task-specific models on five task families while organizing embeddings along tumor-stage gradients.
-
Semi-MedRef: Semi-Supervised Medical Referring Image Segmentation with Cross-Modal Alignment
Semi-MedRef introduces T-PatchMix, PosAug, and ITCL within a teacher-student SSL setup to preserve image-text alignment under augmentation for medical referring segmentation on QaTa-COV19 and MosMedData+.
-
NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
NeuroAgent uses a hierarchical LLM agent framework with Generate-Execute-Validate loops to automate neuroimaging preprocessing, reaching 84.8% end-to-end correctness and 0.9518 AUC for Alzheimer's classification on 1,470 ADNI subjects using four modalities.
-
The autoPET3 Challenge: Automated Lesion Segmentation in Whole-Body PET/CT – Multitracer Multicenter Generalization
The autoPET3 challenge finds that leading AI models reach a mean Dice score of 0.66 for multitracer PET/CT lesion segmentation, with compositional generalization to unseen tracer-center pairs remaining an open problem driven by volume overestimation and case heterogeneity.
-
Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention
A latent diffusion model jointly synthesizes MRI volumes and mixed-type tabular clinical data in a shared space via cross-attention and separate decoders after VAE fusion.
-
Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
MIGF improves multi-modal prostate MRI segmentation robustness via modality-isolated streams and dropout training, yielding ranking score gains of 2.8-13.4% across backbones and better tolerance to degraded diffusion sequences on PI-CAI and Prostate158.
-
Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It
MaskGen improves domain generalization for biomedical image segmentation by using source intensities plus domain-stable foundation model representations with minimal added complexity.
-
Submanifold Sparse Convolutional Networks for Automated 3D Segmentation of Kidneys and Kidney Tumours in Computed Tomography
A two-stage sparse convolutional network pipeline for native high-resolution 3D kidney and tumor segmentation in CT that matches top Dice scores while reducing VRAM and runtime versus nnU-Net and SegVol.
-
One Sequence to Segment Them All: Efficient Data Augmentation for CT and MRI Cross-Domain 3D Spine Segmentation
Targeted data augmentations let single-sequence 3D spine segmentation models generalize to seven unseen CT and MRI datasets with 155% average Dice gain and almost no in-domain loss.
-
AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer
An attention-based fusion model combining semi-supervised CT segmentation, radiomics, and clinical features predicts metastatic recurrence, overall survival, and disease-free survival in HPV+ oropharyngeal cancer with AUCs of 88.2%, 79.2%, and 78.1% on an internal cohort of 397 patients.
-
GPAFormer: Graph-guided Patch Aggregation Transformer for Efficient 3D Medical Image Segmentation
GPAFormer with 1.81M parameters reports top Dice scores on BTCV (75.70%), Synapse (81.20%), ACDC (89.32%), and BraTS (82.74%) while running inference in under one second on consumer GPUs.
-
PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
PR3DICTR is a new open-access modular framework for 3D medical image classification and outcome prediction that can be used with as few as two lines of code.
-
In search of truth: Evaluating concordance of AI-based anatomy segmentation models
A harmonization framework enables comparison of six AI segmentation models on 31 structures in NLST CT scans, revealing strong agreement for lungs but invalid outputs for some vertebrae and ribs.
-
Improving Prostate Gland Segmentation Using Transformer based Architectures
SwinUNETR outperforms 3D UNet with Dice scores up to 0.902 on larger gland subsets using mixed-cohort five-fold training, while UNETR performs poorly on the same subsets.
-
Dante: An Open Source Model Pre-Training and Fine-Tuning Tool for the Dafne Federated Framework for Medical Image Segmentation
Dante is a new open-source backend for the Dafne ecosystem that implements configurable training from scratch, layer freezing, and channel-wise LoRA for medical image segmentation, with validation showing faster convergence and higher Dice scores in cross-domain MRI tasks.