IMCBench is a new benchmark for image-grounded multi-turn medical conversations that evaluates eight multimodal LLMs on safety, accuracy, and uncertainty; Claude Opus scores highest overall, but safety drops for malignant and rare conditions.
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
15 Pith papers cite this work, alongside 3,113 external citations. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
DermAgent orchestrates seven vision-language tools in a Plan-Execute-Reflect loop with dual-modality retrieval from 413k cases and a critic module to outperform GPT-4o by 17.6% in zero-shot dermatological diagnosis accuracy.
MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.
Rough-set analysis finds 16.4% of 305 concept profiles in Derm7pt inconsistent (306 images), capping hard CBM accuracy at 92.1%; symmetric filtering produces a 705-image consistent benchmark where EfficientNet-B5 reaches 0.90 label accuracy.
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
The authors introduce predicted-weighted balanced accuracy (pBA), a utility-weighted evaluation metric that uses predicted subconcept posteriors to reduce bias from within-class heterogeneity in imbalanced data.
FSS-TIs models cross-domain few-shot segmentation as an ODE process with Fourier-based spectral perturbations to create domain-agnostic features and enable effective fine-tuning on limited support samples.
MARVEL introduces a multi-expert NvMF-based system with an outlier expert that reduces FPR95 in OOD detection on medical datasets by 8-37%.
FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.
IViT applies quadratic programming to a pre-trained Vision Transformer with a multi-objective loss, achieving 93.80% accuracy on six skin disease datasets (0.21% below baseline) while reducing feature redundancy by 29.5% and producing clinically consistent activations.
Cascade classification improves macro F1 over single-stage for some models by allowing sensitivity control but reveals a large generalization gap on external clinical data.
Describes a methodology and the resulting dataset of 1,026 dermoscopic images with structured metadata and verified diagnostic labels for medical informatics research.
MedGemma 1.5 4B reports absolute gains of 11% on 3D MRI classification, 3% on 3D CT, 47% macro F1 on pathology slides, 35% IoU on anatomical localization, and 5-22% on clinical QA tasks over MedGemma 1.
Fine-tuned MedGemma outperforms untuned GPT-4 in zero-shot medical image disease classification, achieving 80.37% versus 69.58% mean test accuracy with higher sensitivity for cancer and pneumonia.
Post-hoc normalizing flows for OOD detection in medical imaging achieve 84.61% AUROC on MedOOD and 93.8% on MedMNIST, outperforming ViM, MDS, and ReAct.