Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
  SurgMotion outperforms prior methods on 17 surgical video benchmarks by shifting pretraining to latent motion prediction with motion-guided masking, affinity distillation, and diversity regularization on a 15M-sample dataset.
- Rad-VLSM: A Cross-Modal Framework with Semantics-Assisted Prompting for Medical Segmentation and Diagnosis
  Rad-VLSM is a cross-modal two-stage framework that converts semantic guidance from BLIP-2 into box prompts for SAM-based lesion segmentation and then uses the resulting masks as spatial priors in a visual-radiomics fusion head for diagnosis.