Mean pooling and multi-window RGB encoding optimize vision-language performance on CT enterography, with retrieval-augmented generation substantially improving automated report severity accuracy over fine-tuning alone.
Self-training with noisy student improves imagenet classification.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Representation geometry shapes task performance in vision-language modeling for CT enterography
Mean pooling and multi-window RGB encoding optimize vision-language performance on CT enterography, with retrieval-augmented generation substantially improving automated report severity accuracy over fine-tuning alone.