Current MLLMs show weak performance on small object understanding tasks, but fine-tuning with the new SOU-Train dataset measurably improves their capabilities.
IEEE Transactions on Image Processing 32, 364–376 (2022)
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A monotonic diff-based scale loss and learnable Gaussian convolution with adaptive pinwheel masking improve mIoU, Pd, and Fa for infrared small target detection on three benchmarks.
Citing papers explorer
- Can Multimodal Large Language Models Truly Understand Small Objects?
  Current MLLMs show weak performance on small object understanding tasks, but fine-tuning with the new SOU-Train dataset measurably improves their capabilities.
- Revisiting the Scale Loss Function and Gaussian-Shape Convolution for Infrared Small Target Detection
  A monotonic diff-based scale loss and learnable Gaussian convolution with adaptive pinwheel masking improve mIoU, Pd, and Fa for infrared small target detection on three benchmarks.