Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.
arXiv preprint arXiv:2509.24878
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception
  Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.
- Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
  Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.
- A Conditional U-Net Pipeline with Pre- and Post-Processing for Aerial RGB-to-Thermal Image Translation
  A conditional U-Net with weather conditioning at the bottleneck plus pre- and post-processing translates aerial RGB to thermal images, reaching PSNR 14.55, SSIM 0.81, LPIPS 0.17 and outperforming the ThermalGen baseline on a held-out test set.