ConvMAE: Masked Convolution Meets Masked Autoencoders
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
verdicts: UNVERDICTED (4)
representative citing papers
citing papers explorer
- Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark
  Presents the first large-scale infrared off-road dataset and a flow-free temporal model achieving state-of-the-art freespace detection performance with real-time inference.
- UNIV: Unified Foundation Model for Infrared and Visible Modalities
  UNIV introduces Patch Cross-modal Contrastive Learning (PCCL) to build a unified semantic feature space for infrared and visible modalities, supported by the new MVIP dataset of 98,992 aligned pairs, with reported gains on infrared segmentation and detection tasks.
- Sapiens2
  Sapiens2 improves pretraining, data scale, and architecture over its predecessor to set new state-of-the-art results on human pose estimation, body-part segmentation, normal estimation, and new tasks like pointmap and albedo estimation.
- Self-Supervised Learning for Real-World Object Detection: a Survey
  Survey benchmarks SSL instance discrimination and masked image modeling for object detection, finding instance discrimination suits CNN encoders while MIM suits ViT encoders and custom pre-training, especially for small objects.