Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
LVIS: A dataset for large vocabulary instance segmentation
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4verdicts
UNVERDICTED 4roles
dataset 1polarities
background 1representative citing papers
SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.
Introduces the TSBOW dataset and benchmark for occluded vehicle detection in traffic surveillance under diverse and extreme weather conditions.
A literature review that categorizes deep learning approaches for visual hand gesture recognition, summarizes state-of-the-art methods across tasks, reviews datasets and metrics, and identifies challenges and future directions.
citing papers explorer
-
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
-
SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.
-
TSBOW -- Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions
Introduces the TSBOW dataset and benchmark for occluded vehicle detection in traffic surveillance under diverse and extreme weather conditions.
-
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions
A literature review that categorizes deep learning approaches for visual hand gesture recognition, summarizes state-of-the-art methods across tasks, reviews datasets and metrics, and identifies challenges and future directions.