Deep residual learning for image recognition
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 3
citation-polarity summary: background 3
citing papers explorer
- nuScenes: A multimodal dataset for autonomous driving
  nuScenes provides the first public autonomous-driving dataset that includes synchronized 360-degree data from cameras, radars, and lidar together with 3D bounding-box annotations across 1000 scenes.
- Scalable Diffusion Models with Transformers
  DiTs achieve SOTA FID of 2.27 on ImageNet 256x256 by scaling transformer-based latent diffusion models, with performance improving consistently as Gflops increase.
- PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
  PRIX presents an efficient camera-only planner with a novel CaRT module that matches larger multimodal models on NavSim and nuScenes while reducing model size and inference time.
- No Forgetting Learning: Buffer-free Continual Learning Classification
  NFL is a buffer-free continual learning framework that decomposes networks, applies stepwise freezing with knowledge distillation, and adds an auto-encoder in NFL+ to match replay-based performance on image benchmarks while using only 2.53% of the memory.
- Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation
  Learn2Synth optimizes data synthesis parameters with hypergradients to train segmentation networks solely on synthetic brain images that generalize to real scans.
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
  MM-REACT uses textual prompts to let ChatGPT collaborate with external vision experts for zero-shot multimodal reasoning and action on advanced visual tasks.
- YOLOX: Exceeding YOLO Series in 2021
  YOLOX exceeds prior YOLO models by adopting anchor-free detection, decoupled heads, and SimOTA assignment to reach 50.0% AP on COCO for the large variant.