Objects as Points

Xingyi Zhou, Dequan Wang, Philipp Krähenbühl

classification cs.CV

keywords object, approach, bounding, center, point, centernet, coco, dataset

Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
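
The center-point idea in the abstract can be illustrated with a small decoding sketch. It is not the authors' implementation: it assumes the network has already produced a single-class center heatmap and a per-location size map, and it assumes peaks are picked as 3x3 local maxima above a score threshold. The names `decode_centers`, `heatmap`, `size_map`, and `threshold` are hypothetical and chosen only for this example.

```python
import numpy as np


def decode_centers(heatmap, size_map, threshold=0.3):
    """Turn a center heatmap and a size map into boxes (illustrative sketch).

    heatmap:  (H, W) array of center scores, assumed to lie in [0, 1].
    size_map: (2, H, W) array holding predicted width and height per location.
    Returns a list of (x1, y1, x2, y2, score) tuples in heatmap coordinates.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    size_map = np.asarray(size_map, dtype=float)
    h, w = heatmap.shape

    # Pad with -inf so border cells are compared only against real neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)

    # Stack the nine shifted views and take the maximum over each 3x3 window.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = windows.max(axis=0)

    # A cell is a center candidate if it equals its local maximum
    # and its score clears the (assumed) threshold.
    is_peak = (heatmap >= local_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(is_peak)

    boxes = []
    for y, x in zip(ys, xs):
        box_w = size_map[0, y, x]
        box_h = size_map[1, y, x]
        # The box is recovered from the center point plus the regressed size.
        boxes.append((
            float(x - box_w / 2), float(y - box_h / 2),
            float(x + box_w / 2), float(y + box_h / 2),
            float(heatmap[y, x]),
        ))
    return boxes
```

Because each object is represented by one peak, this sketch needs no separate duplicate-removal step over overlapping candidate boxes, which is the kind of post-processing the abstract calls out. Note that a plateau of exactly equal scores would yield several peaks here; a real system would need to handle that case.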

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning
eess.SP 2026-05 unverdicted novelty 7.0

Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.
Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes
cs.CV 2026-04 unverdicted novelty 7.0

SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom ...
FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception
cs.CV 2026-04 unverdicted novelty 7.0

FishRoPE reparameterizes attention mechanisms in fisheye images to use angular separation in spherical coordinates, enabling frozen vision foundation models to achieve state-of-the-art results on 2D detection and BEV ...
DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather
cs.CV 2026-04 unverdicted novelty 7.0

DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset ...
WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects
cs.CV 2026-04 unverdicted novelty 7.0

WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.
Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours
cs.CV 2026-05 unverdicted novelty 6.0

FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness
cs.CV 2026-05 unverdicted novelty 6.0

RefCD enables unsupervised category-aware object detection by using feature similarity between predicted objects and unlabeled reference images to guide category learning.
Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
cs.CV 2026-04 unverdicted novelty 6.0

Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection
cs.CV 2026-04 unverdicted novelty 6.0

SFFNet uses multi-scale dynamic dual-domain coupling and a synergistic feature pyramid network to reach 36.8 AP on VisDrone and 20.6 AP on UAVDT for UAV object detection.
From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes
cs.CV 2026-03 unverdicted novelty 6.0

L2G-Det detects and segments novel object instances in open scenes by using local template patch matches to generate points that prompt an augmented SAM for global masks.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
cs.CV 2023-03 accept novelty 6.0

Grounding DINO fuses language and vision via feature enhancer, language-guided query selection, and cross-modality decoder in a DINO backbone, achieving 52.5 AP zero-shot on COCO and a new record of 26.1 AP mean on ODinW.
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
cs.CV 2021-12 conditional novelty 6.0

BEVDet achieves 39.3% mAP and 47.2% NDS on nuScenes val set with a fast BEV-based multi-camera 3D detector that outperforms FCOS3D while using far less compute in its tiny variant.
YOLOX: Exceeding YOLO Series in 2021
cs.CV 2021-07 accept novelty 6.0

YOLOX exceeds prior YOLO models by adopting anchor-free detection, decoupled heads, and SimOTA assignment to reach 50.0% AP on COCO for the large variant.
Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking
cs.CV 2026-05 unverdicted novelty 5.0

TCMP achieves SOTA MOT metrics (HOTA 63.4%, IDF1 65.0%, AssA 49.1%) with 0.014x parameters and 0.05x FLOPs of the previous best method by using a simple dilated TCN regressor.
Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection
cs.CV 2026-04 unverdicted novelty 5.0

Caries-DETR adds tooth-structure query initialization and lesion-aware loss reweighting to DETR, reaching state-of-the-art caries detection on AlphaDent and DentalAI datasets.
Class-Adaptive Cooperative Perception for Multi-Class LiDAR-based 3D Object Detection in V2X Systems
cs.CV 2026-04 unverdicted novelty 5.0

A new class-adaptive fusion architecture improves multi-class LiDAR 3D object detection in V2X cooperative perception by routing small and large objects through attentive pathways and balancing training objectives.
Frozen Vision Transformers for Dense Prediction on Small Datasets: A Case Study in Arrow Localization
cs.CV 2026-04 conditional novelty 4.0

A frozen DINOv3 ViT-L/16 with AnyUp upsampling and lightweight CenterNet heads achieves 0.893 F1 and 1.41 mm localization error on arrow punctures using 48 training images.