CrowdHuman: A Benchmark for Detecting Human in a Crowd
Human detection has witnessed impressive progress in recent years. However, the occlusion problem in detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. There are a total of $470K$ human instances in the train and validation subsets, roughly $22.6$ persons per image, with various kinds of occlusions. Each human instance is annotated with a head bounding box, a human visible-region bounding box, and a human full-body bounding box. Baseline performance of state-of-the-art detection frameworks on CrowdHuman is presented. The cross-dataset generalization results of the CrowdHuman dataset demonstrate state-of-the-art performance on previous datasets, including Caltech-USA, CityPersons, and Brainwash, without bells and whistles. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.
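As an illustration of the three-box annotation described in the abstract, the sketch below shows one way a per-instance record could be represented, and how the visible-region and full-body boxes might be combined into a rough occlusion measure. The class name `HumanInstance`, the `(x, y, width, height)` box layout, and the `occlusion_ratio` helper are assumptions made for this example; they are not taken from the dataset's release or the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class HumanInstance:
    """Hypothetical record holding the three boxes annotated per person."""
    head: Box     # head bounding box
    visible: Box  # visible-region bounding box
    full: Box     # full-body bounding box (may extend behind occluders)


def box_area(box: Box) -> float:
    """Area of a box; negative widths or heights are treated as zero."""
    _, _, w, h = box
    return max(w, 0.0) * max(h, 0.0)


def occlusion_ratio(inst: HumanInstance) -> float:
    """Fraction of the full-body box not covered by the visible box.

    This is a rough area-based proxy chosen for illustration, not a
    definition from the paper. Returns a value in [0, 1].
    """
    full_area = box_area(inst.full)
    if full_area == 0:
        return 0.0
    visible_fraction = min(box_area(inst.visible) / full_area, 1.0)
    return 1.0 - visible_fraction


# Example: the visible region covers the left half of the full-body box.
person = HumanInstance(
    head=(10.0, 10.0, 20.0, 20.0),
    visible=(0.0, 0.0, 50.0, 100.0),
    full=(0.0, 0.0, 100.0, 100.0),
)
ratio = occlusion_ratio(person)
```

Keeping the visible and full-body boxes side by side is what makes such a measure possible: a detector evaluation could, under these assumptions, bucket instances by `occlusion_ratio` to report results separately for lightly and heavily occluded people.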
Forward citations
Cited by 8 Pith papers
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
  DETR-ViP boosts visual-prompted detection performance by learning globally discriminative prompts through integration and distillation on top of image-text contrastive learning, with a selective fusion step for stability.
- Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark
  TinySet-9M dataset and DEAL point-prompted framework deliver 31.4% relative AP75 gain over supervised baselines for small object detection with one click at inference and generalization to unseen categories.
- Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues
  Language-guided semantic cues from MLLM visual pipelines, steered by text embeddings, refine object semantics and boost grounding accuracy against occlusion and small objects.
- SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks
  SpikeDet reaches 52.2% AP on COCO 2017 with spiking networks by optimizing firing patterns via MDSNet and SMFM, using half the energy of prior SNN detectors.
- InsHuman: Towards Natural and Identity-Preserving Human Insertion
  InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.
- NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking
  NOOUGAT unifies online and offline multi-object tracking with a GNN that processes non-overlapping subclips fused by an Autoregressive Long-term Tracking layer, reporting SOTA gains on DanceTrack, SportsMOT, and MOT20.
- Adapted Center and Scale Prediction: More Stable and More Accurate
  Adaptations to CSP, including compressing width prediction, achieve 9.3% MR on the CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
- Attention Is not Everything: Efficient Alternatives for Vision
  A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.