
arxiv: 1805.00123 · v1 · pith:OVODHZJSnew · submitted 2018-04-30 · 💻 cs.CV

CrowdHuman: A Benchmark for Detecting Human in a Crowd

classification 💻 cs.CV
keywords human, dataset, crowdhuman, detection, bounding-box, crowd, baseline, detecting

Human detection has witnessed impressive progress in recent years. However, the occlusion problem in detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. The train and validation subsets contain a total of $470$K human instances, with $\sim 22.6$ persons per image and various kinds of occlusions. Each human instance is annotated with a head bounding-box, a human visible-region bounding-box, and a human full-body bounding-box. Baseline performance of state-of-the-art detection frameworks on CrowdHuman is presented. The cross-dataset generalization results of the CrowdHuman dataset demonstrate state-of-the-art performance on previous datasets, including Caltech-USA, CityPersons, and Brainwash, without bells and whistles. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.
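To make the three-box annotation scheme concrete, here is a minimal Python sketch of how one annotated instance could be represented. It is not the dataset's file format: the class names, the field names, and the (x, y, width, height) box convention are assumptions made for illustration, and the area-based visibility ratio is only a rough proxy for occlusion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box. The (x, y, w, h) layout is an assumption."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        # Clamp negative sizes to zero so malformed boxes have no area.
        return max(self.w, 0.0) * max(self.h, 0.0)


@dataclass(frozen=True)
class HumanInstance:
    """One person with the three boxes named in the abstract.

    Field names are hypothetical, chosen for this sketch.
    """
    head: Box      # head bounding-box
    visible: Box   # visible-region bounding-box
    full: Box      # full-body bounding-box (may extend past occluders)

    def visible_ratio(self) -> float:
        """Visible area over full-body area, capped at 1.0.

        A rough occlusion proxy: lower values suggest heavier occlusion.
        """
        if self.full.area == 0.0:
            return 0.0
        return min(self.visible.area / self.full.area, 1.0)


def persons_per_image(images: List[List[HumanInstance]]) -> float:
    """Average number of annotated instances per image."""
    if not images:
        return 0.0
    return sum(len(instances) for instances in images) / len(images)


# Toy example: the lower half of the body is hidden.
person = HumanInstance(
    head=Box(5, 0, 10, 10),
    visible=Box(0, 0, 20, 50),
    full=Box(0, 0, 20, 100),
)
print(person.visible_ratio())
print(persons_per_image([[person, person], [person]]))
```

Keeping the visible and full-body boxes side by side is what allows a per-instance occlusion estimate like this one; a dataset with only full-body boxes would not support it.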



Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts

    cs.CV 2026-04 unverdicted novelty 7.0

    DETR-ViP boosts visual-prompted detection performance by learning globally discriminative prompts through integration and distillation on top of image-text contrastive learning, with a selective fusion step for stability.

  2. Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark

    cs.CV 2026-04 unverdicted novelty 7.0

    TinySet-9M dataset and DEAL point-prompted framework deliver 31.4% relative AP75 gain over supervised baselines for small object detection with one click at inference and generalization to unseen categories.

  3. Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues

    cs.CV 2026-04 unverdicted novelty 6.0

    Language-guided semantic cues from MLLM visual pipelines, steered by text embeddings, refine object semantics and boost grounding accuracy against occlusion and small objects.

  4. SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks

    cs.CV 2025-01 unverdicted novelty 6.0

    SpikeDet reaches 52.2% AP on COCO 2017 with spiking networks by optimizing firing patterns via MDSNet and SMFM, using half the energy of prior SNN detectors.

  5. InsHuman: Towards Natural and Identity-Preserving Human Insertion

    cs.CV 2026-05 unverdicted novelty 5.0

    InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.

  6. NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking

    cs.CV 2025-09 unverdicted novelty 5.0

    NOOUGAT unifies online and offline multi-object tracking with a GNN that processes non-overlapping subclips fused by an Autoregressive Long-term Tracking layer, reporting SOTA gains on DanceTrack, SportsMOT, and MOT20.

  7. Adapted Center and Scale Prediction: More Stable and More Accurate

    cs.CV 2020-02 unverdicted novelty 4.0

    Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.

  8. Attention Is not Everything: Efficient Alternatives for Vision

    cs.CV 2026-04 unverdicted novelty 3.0

    A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.