arxiv: 1807.00601 · v1 · pith:AJT4SICEnew · submitted 2018-07-02 · 💻 cs.CV

Crowd Counting using Deep Recurrent Spatial-Aware Network

classification 💻 cs.CV
keywords network, crowd, recurrent, refinement, spatial-aware, challenging, counting, dataset

Crowd counting from unconstrained scene images is a crucial task in many real-world applications such as urban surveillance and management, but it is greatly challenged by the camera's perspective, which causes huge appearance variations in people's scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the widely varying scales and that ignore the rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module that iteratively applies two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to a suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset, WorldExpo'10, and 22.8% on the most challenging dataset, UCF_CC_50.
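The two RSAR components described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`affine_theta`, `sample_region`, `refine`, `crowd_count`), the fixed transform parameters, and the toy residual function are all assumptions made for illustration. In the paper the transform parameters and the residual are predicted by learned networks, and the refined region is merged back into the full density map, a step omitted here. The sketch shows only how an affine transform with scale and rotation selects a region of a density map via bilinear sampling, and how a residual is added to that region.

```python
import math


def affine_theta(scale, angle, tx, ty):
    """2x3 affine matrix mapping output-grid coords in [-1, 1] to input coords.

    Hypothetical parameterization: isotropic scale, rotation, translation.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def bilinear(img, x, y):
    """Bilinearly sample img at continuous pixel coords; zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    val = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                val += img[yi][xi] * (1 - abs(x - xi)) * (1 - abs(y - yi))
    return val


def _grid(n):
    """n evenly spaced normalized coordinates covering [-1, 1]."""
    return [0.0] if n == 1 else [-1 + 2 * k / (n - 1) for k in range(n)]


def sample_region(img, theta, out_h, out_w):
    """Warp the region selected by theta into an out_h x out_w patch."""
    h, w = len(img), len(img[0])
    out = []
    for v in _grid(out_h):
        row = []
        for u in _grid(out_w):
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Convert normalized coords back to pixel coords before sampling.
            px = (xs + 1) * (w - 1) / 2
            py = (ys + 1) * (h - 1) / 2
            row.append(bilinear(img, px, py))
        out.append(row)
    return out


def refine(region, residual_fn):
    """Residual refinement: refined = region + residual_fn(region)."""
    residual = residual_fn(region)
    return [[r + d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(region, residual)]


def crowd_count(density):
    """The estimated count is the sum over the density map."""
    return sum(sum(row) for row in density)


# Toy 4x4 density map; values are arbitrary illustration data.
density = [[float(4 * i + j) for j in range(4)] for i in range(4)]

# Attend to the central half of the map (scale 0.5, no rotation or shift).
theta = affine_theta(scale=0.5, angle=0.0, tx=0.0, ty=0.0)
region = sample_region(density, theta, 2, 2)

# Stand-in for the Local Refinement Network: a fixed 10% residual.
refined = refine(region, lambda r: [[0.1 * v for v in row] for row in r])
```

In a recurrent setting, these two steps would be repeated several times, each time attending to a different region chosen from the current density map. Sampling on a normalized grid is what lets a single set of parameters express both zoom and rotation.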

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Locality-constrained Spatial Transformer Network for Video Crowd Counting

    cs.CV 2019-07 unverdicted novelty 6.0

    LSTN combines CNN density estimation per frame with a locality-constrained spatial transformer to relate density maps across neighboring video frames for crowd counting and introduces a new 15K-frame video dataset.