Optimising CSRNet with parameter-free attention mechanisms for crowd counting in public transport

Aida Rostamza; Cristina Olaverri-Monreal; Enrico Del Re; Joshua Cherian Varughese

arXiv: 2605.18349 · v1 · pith:J2WO64YA · submitted 2026-05-18 · 💻 cs.CV · cs.AI



Pith reviewed 2026-05-20 11:11 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: crowd counting · parameter-free attention · CSRNet · public transport · density estimation · occupancy monitoring · ShanghaiTech · attention mechanisms


The pith

Parameter-free attention mechanisms let CSRNet match or exceed the accuracy of parameterized versions for crowd counting while adding no extra model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether parameter-free attention modules can improve the CSRNet backbone for estimating crowd density in public transport settings. It evaluates channel-wise PFCA, spatial SA, 3-D SimAM, and a new PFCASA combination on the ShanghaiTech dataset. These are compared against parameterized attention modules that add at most 1 percent more parameters. The experiments show that the zero-parameter additions reach comparable or better counting accuracy. The work matters because public transport vehicles need lightweight models that run on edge hardware and monitor occupancy in conditions ranging from sparse to dense. Performance varies with density: PFCASA is strongest in scenes with fewer than 40 people, and PFCA is strongest at higher densities.

Core claim

Using CSRNet as the backbone, experiments on the ShanghaiTech dataset demonstrate that parameter-free attention mechanisms achieve comparable or superior accuracy without introducing additional model parameters.

What carries the argument

Parameter-free attention modules (PFCA channel-wise, SA spatial-wise, SimAM 3-D, and their PFCASA combination) inserted into CSRNet to enhance representational power for density map estimation without increasing parameter count.
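The following plain-Python sketch illustrates the 3-D (SimAM) case. It implements the SimAM weighting from the original SimAM paper [9]. Each activation is multiplied by the sigmoid of its inverse minimal energy, which is computed from the mean and variance of its channel over spatial positions. The names `simam` and `lam` are illustrative, and λ = 1e-4 is an assumed default. This is not the authors' code. A real CSRNet integration would operate on framework tensors rather than nested lists.

```python
import math
from typing import List

Plane = List[List[float]]       # one H x W activation map
FeatureMap = List[Plane]        # C planes, i.e. a C x H x W feature tensor


def simam(features: FeatureMap, lam: float = 1e-4) -> FeatureMap:
    """Scale each activation by sigmoid(1 / e*), the SimAM 3-D attention weight.

    mu and sigma^2 are computed per channel over the H*W spatial positions,
    with sigma^2 divided by (H*W - 1); assumes every plane has >= 2 positions.
    Nothing here is learned: lam is a fixed regularisation constant.
    """
    out: FeatureMap = []
    for plane in features:
        values = [v for row in plane for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
        denom = 4.0 * (var + lam)
        new_plane: Plane = []
        for row in plane:
            new_row = []
            for v in row:
                # inverse minimal energy: ((v - mu)^2 + 2(var + lam)) / (4(var + lam))
                importance = ((v - mu) ** 2 + 2.0 * (var + lam)) / denom
                gate = 1.0 / (1.0 + math.exp(-importance))
                new_row.append(v * gate)
            new_plane.append(new_row)
        out.append(new_plane)
    return out


# Usage: the activation that stands out from its channel receives the larger gate.
fmap = [[[1.0, 1.0], [1.0, 5.0]]]
weighted = simam(fmap)
```

The gate depends only on (v − μ)², so activations far from their channel mean receive larger weights whether they lie above or below it. The module learns no weights, which is what makes it parameter-free.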

If this is right

Model size stays identical to the original CSRNet; the attention modules add only a small amount of weight-free computation.
PFCASA delivers the best results in scenes containing fewer than 40 individuals.
PFCA becomes more effective as crowd density rises above that level.
The approach supports direct integration into resource-limited edge devices for real-time occupancy monitoring.
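Simple arithmetic can check the model-size point. The sketch below compares a parameter-free module, which adds zero weights, with a squeeze-and-excitation (SE) block [26]. The SE block serves only as an example of a parameterized attention module; this summary does not state the paper's actual parameterized baselines or their insertion points. The following assumptions are hypothetical:

- One SE block with reduction ratio 16 sits at a 512-channel layer, the width at which CSRNet's VGG-16 front end ends.
- The backbone has roughly 16.26 million parameters, the size commonly reported for CSRNet [5].

```python
def se_block_params(channels: int, reduction: int = 16) -> int:
    """Weights + biases of an SE block: FC (C -> C/r) followed by FC (C/r -> C)."""
    hidden = channels // reduction
    return (channels * hidden + hidden) + (hidden * channels + channels)


def overhead_percent(extra_params: int, backbone_params: int) -> float:
    """Added parameters as a percentage of the backbone size."""
    return 100.0 * extra_params / backbone_params


def max_blocks_within_budget(per_block: int, backbone_params: int,
                             budget_percent: int = 1) -> int:
    """How many identical blocks fit under a relative parameter budget (default 1%)."""
    return backbone_params * budget_percent // (100 * per_block)


CSRNET_PARAMS = 16_260_000   # assumption: approximate CSRNet parameter count
PARAM_FREE_EXTRA = 0         # PFCA, SA, SimAM, and PFCASA add no learnable weights

se_extra = se_block_params(512)
print(se_extra)                                             # 33312
print(round(overhead_percent(se_extra, CSRNET_PARAMS), 3))  # 0.205
print(max_blocks_within_budget(se_extra, CSRNET_PARAMS))    # 4
print(overhead_percent(PARAM_FREE_EXTRA, CSRNET_PARAMS))    # 0.0
```

Under these assumptions, one SE block costs about 0.2% of the backbone, so a 1% cap admits only a handful of such blocks. The parameter-free modules leave the count unchanged. Parameter count is not the same as compute, however, because the parameter-free modules still perform a few per-activation operations at inference.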

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real-time passenger counting could run on inexpensive onboard processors without cloud offloading.
The same parameter-free modules might transfer to other transport vision tasks such as queue length estimation.
A follow-up study measuring accuracy on diverse vehicle camera data would test generalization beyond the benchmark.

Load-bearing premise

Performance measured on the ShanghaiTech benchmark will transfer to real public transport camera feeds that differ in lighting, camera angles, motion blur, and passenger behavior.

What would settle it

Apply the PFCASA-augmented CSRNet to video from actual onboard public transport cameras and check whether mean absolute error stays within the range reported on ShanghaiTech.

Figures

Figures reproduced from arXiv: 2605.18349 by Aida Rostamza, Cristina Olaverri-Monreal, Enrico Del Re, Joshua Cherian Varughese.

**Figure 2.** Histogram showing the distribution of the number of people

**Figure 3.** Figures (a) and (b) show the accuracy comparison at different

**Figure 4.** Model performance in terms of accuracy for crowd densities

Original abstract

Occupancy estimation and crowd counting are critical tasks in designing smart and efficient public transport vehicles. Given that public transport loading can vary from sparse to crowded, classical models for occupancy estimation must be adapted to suit this purpose. Attention mechanisms have shown remarkable capability in enhancing the representational power of deep neural networks for crowd counting in congested scenes with occlusion, complex backgrounds, and perspective distortion. However, conventional approaches, often implemented as parameterized sub-networks within convolutional layers, inevitably increase model size and computational cost, limiting deployment on resource-constrained edge devices. This paper investigates the effectiveness of state-of-the-art parameter-free attention mechanisms for crowd counting and density map estimation in highly congested scenes. We evaluate channel-wise (PFCA), spatial-wise (SA), and 3-D (SimAM) modules and compare their performance with parameterized attention modules constrained to introduce no more than 1% additional parameters. Furthermore, we present a novel combination of attention mechanisms that combines the strengths of PFCA and SA (PFCASA) customized for analyzing video streams onboard public transport systems. Using CSRNet as the backbone, experiments on the ShanghaiTech dataset demonstrate that parameter-free attention mechanisms achieve comparable or superior accuracy without introducing additional model parameters. A detailed performance analysis further reveals that PFCASA outperforms other attention modules in scenes with fewer than 40 individuals, while PFCA shows greater effectiveness as crowd density increases, underscoring their potential applicability for integration into smart public transport modalities.
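The following plain-Python sketch illustrates the spatial-wise (SA) module. It follows one reading of the parameter-free spatial attention of Wang et al. [8], which proceeds in three steps:

1. Sum the activations across channels.
2. Apply a softmax over all spatial positions to turn the sum into an attention map.
3. Multiply every channel by that map.

Check the exact normalisation against [8]. The sketch also rescales the map by H·W so that a uniform map leaves the features unchanged. That rescaling is an assumption of this sketch and does not come from [8] or from the paper. The abstract does not say how PFCASA composes PFCA and SA, so no composition is shown.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]   # C x H x W, as nested lists


def spatial_attention(features: FeatureMap) -> FeatureMap:
    """Parameter-free spatial attention: channel sum -> spatial softmax -> reweight."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])

    # 1. Collapse the channel axis: s[i][j] = sum_c x[c][i][j].
    summed = [[sum(features[c][i][j] for c in range(channels)) for j in range(width)]
              for i in range(height)]

    # 2. Softmax over all H*W positions (max subtracted for numerical stability).
    peak = max(max(row) for row in summed)
    exps = [[math.exp(v - peak) for v in row] for row in summed]
    total = sum(sum(row) for row in exps)

    # 3. Assumed rescaling by H*W, so a uniform attention map acts as the identity.
    scale = height * width / total
    weights = [[scale * e for e in row] for row in exps]

    # 4. Apply the same H x W map to every channel.
    return [[[features[c][i][j] * weights[i][j] for j in range(width)]
             for i in range(height)]
            for c in range(channels)]


flat = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
peaked = [[[0.0, 0.0], [0.0, 2.0]]]
```

With `flat`, every position gets weight 1 and the features pass through unchanged. With `peaked`, the strongest position is amplified and the others are suppressed.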

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Parameter-free attention modules plugged into CSRNet match or beat lightly parameterized versions on ShanghaiTech for crowd counting, with a PFCASA combo that helps in low-density scenes.


The one thing to know is that this paper combines a few existing parameter-free attention modules with the CSRNet backbone. It reports that the resulting models perform at least as well as variants with a small number of extra parameters on the ShanghaiTech crowd counting dataset. The authors' PFCASA combination of channel and spatial attention appears to do better in scenes with fewer than 40 people.

What the paper does well is focus on a real deployment constraint: running crowd counting on edge devices in public transport without increasing model size. The authors evaluate PFCA, SA, and SimAM individually and then propose their combined PFCASA version, customized for video streams in vehicles. Comparing against parameterized attention modules capped at 1% added parameters is a fair way to show the benefit of going parameter-free. The density-specific analysis on a public dataset is also a plus, as it highlights where each module shines.

The soft spots mostly concern the lack of detail in the reported results. The abstract states that the parameter-free approaches achieve comparable or superior accuracy. However, it reports no specific metrics, standard deviations, or numerical baseline comparisons, which makes the size of any improvement hard to gauge. ShanghaiTech is a standard benchmark, but the paper's weakest assumption is that these gains will hold for real public transport camera feeds. Such feeds often involve viewpoints, motion blur, and lighting conditions that the dataset does not represent well.

Overall, this work is for practitioners who need lightweight models for occupancy estimation in smart mobility applications. A reader interested in parameter-efficient modifications to existing crowd counting networks would get some value from the comparisons and the PFCASA idea. The paper is coherent and addresses a practical problem with a clear experimental plan, so it deserves a serious referee. I would recommend sending it to peer review, with the expectation that reviewers will want more quantitative details and perhaps some discussion of generalization.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes integrating parameter-free attention mechanisms (PFCA, SA, SimAM, and a novel PFCASA combination) into the CSRNet backbone for crowd counting and density estimation. Motivated by public transport occupancy monitoring, it evaluates these modules on the ShanghaiTech dataset against parameterized attention baselines constrained to add no more than 1% parameters, claiming comparable or superior accuracy without increasing model size. Density-specific analysis is presented, with PFCASA performing better below 40 individuals and PFCA in denser scenes.

Significance. If supported by concrete metrics, the parameter-free approach would be valuable for edge deployment on resource-constrained public transport cameras, because it avoids the parameter overhead of conventional attention sub-networks. The work's strengths include the PFCASA combination tailored to video streams and the density-thresholded performance breakdown. However, the absence of numerical results makes it hard to judge whether these constitute a genuine advance over existing CSRNet variants.

major comments (2)

[Abstract] Abstract: the central claim that 'experiments on the ShanghaiTech dataset demonstrate that parameter-free attention mechanisms achieve comparable or superior accuracy' is unsupported by any quantitative metrics (MAE, MSE), baseline tables, or error bars. This is load-bearing because the evaluation on the public dataset is the sole evidence offered for the performance assertions.
[Abstract] Abstract: the density-specific findings (PFCASA superior for scenes with fewer than 40 individuals, PFCA for higher densities) reference thresholds whose selection criteria, statistical significance, or sensitivity analysis are not described, and no per-density error breakdown or cross-validation details are supplied. This weakens the applicability claim for variable public-transport loading.
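Both comments turn on the standard crowd-counting metrics, and the sketch below shows how such a breakdown is usually computed. The predicted count of an image is the sum of its density map, and MAE is the mean absolute count error. By the convention used in CSRNet [5] and much of the crowd-counting literature, "MSE" denotes the square root of the mean squared count error. Splitting images by ground-truth count at 40 people mirrors the abstract's split. The function names and example counts are illustrative and are not results from the paper.

```python
import math
from typing import Dict, List, Sequence


def predicted_count(density_map: Sequence[Sequence[float]]) -> float:
    """A density map integrates to the head count, so the count is its sum."""
    return sum(sum(row) for row in density_map)


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute count error."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def crowd_mse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Crowd-counting "MSE": the root of the mean squared count error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def per_density_breakdown(pred: Sequence[float], gt: Sequence[float],
                          threshold: float = 40.0) -> Dict[str, dict]:
    """Split images by ground-truth count and report MAE/MSE per bucket."""
    buckets: Dict[str, List[int]] = {"below": [], "at_or_above": []}
    for i, g in enumerate(gt):
        buckets["below" if g < threshold else "at_or_above"].append(i)
    report: Dict[str, dict] = {}
    for name, idx in buckets.items():
        if idx:  # skip empty buckets instead of dividing by zero
            p = [pred[i] for i in idx]
            g = [gt[i] for i in idx]
            report[name] = {"n": len(idx), "mae": mae(p, g), "mse": crowd_mse(p, g)}
    return report


# Illustrative counts only (not from the paper).
gt_counts = [12, 35, 90, 210]
pred_counts = [predicted_count([[4.0, 6.0], [0.0, 0.0]]), 30.0, 100.0, 200.0]
print(per_density_breakdown(pred_counts, gt_counts))
```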

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one key numerical comparison (e.g., MAE on ShanghaiTech Part A/B) to allow immediate assessment of the 'comparable or superior' statement.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the presentation of our results.


Referee: [Abstract] Abstract: the central claim that 'experiments on the ShanghaiTech dataset demonstrate that parameter-free attention mechanisms achieve comparable or superior accuracy' is unsupported by any quantitative metrics (MAE, MSE), baseline tables, or error bars. This is load-bearing because the evaluation on the public dataset is the sole evidence offered for the performance assertions.

Authors: We agree that the abstract would be strengthened by the inclusion of concrete quantitative metrics. In the revised manuscript we will update the abstract to report the key MAE and MSE values achieved by the best parameter-free variants (including PFCASA) on ShanghaiTech Part A and Part B, together with the corresponding baseline CSRNet results and the parameterized attention comparisons. These numbers are already present in the experimental tables of the full manuscript; their addition to the abstract will make the central performance claim directly verifiable. revision: yes
Referee: [Abstract] Abstract: the density-specific findings (PFCASA superior for scenes with fewer than 40 individuals, PFCA for higher densities) reference thresholds whose selection criteria, statistical significance, or sensitivity analysis are not described, and no per-density error breakdown or cross-validation details are supplied. This weakens the applicability claim for variable public-transport loading.

Authors: We acknowledge that the abstract does not explain the rationale for the density threshold of 40 or supply supporting statistical details. We will revise the abstract to state that the threshold was chosen after examining the empirical distribution of crowd counts in the ShanghaiTech training set. In addition, we will expand the main text (Section 4) to include a per-density MAE/MSE breakdown, a brief sensitivity analysis around the chosen threshold, and any cross-validation steps performed. These additions will better substantiate the density-dependent performance claims and their relevance to variable public-transport occupancy. revision: yes
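The following sketch suggests one shape the promised sensitivity analysis could take. Its assumptions:

- Model A stands in for PFCASA and model B for PFCA.
- Images are bucketed by ground-truth count.
- The per-image counts are made-up toy values.

For each candidate threshold, the sketch reports MAE(A) − MAE(B) separately below and above the threshold. The density-dependent claim is robust only if A stays ahead below and B stays ahead above across a neighbourhood of 40, not just at 40 itself.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, Optional[float], Optional[float]]


def mae_gap(pred_a: Sequence[float], pred_b: Sequence[float],
            gt: Sequence[float], idx: List[int]) -> Optional[float]:
    """MAE(A) - MAE(B) over the images in idx; negative means A is more accurate."""
    if not idx:
        return None
    err_a = sum(abs(pred_a[i] - gt[i]) for i in idx) / len(idx)
    err_b = sum(abs(pred_b[i] - gt[i]) for i in idx) / len(idx)
    return err_a - err_b


def threshold_sweep(gt: Sequence[float], pred_a: Sequence[float],
                    pred_b: Sequence[float], thresholds: Sequence[float]) -> List[Row]:
    """For each threshold t, return (t, gap below t, gap at or above t)."""
    rows: List[Row] = []
    for t in thresholds:
        below = [i for i, g in enumerate(gt) if g < t]
        above = [i for i, g in enumerate(gt) if g >= t]
        rows.append((t, mae_gap(pred_a, pred_b, gt, below),
                     mae_gap(pred_a, pred_b, gt, above)))
    return rows


def split_is_stable(rows: List[Row]) -> bool:
    """True if A wins below and B wins above at every threshold (empty buckets fail)."""
    return all(gap_below is not None and gap_above is not None
               and gap_below < 0 < gap_above
               for _, gap_below, gap_above in rows)


# Toy per-image counts (hypothetical, for illustration only).
gt = [10, 20, 35, 45, 60, 80]
pred_a = [11, 21, 36, 50, 66, 88]   # "A": small errors in sparse scenes
pred_b = [13, 23, 38, 46, 61, 81]   # "B": small errors in dense scenes

rows = threshold_sweep(gt, pred_a, pred_b, [30, 40, 50])
for t, gap_below, gap_above in rows:
    print(t, gap_below, gap_above)
print(split_is_stable(rows))
```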

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper is an empirical evaluation study that applies the existing CSRNet backbone, established parameter-free attention modules (PFCA, SA, SimAM), and their new combination (PFCASA) to the public ShanghaiTech dataset. It presents no equations, first-principles derivations, or predictions that reduce by construction to fitted inputs or self-defined quantities. The central claim rests on benchmark comparisons using an external dataset and a standard model, which constitute independent evidence rather than internal circularity. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz that would force the result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper builds on the established CSRNet architecture and previously published parameter-free attention modules; the primary addition is their combination and application to public transport video streams.

axioms (1)

domain assumption: CSRNet serves as a suitable backbone for density map estimation in congested scenes
The model is adopted without new justification or ablation in the abstract.

invented entities (1)

PFCASA (no independent evidence)
purpose: A custom combination of PFCA and SA attention to leverage strengths in low-density public transport scenes
Newly proposed in this work with no independent evidence outside the reported experiments.

pith-pipeline@v0.9.0 · 5802 in / 1297 out tokens · 52896 ms · 2026-05-20T11:11:56.353471+00:00


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We evaluate channel-wise (PFCA), spatial-wise (SA), and 3-D (SimAM) modules ... $V_j = \frac{(U_j - \mu)^2 + 2(\sigma^2 + \lambda)}{4(\sigma^2 + \lambda)}$

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear

Using CSRNet as the backbone, experiments on the ShanghaiTech dataset demonstrate that parameter-free attention mechanisms achieve comparable or superior accuracy without introducing additional model parameters.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 2 internal anchors

[1] BBC News, “Shanghai New Year crush kills 36,” 2015. Accessed: 2025-07-14.

[2] C. Son, D.-H. Ham, S. Jin, and T. Park, “158 deaths at Halloween night: An AcciMap analysis of 2022 Itaewon crowd crush in South Korea,” Safety Science, vol. 184, p. 106741, 2025.

[3] J. Çapalar, A. Nemec, C. Zahradnik, and C. Olaverri-Monreal, “Optimization of passenger distribution at metro stations through a guidance system,” in Computer Aided Systems Theory - EUROCAST 2017 - 16th International Conference, Las Palmas de Gran Canaria, Spain, February 19-24, 2017, Revised Selected Papers, Part II, ser. Lecture Notes in Computer ...

[4] X. Zhou, “Optimization analysis of the transportation organization during each peak period of Guangzhou Metro Line 3 (including the third north line),” Technol. Develop. Enterprise, vol. 34, no. 20, pp. 72–74, 2015.

[5] Y. Li, X. Zhang, and D. Chen, “CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation / IEEE Computer Society, 2018, pp. 1091–1100.

[6] Q. Li, C. Ma, H. Chen, X. Chen, and X. Yang, “Combinatorial progressive architecture search for crowd counting,” Displays, vol. 83, p. 102686, 2024.

[7] Y. Shi, L. Yang, W. An, X. Zhen, and L. Wang, “Parameter-free channel attention for image classification and super-resolution,” arXiv preprint arXiv:2303.11055, 2023.

[8] H. Wang, Y. Fan, Z. Wang, L. Jiao, and B. Schiele, “Parameter-free spatial attention network for person re-identification,” CoRR, vol. abs/1811.12150, 2018.

[9] L. Yang, R.-Y. Zhang, L. Li, and X. Xie, “SimAM: A simple, parameter-free attention module for convolutional neural networks,” in International Conference on Machine Learning. PMLR, 2021, pp. 11863–11874.

[10] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, “Single-image crowd counting via multi-column convolutional neural network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 589–597.

[11] B. Li, H. Huang, A. Zhang, P. Liu, and C. Liu, “Approaches on crowd counting and density estimation: a review,” Pattern Analysis and Applications, vol. 24, no. 3, pp. 853–874, 2021.

[12] V. S. Lempitsky and A. Zisserman, “Learning to count objects in images,” in Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems

[13] Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, Eds. Curran Associates, Inc., 2010, pp. 1324–1332.

[14] M. Rodriguez, I. Laptev, J. Sivic, and J. Audibert, “Density-aware person detection and tracking in crowds,” in IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, D. N. Metaxas, L. Quan, A. Sanfeliu, and L. V. Gool, Eds. IEEE Computer Society, 2011, pp. 2423–2430.

[15] I. Bakour, H. N. Bouchali, S. Allali, and H. Lacheheb, “Soft-CSRNet: Real-time dilated convolutional neural networks for crowd counting with drones,” in 2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-being (IHSH). IEEE, 2021, pp. 28–33.

[16] H. Zhao, S. Lu, L. Wang, Z. Nie, and Y. Li, “Crowd counting method based on improved CSRNet,” International Conference on Artificial Life and Robots, vol. 25, pp. 605–610, Jan. 2020.

[17] R. Zhao, Z. Han, Z. Liu, H. Wang, and J. Zhong, “A location-enhanced and multiscale-friendly crowd detecting approach for tram,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–9, 2022.

[18] X. Cao, Z. Wang, Y. Zhao, and F. Su, “Scale aggregation network for accurate and efficient crowd counting,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734–750.

[19] X. Jiang, Z. Xiao, B. Zhang, X. Zhen, X. Cao, D. Doermann, and L. Shao, “Crowd counting and density estimation by trellis encoder-decoder networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6133–6142.

[20] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, “Single-image crowd counting via multi-column convolutional neural network,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 2016, pp. 589–597.

[21] D. B. Sam, S. Surya, and R. V. Babu, “Switching convolutional neural network for crowd counting,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, 2017, pp. 4031–4039.

[22] X. Yang and X. Lu, “Multi scale attention network for crowd counting,” in CSAE 2021: The 5th International Conference on Computer Science and Application Engineering, Sanya, China, October 19-21, 2021, A. Emrouznejad and J. R. Chou, Eds. ACM, 2021, pp. 22:1–22:8.

[23] J. Gao, Q. Wang, and Y. Yuan, “SCAR: Spatial-/channel-wise attention regression networks for crowd counting,” Neurocomputing, vol. 363, pp. 1–8, 2019.

[24] L. Zhu, Z. Zhao, C. Lu, Y. Lin, Y. Peng, and T. Yao, “Dual path multi-scale fusion networks with attention for crowd counting,” CoRR, vol. abs/1902.01115, 2019.

[25] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.

[26] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.

[27] Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, 2021, pp. 13713–13722.
