C^3 Framework: An Open-source PyTorch Code for Crowd Counting

Bin Zhao; Chenyu Gao; Dong Wang; Jun Wen; Junyu Gao; Wei Lin

arxiv: 1907.02724 · v1 · pith:2U7ESPNBnew · submitted 2019-07-05 · 💻 cs.CV


Pith reviewed 2026-05-25 02:38 UTC · model grok-4.3

classification 💻 cs.CV

keywords: crowd counting, PyTorch, open source, baseline networks, reproducibility, computer vision

The pith

The C^3 Framework releases open-source PyTorch code with baseline networks that achieve state-of-the-art results on crowd counting benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the Crowd Counting Code Framework (C^3F), an open-source PyTorch toolkit for crowd counting. It includes solid baseline networks that reach state-of-the-art performance, flexible parameter settings to improve results, and a log system to record experiments for better reproducibility. The code is publicly available on GitHub, along with a detailed Chinese blog. A sympathetic reader would care because it provides ready-to-use, high-performing implementations that can standardize experiments in the field.

Core claim

The C^3 Framework presents baseline networks that are claimed to achieve state-of-the-art results on crowd counting tasks, along with parameter-setting strategies and a log system to enhance reproducibility.

What carries the argument

The C^3 Framework itself, which consists of baseline networks, flexible parameter settings, and a log system for recording experiments.

If this is right

The released code allows researchers to reproduce and build upon state-of-the-art crowd counting results.
Flexible parameter settings can further improve performance on standard benchmarks.
The log system enhances the reproducibility of each experiment.
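
The reproducibility point above can be illustrated with a minimal sketch of an experiment recorder. This is not the C^3F log system: the class name `ExperimentLog`, the file names, and the configuration keys are all hypothetical, chosen only to show the idea of saving the configuration once and appending metrics per epoch.

```python
import json
import tempfile
from pathlib import Path


class ExperimentLog:
    """Hypothetical minimal experiment recorder (not the C^3F logger)."""

    def __init__(self, log_dir, config):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        # Save the full configuration once so the run can be repeated later.
        config_text = json.dumps(config, indent=2, sort_keys=True)
        (self.log_dir / "config.json").write_text(config_text)
        self.metrics_path = self.log_dir / "metrics.jsonl"

    def record(self, epoch, **metrics):
        # Append one JSON object per line, one line per epoch.
        with self.metrics_path.open("a") as handle:
            handle.write(json.dumps({"epoch": epoch, **metrics}) + "\n")

    def best(self, key):
        # Return the recorded row with the lowest value for `key`.
        lines = self.metrics_path.read_text().splitlines()
        rows = [json.loads(line) for line in lines]
        return min(rows, key=lambda row: row[key])


# Illustrative values only; these are not results from the paper.
log = ExperimentLog(tempfile.mkdtemp(), {"net": "VGG", "lr": 1e-5, "seed": 3035})
log.record(1, mae=12.0, mse=19.0)
log.record(2, mae=11.0, mse=18.0)
best = log.best("mae")
```

Storing the configuration next to the metrics means a later reader can see which settings produced which numbers without consulting the training script.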

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Releasing such frameworks could reduce duplication of effort in implementing common baselines for crowd counting.
This approach might be extended to other computer vision tasks where code reproducibility is an issue.
Users could test the baselines on new datasets to verify generalization.

Load-bearing premise

The released baseline networks truly achieve state-of-the-art performance on standard crowd counting benchmarks.

What would settle it

Running the provided code on standard benchmarks like ShanghaiTech and finding that the reported metrics do not match the claimed state-of-the-art results.
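
A check of this kind comes down to recomputing the two metrics from per-image counts. The sketch below assumes the convention common in crowd counting papers, where "MSE" denotes the square root of the mean squared count error; the function name `counting_errors` and the sample counts are hypothetical and not taken from the paper or its repository.

```python
import math


def counting_errors(predicted, ground_truth):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: "MSE" is the root of the mean squared count error,
    as is conventional in the crowd counting literature.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty count lists of equal length")
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


# Hypothetical per-image counts, for illustration only.
predicted = [102.0, 48.0, 310.0, 75.0]
ground_truth = [100.0, 50.0, 300.0, 80.0]
mae, mse = counting_errors(predicted, ground_truth)
```

Comparing the resulting pair against the reported figures, within a tolerance that allows for run-to-run variation, is what would confirm or refute the claim.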

Figures

Figures reproduced from arXiv: 1907.02724 by Bin Zhao, Chenyu Gao, Dong Wang, Jun Wen, Junyu Gao, Wei Lin.

**Figure 2.** Decoder Structure. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]

Method           MAE    MSE
VGG              10.3   16.5
VGG + decoder    10.5   17.4

Original abstract

This technical report attempts to provide efficient and solid kits addressed on the field of crowd counting, which is denoted as Crowd Counting Code Framework (C^3F). The contributions of C^3F are in three folds: 1) Some solid baseline networks are presented, which have achieved the state-of-the-arts. 2) Some flexible parameter setting strategies are provided to further promote the performance. 3) A powerful log system is developed to record the experiment process, which can enhance the reproducibility of each experiment. Our code is made publicly available at https://github.com/gjy3035/C-3-Framework. Furthermore, we also post a Chinese blog (https://zhuanlan.zhihu.com/p/65650998) to describe the details and insights of crowd counting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents C³F, an open-source PyTorch framework for crowd counting. It claims three contributions: (1) baseline networks (e.g., CSRNet, MCNN variants) that achieve state-of-the-art performance, (2) flexible parameter-setting strategies to improve results, and (3) a logging system to enhance reproducibility. The code is hosted on GitHub; the text itself contains no experimental results or tables.

Significance. A verified, well-documented PyTorch implementation of standard crowd-counting baselines together with reproducible logging tools would be a modest but useful service to the community, lowering the barrier to fair comparisons on ShanghaiTech, UCF-QNRF and similar benchmarks. The significance is currently undercut by the complete absence of any quantitative evidence inside the manuscript.

major comments (2)

[Abstract] Abstract, first contribution: the assertion that the released baselines 'have achieved the state-of-the-arts' is unsupported by any numbers, tables, or comparisons. No MAE/MSE values, no benchmark names, and no reference to original papers' reported scores appear anywhere in the document.
[Full text (no experimental section present)] No experimental section or results table exists. The central reproducibility claim therefore rests entirely on an external GitHub repository whose training loops, density-map generation, normalization, and evaluation protocols are not described or validated inside the manuscript.

minor comments (2)

[Abstract] The phrase 'in three folds' should read 'threefold'.
[Full text] The manuscript would benefit from a short 'Code Structure' subsection that maps the GitHub repository layout to the three claimed contributions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our technical report. We address each major comment below and indicate the planned revisions.

Referee: [Abstract] Abstract, first contribution: the assertion that the released baselines 'have achieved the state-of-the-arts' is unsupported by any numbers, tables, or comparisons. No MAE/MSE values, no benchmark names, and no reference to original papers' reported scores appear anywhere in the document.

Authors: We agree that the state-of-the-art claim requires explicit quantitative support inside the manuscript. In the revised version we will add a concise experimental section reporting MAE and MSE on ShanghaiTech (Part A/B) and UCF-QNRF, together with direct numerical comparisons to the scores published in the original CSRNet and MCNN papers. The abstract will be updated to reference these results. revision: yes
Referee: [Full text (no experimental section present)] No experimental section or results table exists. The central reproducibility claim therefore rests entirely on an external GitHub repository whose training loops, density-map generation, normalization, and evaluation protocols are not described or validated inside the manuscript.

Authors: We acknowledge the absence of an experimental section. The revision will include a new section that briefly describes the training and evaluation pipelines (density-map generation, normalization, and metrics) and reports sample results obtained from the released code. This will allow the reproducibility claims to be assessed from the manuscript itself while the full implementation remains on GitHub. revision: yes
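
The density-map generation step named in this reply can be sketched as follows. This is a generic fixed-kernel Gaussian construction under stated assumptions, not the procedure used in the released repository: the function names, the default `sigma`, and the choice to renormalize each kernel inside the image bounds are all illustrative.

```python
import math


def density_map(height, width, heads, sigma=4.0):
    """Build a density map whose sum equals the number of annotated heads.

    `heads` is a list of (row, col) integer positions inside the image.
    Assumption: each head's Gaussian is renormalized within the image,
    so every head contributes exactly 1 to the total count.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for r0, c0 in heads:
        if not (0 <= r0 < height and 0 <= c0 < width):
            raise ValueError("head annotation lies outside the image")
        # Gaussian weights on the part of the window that is inside the image.
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                dist_sq = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-dist_sq / (2.0 * sigma ** 2))
        total = sum(weights.values())
        for (r, c), weight in weights.items():
            dmap[r][c] += weight / total
    return dmap


def count_from_density(dmap):
    # The predicted or ground-truth count is the integral of the map.
    return sum(sum(row) for row in dmap)


# Three hypothetical head positions in a 32 x 32 image.
heads = [(10, 10), (0, 0), (20, 31)]
count = count_from_density(density_map(32, 32, heads))
```

Because the count is read off as the sum of the map, any mismatch between how training targets are generated and how predictions are summed feeds directly into MAE and MSE, which is why the referee asks for this step to be documented.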

Circularity Check

0 steps flagged

No circularity: code release with no derivations or fitted predictions

The manuscript is a technical report for a PyTorch code framework (C^3F) whose contributions are the released repository, baseline network implementations, parameter strategies, and logging system. No equations, derivations, predictions, or fitted parameters appear in the abstract or described content. The SOTA assertion is an unsupported claim about external code performance rather than any self-referential reduction or self-citation load-bearing step. No patterns from the enumerated circularity kinds are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities appear in the abstract; the paper is an engineering artifact rather than a theoretical contribution.

pith-pipeline@v0.9.0 · 5674 in / 967 out tokens · 50171 ms · 2026-05-25T02:38:58.452076+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "This technical report attempts to provide efficient and solid kits addressed on the field of crowd counting... baseline networks... preprocessing strategies... log system"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 2 canonical work pages · 2 internal anchors

[1] PyTorch. https://pytorch.org/

[2] X. Cao, Z. Wang, Y. Zhao, and F. Su. Scale aggregation network for accurate and efficient crowd counting. In Proceedings of the European Conference on Computer Vision, pages 734–750, 2018.

[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[4] H. Idrees, I. Saleemi, C. Seibert, and M. Shah. Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2547–2554, 2013.

[5] H. Idrees, M. Tayyab, K. Athrey, D. Zhang, S. Al-Maadeed, N. Rajpoot, and M. Shah. Composition loss for counting, density map estimation and localization in dense crowds. arXiv preprint arXiv:1808.01050, 2018.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[7] Y. Li, X. Zhang, and D. Chen. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1091–1100, 2018.

[8] M. Shi, Z. Yang, C. Xu, and Q. Chen. Revisiting perspective information for efficient crowd counting. 2019.

[9] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[10] V. A. Sindagi and V. M. Patel. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 1–6, 2017.

[11] Q. Wang, J. Gao, W. Lin, and Y. Yuan. Learning from synthetic data for crowd counting in the wild. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8198–8207, 2019.

[12] C. Zhang, K. Kang, H. Li, X. Wang, R. Xie, and X. Yang. Data-driven crowd understanding: a baseline for a large-scale crowd dataset. IEEE Transactions on Multimedia, 18(6):1048–1061, 2016.

[13] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE conference on Computer Vision and ... 2016
