Spatio-Temporal Analysis of Public Transportation Undercrowding: Leveraging APC Data for a Comprehensive Evaluation of Usage Rates
Pith reviewed 2026-05-23 19:10 UTC · model grok-4.3
The pith
APC data combined with mixed-effects models identify undercrowded segments and rides on a radial surface transit route in Milan.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study proposes a methodology that processes Automatic People Counting (APC) data with Generalized Linear Mixed Effects Models and Generalized Mixed-Effect Random Forests. Applied to a radial surface transport route in Milan, it models the probability of undercrowding at the segment level and the ride level and identifies the factors that influence it.
What carries the argument
Generalized Linear Mixed Effects Model and Generalized Mixed-Effect Random Forest fitted to an undercrowding indicator derived from APC passenger counts, used to model probability at both segment and ride levels.
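As a rough sketch of how the two model families differ, the block below writes both in generic notation. The notation is an assumption made here, not the paper's own: $y_{ij}$ is the binary undercrowding indicator for observation $j$ in group $i$ (for example, a segment), $x_{ij}$ are fixed-effect covariates, $z_{ij}$ are random-effect covariates, and $b_i$ are the group's random effects. The logit link is assumed because the response is binary.

```latex
\begin{align}
y_{ij} \mid b_i &\sim \mathrm{Bernoulli}(p_{ij}), \qquad b_i \sim \mathcal{N}(0, \Psi) \\
\text{GLMM:}\quad \operatorname{logit}(p_{ij}) &= x_{ij}^{\top}\beta + z_{ij}^{\top} b_i \\
\text{GMERF:}\quad \operatorname{logit}(p_{ij}) &= f(x_{ij}) + z_{ij}^{\top} b_i
\end{align}
```

In the GMERF line, $f$ is an unknown function estimated by a random forest. It takes the place of the linear term $x_{ij}^{\top}\beta$, so nonlinear covariate effects and interactions can be captured while the random-effects part keeps the same form.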
If this is right
- Segments can be ranked by their modeled probability of undercrowding.
- Covariates such as time of day and location are shown to affect the probability of undercrowding at the segment level.
- The same models extend the analysis to the full ride, revealing the temporal distribution of undercrowding across the journey.
- The occupancy-rate indicator directly compares observed demand against vehicle capacity on each segment.
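The occupancy indicator and a descriptive segment ranking can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the 20% threshold, the function names, and the sample records are all hypothetical, and the ranking uses the raw observed share of undercrowded traversals rather than a model-based probability.

```python
from collections import defaultdict

# Hypothetical cutoff: a traversal counts as undercrowded below 20% occupancy.
UNDERCROWDING_THRESHOLD = 0.20


def occupancy_rate(passengers_on_board, vehicle_capacity):
    """Observed load divided by vehicle capacity."""
    if vehicle_capacity <= 0:
        raise ValueError("vehicle_capacity must be positive")
    return passengers_on_board / vehicle_capacity


def is_undercrowded(passengers_on_board, vehicle_capacity,
                    threshold=UNDERCROWDING_THRESHOLD):
    """Binary indicator: True when occupancy falls strictly below the threshold."""
    return occupancy_rate(passengers_on_board, vehicle_capacity) < threshold


def undercrowding_share_by_segment(records, threshold=UNDERCROWDING_THRESHOLD):
    """Empirical share of undercrowded traversals per segment, highest first.

    records: iterable of (segment_id, passengers_on_board, vehicle_capacity).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for segment_id, load, capacity in records:
        total[segment_id] += 1
        flagged[segment_id] += is_undercrowded(load, capacity, threshold)
    shares = {seg: flagged[seg] / total[seg] for seg in total}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Made-up example records, not data from the paper.
records = [
    ("A-B", 10, 100), ("A-B", 15, 100), ("A-B", 40, 100),
    ("B-C", 55, 100), ("B-C", 12, 100),
    ("C-D", 70, 100), ("C-D", 65, 100),
]
ranking = undercrowding_share_by_segment(records)
```

A mixed model would replace the raw shares with fitted probabilities that borrow strength across segments and adjust for covariates; the raw shares are only a baseline to compare against.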
Where Pith is reading between the lines
- Transit planners could prioritize schedule or vehicle-size adjustments on the segments the models flag as most undercrowded.
- The same APC-plus-mixed-model pipeline could be rerun on other routes or cities that already collect comparable passenger-count data.
- Adding ride-level predictions to segment-level ones gives operators a way to assess whether undercrowding on one part of a trip affects the whole journey.
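One simple way to carry segment-level predictions up to the ride level is to sum them. The sketch below assumes a fitted model has already produced one undercrowding probability per segment of a ride; the function names and the probabilities are hypothetical and do not come from the paper.

```python
def expected_undercrowded_segments(segment_probabilities):
    """Expected count of undercrowded segments on one ride.

    By linearity of expectation this is the sum of the per-segment
    probabilities, whether or not the segments are independent.
    """
    for p in segment_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(segment_probabilities)


def expected_undercrowded_share(segment_probabilities):
    """Expected fraction of the ride's segments that are undercrowded."""
    if not segment_probabilities:
        raise ValueError("a ride needs at least one segment")
    count = expected_undercrowded_segments(segment_probabilities)
    return count / len(segment_probabilities)


# Made-up model outputs for one ride with four segments.
ride_probs = [0.9, 0.6, 0.3, 0.2]
expected_count = expected_undercrowded_segments(ride_probs)
expected_share = expected_undercrowded_share(ride_probs)
```

The expected count needs no independence assumption, but anything beyond the mean (for example, the probability that the whole ride is undercrowded) would depend on how segments within a ride are correlated.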
Load-bearing premise
The APC sensors record passenger numbers and occupancy without systematic measurement error, missing counts, or route-specific calibration issues that would distort the undercrowding indicator.
What would settle it
A side-by-side manual passenger count on the same Milan route segments and time periods that produces occupancy rates differing substantially from the APC-derived rates would undermine the undercrowding classifications.
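Such a check could be summarized with two numbers: the mean occupancy bias and the share of traversals whose undercrowding label flips. The sketch below is illustrative only; the function name, the single shared vehicle capacity, the 20% threshold, and the sample counts are assumptions.

```python
def compare_counts(apc_loads, manual_loads, capacity, threshold=0.20):
    """Compare APC and manual loads observed on the same segment traversals.

    Assumes every traversal uses a vehicle with the same capacity.
    """
    if len(apc_loads) != len(manual_loads) or not apc_loads:
        raise ValueError("need two non-empty sequences of equal length")
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    n = len(apc_loads)
    pairs = list(zip(apc_loads, manual_loads))
    # Mean signed difference in occupancy rate (APC minus manual).
    mean_bias = sum((a - m) / capacity for a, m in pairs) / n
    # Count traversals where the two sources disagree on the binary label.
    disagreements = sum(
        ((a / capacity) < threshold) != ((m / capacity) < threshold)
        for a, m in pairs
    )
    return {
        "mean_occupancy_bias": mean_bias,
        "label_disagreement_rate": disagreements / n,
    }


# Made-up paired counts for four traversals, not data from the paper.
apc = [10, 18, 25, 60]
manual = [12, 22, 24, 58]
result = compare_counts(apc, manual, capacity=100)
```

Even a small mean bias can change many labels when loads cluster near the threshold, which is why the disagreement rate is reported alongside the bias.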
Original abstract
The analysis of the transportation usage rate provides opportunities for evaluating the efficacy of the transportation service offered by proposing an indicator that integrates actual demand and capacity. This study aims to develop a methodology for analyzing the occupancy rate from large-scale datasets to identify gaps between supply and demand in public transportation. Leveraging the spatio-temporal granularity of data from Automatic People Counting (APC) and relying on the Generalized Linear Mixed Effects Model and the Generalized Mixed-Effect Random Forest, in this study we propose a methodology for analyzing factors determining undercrowding. The results of the model are examined at both the segment and ride levels. Initially, the analysis focuses on identifying segments more likely associated with undercrowding, understanding factors influencing the probability of undercrowding, and exploring their relationships. Subsequently, the analysis extends to the temporal distribution of undercrowding, encompassing its impact on the entire journey. The proposed methodology is applied to analyze APC data, provided by the company responsible for public transport management in Milan, on a radial route of the surface transportation network.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a methodology to analyze undercrowding in public transport by defining an occupancy-based indicator from Automatic People Counting (APC) data on a Milan radial route. It applies Generalized Linear Mixed Effects Models (GLMM) and Generalized Mixed-Effect Random Forests (GMERF) to identify segments and ride-level factors associated with undercrowding probability, then extends the analysis to temporal patterns across entire journeys.
Significance. If APC counts prove accurate and the models are validated with appropriate metrics, the work could offer a data-driven framework for detecting supply-demand gaps in transit networks, with potential applications in service planning. The hierarchical modeling approach suits the spatio-temporal structure of ride and segment data.
major comments (2)
- [Abstract/Methods] Abstract and Methods: The manuscript provides no description of APC sensor calibration, handling of missing scans, route-specific bias correction, or validation against manual counts. This is load-bearing because the binary undercrowding indicator is derived directly from APC occupancy values; any systematic error would propagate to all segment probabilities and temporal analyses.
- [Results] Results: No equations, model specifications, goodness-of-fit metrics (e.g., AUC, R², or deviance), error bars, or cross-validation details are referenced for the GLMM and GMERF fits, preventing assessment of whether the identified factors are statistically supported or merely descriptive.
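Of the metrics named above, AUC is straightforward to compute for a binary undercrowding classifier. The following is a minimal sketch using the pairwise (Mann-Whitney) definition; the function name and the labels and scores are hypothetical, and in practice the scores should come from held-out data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = undercrowded) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The double loop is quadratic in the number of observations, which is fine for illustration; a rank-based formulation would be preferable on large APC datasets.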
minor comments (2)
- [Introduction] Notation for the undercrowding threshold and occupancy rate should be defined explicitly with a formula early in the text.
- [Methods] Clarify the distinction between segment-level and ride-level random effects in the GMERF specification.
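One possible notation that would address both minor comments is sketched below. All symbols are assumptions introduced here for illustration: $n_{r,s}$ is the APC load on ride $r$ over segment $s$, $C_r$ is the capacity of the vehicle serving ride $r$, $\tau$ is the undercrowding threshold, and $b_s$ and $c_r$ are segment-level and ride-level random intercepts. Whether the paper includes both intercepts is not stated in the material above.

```latex
\begin{align}
o_{r,s} &= \frac{n_{r,s}}{C_r}, &
y_{r,s} &= \mathbb{1}\{\, o_{r,s} < \tau \,\} \\
\operatorname{logit}\Pr(y_{r,s} = 1 \mid b_s, c_r) &= \eta(x_{r,s}) + b_s + c_r, &
b_s &\sim \mathcal{N}(0, \sigma_b^2), \quad c_r \sim \mathcal{N}(0, \sigma_c^2)
\end{align}
```

Here $\eta(x_{r,s})$ stands for the fixed part of the predictor: a linear term in a GLMM, or a random-forest estimate in a GMERF. Writing $b_s$ and $c_r$ separately makes explicit which share of the variation is attributed to the location and which to the individual ride.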
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. Below we respond point-by-point to the major comments, indicating planned revisions where appropriate.
- Referee: [Abstract/Methods] Abstract and Methods: The manuscript provides no description of APC sensor calibration, handling of missing scans, route-specific bias correction, or validation against manual counts. This is load-bearing because the binary undercrowding indicator is derived directly from APC occupancy values; any systematic error would propagate to all segment probabilities and temporal analyses.
  Authors: We agree that explicit documentation of APC data quality is essential. The manuscript will be revised to add a dedicated subsection in Methods describing the computation of the occupancy indicator from raw APC counts, our handling of missing scans, and any route-level adjustments performed. We will also discuss known limitations of APC data. However, detailed sensor calibration protocols and results from manual-count validation campaigns are proprietary to the data provider (the Milan public transport operator) and are not available to the authors; we will state this limitation clearly.
  Revision: partial
- Referee: [Results] Results: No equations, model specifications, goodness-of-fit metrics (e.g., AUC, R², or deviance), error bars, or cross-validation details are referenced for the GLMM and GMERF fits, preventing assessment of whether the identified factors are statistically supported or merely descriptive.
  Authors: We accept this criticism. The revised Results section will include the explicit model equations for both the GLMM and GMERF, tables of estimated coefficients with standard errors, goodness-of-fit statistics (AUC, deviance, and pseudo-R² where applicable), and a description of the cross-validation strategy used to assess predictive performance. These additions will allow readers to evaluate the statistical support for the reported associations.
  Revision: yes
- Proprietary details on APC sensor calibration and independent manual-count validation, which are not accessible to the authors.
Circularity Check
No circularity; modeling applies standard GLMM/GMERF to external APC-derived indicators
The paper applies GLMM and GMERF to undercrowding indicators computed directly from APC passenger counts supplied by the Milan operator. No equations, predictions, or uniqueness claims reduce outputs to fitted parameters or self-citations by construction. The derivation chain processes observed spatio-temporal data through off-the-shelf mixed-effects models; the central results (segment probabilities and ride-level factors) are statistical outputs of those models on external inputs, not tautological re-expressions of the inputs themselves.
Axiom & Free-Parameter Ledger
free parameters (1)
- GLMM and GMERF hyperparameters and random effect variances
axioms (2)
- Domain assumption: APC sensor data provide unbiased measurements of passenger counts and vehicle capacity.
- Standard math: The standard assumptions of generalized linear mixed models (linearity on the link scale, correct random-effect distribution) hold for the occupancy data.