arxiv: 2410.12618 · v1 · submitted 2024-10-16 · 📊 stat.AP

Spatio-Temporal Analysis of Public Transportation Undercrowding: Leveraging APC Data for a Comprehensive Evaluation of Usage Rates

Pith reviewed 2026-05-23 19:10 UTC · model grok-4.3

classification 📊 stat.AP
keywords public transportation · undercrowding · APC data · GLMM · GMERF · occupancy rate · Milan · spatio-temporal analysis

The pith

APC data combined with mixed models identifies undercrowded segments and rides on Milan public transport routes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a methodology that processes large-scale Automatic People Counting data through Generalized Linear Mixed Effects Models and Generalized Mixed-Effect Random Forests to estimate the probability of undercrowding. It examines this probability first at the level of individual route segments and then across entire rides on a radial surface line in Milan. A sympathetic reader would care because the approach supplies a repeatable way to locate mismatches between supplied capacity and observed demand using existing sensor streams rather than new surveys. The analysis traces both spatial patterns across stops and how undercrowding evolves over the course of a full journey.

Core claim

The study proposes and applies a methodology based on Automatic People Counting data processed through Generalized Linear Mixed Effects Models and Generalized Mixed-Effect Random Forests to analyze the probability of undercrowding at the segment level and the ride level on a radial surface transport route in Milan, identifying factors that influence undercrowding.

What carries the argument

Generalized Linear Mixed Effects Model and Generalized Mixed-Effect Random Forest fitted to an undercrowding indicator derived from APC passenger counts, used to model probability at both segment and ride levels.
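
For orientation, the two model classes can be written in their generic textbook form. This is a sketch of the standard formulation, not the specification fitted in the paper: the grouping index i, the covariate vectors x_{ij} and z_{ij}, and the logit link are assumptions made for illustration.

```latex
% Generic logistic GLMM for a binary undercrowding indicator y_{ij}
% (observation j in group i). Grouping and covariates are illustrative assumptions.
\begin{align}
y_{ij} \mid b_i &\sim \mathrm{Bernoulli}(p_{ij}), \\
\text{GLMM:}\quad \operatorname{logit}(p_{ij}) &= x_{ij}^{\top}\beta + z_{ij}^{\top} b_i,
  \qquad b_i \sim \mathcal{N}(0, \Sigma), \\
% GMERF keeps the random-effects term but replaces the linear
% fixed-effects part with a function f estimated by a random forest.
\text{GMERF:}\quad \operatorname{logit}(p_{ij}) &= f(x_{ij}) + z_{ij}^{\top} b_i .
\end{align}
```

In both cases the random effect b_i captures group-level deviation from the population-level part of the predictor.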

If this is right

  • Segments can be ranked by their modeled probability of undercrowding.
  • Covariates such as time of day and location are shown to affect the probability of undercrowding at the segment level.
  • The same models extend the analysis to the full ride, revealing the temporal distribution of undercrowding across the journey.
  • The occupancy-rate indicator directly compares observed demand against vehicle capacity on each segment.
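
The indicator and the ranking in the list above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the 0.2 threshold, the record layout, and the function names are hypothetical, and the ranking uses the empirical share of undercrowded observations as a stand-in for the model-based probability.

```python
from collections import defaultdict

def occupancy_rate(passengers_on_board, capacity):
    """Share of vehicle capacity in use on one segment."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return passengers_on_board / capacity

def is_undercrowded(passengers_on_board, capacity, threshold=0.2):
    """Binary indicator: True when occupancy is below the (assumed) threshold."""
    return occupancy_rate(passengers_on_board, capacity) < threshold

def rank_segments(records, threshold=0.2):
    """Rank segments by the empirical share of undercrowded observations.

    records: iterable of (segment_id, passengers_on_board, capacity).
    Returns (segment_id, share) pairs, most undercrowded first.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [undercrowded, total]
    for segment_id, passengers, capacity in records:
        counts[segment_id][0] += is_undercrowded(passengers, capacity, threshold)
        counts[segment_id][1] += 1
    shares = {seg: under / total for seg, (under, total) in counts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Hypothetical observations: (segment, passengers on board, vehicle capacity)
records = [
    ("A-B", 10, 100), ("A-B", 15, 100),
    ("B-C", 50, 100), ("B-C", 12, 100),
    ("C-D", 70, 100), ("C-D", 65, 100),
]
ranking = rank_segments(records)
```

A mixed model would replace the raw shares with fitted probabilities that account for covariates and group-level effects.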

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Transit planners could prioritize schedule or vehicle-size adjustments on the segments the models flag as most undercrowded.
  • The same APC-plus-mixed-model pipeline could be rerun on other routes or cities that already collect comparable passenger-count data.
  • Adding ride-level predictions to segment-level ones gives operators a way to assess whether undercrowding on one part of a trip affects the whole journey.
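
One way to move from segment-level to ride-level summaries is sketched below. It assumes a hypothetical definition in which a ride counts as fully undercrowded when every one of its segments is below the threshold; the threshold value and the function name are illustrative and may differ from the paper's definition.

```python
def ride_summary(segment_rates, threshold=0.2):
    """Summarise one ride from its per-segment occupancy rates.

    Returns the share of undercrowded segments and whether the ride is
    fully undercrowded (all segments below the assumed threshold).
    """
    if not segment_rates:
        raise ValueError("a ride needs at least one segment")
    flags = [rate < threshold for rate in segment_rates]
    return {
        "undercrowded_share": sum(flags) / len(flags),
        "fully_undercrowded": all(flags),
    }

# Hypothetical rides, given as occupancy rates along consecutive segments
quiet_ride = ride_summary([0.05, 0.10, 0.15])
mixed_ride = ride_summary([0.05, 0.30, 0.10, 0.50])
```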

Load-bearing premise

The APC sensors record passenger numbers and occupancy without systematic measurement error, missing counts, or route-specific calibration issues that would distort the undercrowding indicator.

What would settle it

A side-by-side manual passenger count on the same Milan route segments and time periods that produces occupancy rates differing substantially from the APC-derived rates would undermine the undercrowding classifications.
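
Such a comparison could be summarised as in the sketch below, which reports the mean absolute difference in occupancy rate and the share of observations whose undercrowding label flips. The counts, the capacity of 100, and the 0.2 threshold are hypothetical values chosen for illustration.

```python
def classification_agreement(apc_counts, manual_counts, capacity, threshold=0.2):
    """Compare APC-derived and manually counted occupancy on paired observations."""
    if len(apc_counts) != len(manual_counts):
        raise ValueError("count lists must have the same length")
    if not apc_counts:
        raise ValueError("need at least one paired observation")
    flips = 0
    abs_diff = 0.0
    for apc, manual in zip(apc_counts, manual_counts):
        apc_rate = apc / capacity
        manual_rate = manual / capacity
        abs_diff += abs(apc_rate - manual_rate)
        # A flip means the two sources disagree on the undercrowding label.
        if (apc_rate < threshold) != (manual_rate < threshold):
            flips += 1
    n = len(apc_counts)
    return {"mean_abs_rate_diff": abs_diff / n, "label_flip_share": flips / n}

# Hypothetical paired counts on the same segments and time periods
result = classification_agreement([10, 25, 40, 18], [12, 19, 41, 22], capacity=100)
```

Small count differences near the threshold are enough to flip labels, so the flip share is the more direct check on the undercrowding classifications.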

Figures

Figures reproduced from arXiv: 2410.12618 by Arianna Burzacchi, Giovanni Azzone, Marika Arena, Piercesare Secchi, Simone Vantini, Valeria Maria Urbano.

Figure 1: Distribution of the number of rides between weeks (a) and day types (b).
Figure 2: Compared to the original dataset of raw APC measurements, the elaborated version …
Figure 3: 10-Fold CV MSE for different degrees of time slot
Figure 4: ROC curve evaluated on the test set for both Model 5 (in red) and Model 6 (in black).
Figure 5: Random effects. The estimated random effects of the GLMM are depicted in …
Figure 6: Marginal effect of Time slot and Day type on the probability of undercrowding. x axis: …
Figure 7: Marginal effect of Week and Day type on the probability of undercrowding. x axis: Week; …
Figure 8: Number of fully undercrowded rides at level …
Figure 9: Fully undercrowded aggregated rides by time slot, day type, and month. Limit value for …

Original abstract

The analysis of the transportation usage rate provides opportunities for evaluating the efficacy of the transportation service offered by proposing an indicator that integrates actual demand and capacity. This study aims to develop a methodology for analyzing the occupancy rate from large-scale datasets to identify gaps between supply and demand in public transportation. Leveraging the spatio-temporal granularity of data from Automatic People Counting (APC) and relying on the Generalized Linear Mixed Effects Model and the Generalized Mixed-Effect Random Forest, in this study we propose a methodology for analyzing factors determining undercrowding. The results of the model are examined at both the segment and ride levels. Initially, the analysis focuses on identifying segments more likely associated with undercrowding, understanding factors influencing the probability of undercrowding, and exploring their relationships. Subsequently, the analysis extends to the temporal distribution of undercrowding, encompassing its impact on the entire journey. The proposed methodology is applied to analyze APC data, provided by the company responsible for public transport management in Milan, on a radial route of the surface transportation network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a methodology to analyze undercrowding in public transport by defining an occupancy-based indicator from Automatic People Counting (APC) data on a radial route in Milan. It applies Generalized Linear Mixed Effects Models (GLMM) and Generalized Mixed-Effect Random Forests (GMERF) to identify the segments and the ride-level factors associated with the probability of undercrowding, then extends the analysis to temporal patterns across entire journeys.

Significance. If APC counts prove accurate and the models are validated with appropriate metrics, the work could offer a data-driven framework for detecting supply-demand gaps in transit networks, with potential applications in service planning. The hierarchical modeling approach suits the spatio-temporal structure of ride and segment data.

major comments (2)
  1. [Abstract/Methods] Abstract and Methods: The manuscript provides no description of APC sensor calibration, handling of missing scans, route-specific bias correction, or validation against manual counts. This is load-bearing because the binary undercrowding indicator is derived directly from APC occupancy values; any systematic error would propagate to all segment probabilities and temporal analyses.
  2. [Results] Results: No equations, model specifications, goodness-of-fit metrics (e.g., AUC, R², or deviance), error bars, or cross-validation details are referenced for the GLMM and GMERF fits, preventing assessment of whether the identified factors are statistically supported or merely descriptive.
minor comments (2)
  1. [Introduction] Notation for the undercrowding threshold and occupancy rate should be defined explicitly with a formula early in the text.
  2. [Methods] Clarify the distinction between segment-level and ride-level random effects in the GMERF specification.
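
Of the goodness-of-fit metrics named in the second major comment, AUC is straightforward to compute from held-out labels and predicted probabilities. The sketch below uses the pairwise (Mann-Whitney) form; the labels and scores are hypothetical and the function is illustrative, not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for undercrowded, 0 otherwise.
    scores: predicted probabilities of undercrowding.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical held-out labels and model scores
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
value = auc(labels, scores)
```

The quadratic loop is fine for a sketch; on large APC datasets a rank-based implementation from an established library would be the practical choice.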

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. Below we respond point-by-point to the major comments, indicating planned revisions where appropriate.

point-by-point responses
  1. Referee: [Abstract/Methods] Abstract and Methods: The manuscript provides no description of APC sensor calibration, handling of missing scans, route-specific bias correction, or validation against manual counts. This is load-bearing because the binary undercrowding indicator is derived directly from APC occupancy values; any systematic error would propagate to all segment probabilities and temporal analyses.

    Authors: We agree that explicit documentation of APC data quality is essential. The manuscript will be revised to add a dedicated subsection in Methods describing the computation of the occupancy indicator from raw APC counts, our handling of missing scans, and any route-level adjustments performed. We will also discuss known limitations of APC data. However, detailed sensor calibration protocols and results from manual-count validation campaigns are proprietary to the data provider (the Milan public transport operator) and are not available to the authors; we will state this limitation clearly. revision: partial

  2. Referee: [Results] Results: No equations, model specifications, goodness-of-fit metrics (e.g., AUC, R², or deviance), error bars, or cross-validation details are referenced for the GLMM and GMERF fits, preventing assessment of whether the identified factors are statistically supported or merely descriptive.

    Authors: We accept this criticism. The revised Results section will include the explicit model equations for both the GLMM and GMERF, tables of estimated coefficients with standard errors, goodness-of-fit statistics (AUC, deviance, and pseudo-R² where applicable), and a description of the cross-validation strategy used to assess predictive performance. These additions will allow readers to evaluate the statistical support for the reported associations. revision: yes

standing simulated objections not resolved
  • Proprietary details on APC sensor calibration and independent manual-count validation, which are not accessible to the authors.

Circularity Check

0 steps flagged

No circularity; modeling applies standard GLMM/GMERF to external APC-derived indicators

full rationale

The paper applies GLMM and GMERF to undercrowding indicators computed directly from APC passenger counts supplied by the Milan operator. No equations, predictions, or uniqueness claims reduce outputs to fitted parameters or self-citations by construction. The derivation chain processes observed spatio-temporal data through off-the-shelf mixed-effects models; the central results (segment probabilities and ride-level factors) are statistical outputs of those models on external inputs, not tautological re-expressions of the inputs themselves.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the assumption that APC counts are reliable proxies for occupancy and that standard mixed-effects modeling assumptions hold for this transportation context; no new entities are postulated.

free parameters (1)
  • GLMM and GMERF hyperparameters and random effect variances
    Fitted to the APC dataset to capture segment- and ride-level variation; values not reported in abstract.
axioms (2)
  • domain assumption: APC sensor data provides unbiased measurements of passenger counts and vehicle capacity
    Invoked when defining the undercrowding indicator from raw counts; no calibration or error model is mentioned in the abstract.
  • standard math: Standard assumptions of generalized linear mixed models (linearity on link scale, correct random effect distribution) hold for the occupancy data
    Required for the GLMM component to produce valid probability estimates.
