Fusing Cellular Network Data and Tollbooth Counts for Urban Traffic Flow Estimation
Pith reviewed 2026-05-10 09:02 UTC · model grok-4.3
The pith
Machine learning corrects aggregated cellular mobility data using sparse tollbooth counts to produce vehicle-specific origin-destination matrices for urban traffic planning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework infers destinations from transit routes and implements routing logic to distribute corrected flows between OD pairs. This approach is applied to a bus depot expansion in Trondheim, Norway, generating hourly OD matrices by vehicle length category. The results show how limited but accurate sensor measurements can correct extensive but aggregated mobility data to produce grounded estimates of background vehicular traffic flows. These macro-scale estimates can be refined for micro-scale analysis at desired locations.
What carries the argument
Machine learning model using temporal and spatial features to map aggregated cellular mobility data onto vehicle-category counts from tollbooths, followed by destination inference from transit routes and routing logic to allocate flows to specific origin-destination pairs.
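As a rough illustration of the correction step, the sketch below learns one multiplicative factor per (hour, weekend flag, vehicle category) bucket from paired cellular and tollbooth observations. It is a deliberately simple stand-in and not the paper's model, whose architecture is not described here. The record layout and the names `fit_correction_factors` and `apply_correction` are hypothetical.

```python
from collections import defaultdict


def fit_correction_factors(samples):
    """Learn tollbooth/cellular ratios per (hour, is_weekend, category) bucket.

    Each sample is a hypothetical dict with keys:
    hour (0-23), is_weekend (bool), category (str),
    cellular (aggregated cellular count), toll (tollbooth vehicle count).
    """
    toll_sum = defaultdict(float)
    cell_sum = defaultdict(float)
    for s in samples:
        key = (s["hour"], s["is_weekend"], s["category"])
        toll_sum[key] += s["toll"]
        cell_sum[key] += s["cellular"]
    # Ratio of sums; buckets without any cellular signal are skipped.
    return {k: toll_sum[k] / cell_sum[k] for k in cell_sum if cell_sum[k] > 0}


def apply_correction(factors, hour, is_weekend, category, cellular, default=0.0):
    """Estimate the vehicle count of one category from a cellular aggregate."""
    factor = factors.get((hour, is_weekend, category), default)
    return factor * cellular


# Made-up observations, for illustration only.
samples = [
    {"hour": 8, "is_weekend": False, "category": "short", "cellular": 1000, "toll": 400},
    {"hour": 8, "is_weekend": False, "category": "short", "cellular": 500, "toll": 200},
    {"hour": 8, "is_weekend": False, "category": "long", "cellular": 1000, "toll": 50},
]
factors = fit_correction_factors(samples)
estimate = apply_correction(factors, 8, False, "short", 800)
```

A learned regressor would replace the lookup table with a function of richer temporal and spatial features, but the input and output roles stay the same: cellular aggregates in, category-specific vehicle counts out.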
If this is right
- Hourly origin-destination matrices broken down by vehicle length category become available for running traffic simulations of infrastructure projects.
- Background vehicular flows can be estimated across a city even in locations lacking direct sensors.
- The outputs support evaluation of interventions such as depot expansions by providing category-specific flow inputs.
- Macro-level estimates can be zoomed in for detailed analysis at chosen sites.
- The overall method offers a repeatable way to create origin-destination data from cellular sources in settings with limited ground truth.
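One way to picture the outputs listed above is as a mapping from (hour, vehicle category) to an OD matrix, which can then be filtered down to a site of interest. The sketch below assumes that layout; the zone labels, flows, and the helper names `subset_od` and `total_by_category` are hypothetical and not taken from the paper.

```python
def subset_od(od_by_hour, zones_of_interest):
    """Keep only OD pairs that start or end in the chosen zones."""
    zones = set(zones_of_interest)
    return {
        key: {
            pair: flow
            for pair, flow in matrix.items()
            if pair[0] in zones or pair[1] in zones
        }
        for key, matrix in od_by_hour.items()
    }


def total_by_category(od_by_hour):
    """Sum flows over all hours and OD pairs for each vehicle category."""
    totals = {}
    for (_hour, category), matrix in od_by_hour.items():
        totals[category] = totals.get(category, 0.0) + sum(matrix.values())
    return totals


# Made-up hourly matrices keyed by (hour, vehicle length category).
od_by_hour = {
    (8, "short"): {("A", "B"): 200.0, ("A", "C"): 100.0, ("C", "D"): 40.0},
    (8, "long"): {("A", "B"): 12.0, ("C", "D"): 5.0},
}
local = subset_od(od_by_hour, ["A"])
local_totals = total_by_category(local)
```

Filtering by zone is only the simplest form of zooming in; a micro-scale study would likely also need finer zoning around the site.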
Where Pith is reading between the lines
- The same correction step could be applied to other cities that already collect cellular aggregates and occasional toll counts, testing transferability of the learned mapping.
- Adding public transport schedule data might further separate modes within the cellular signals for even finer disaggregation.
- The matrices could serve as a low-cost baseline to decide where to place additional sensors for ongoing validation.
- Streaming versions of the cellular input might allow periodic updates to the matrices for dynamic planning use.
Load-bearing premise
That temporal and spatial features alone let the model learn the full relationship between cellular aggregates and real vehicle counts without leaving systematic biases that would distort the resulting matrices.
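A simple probe of this premise is to check whether residuals average out to zero within each group, for example per vehicle category or per station. The sketch below computes the mean signed error by group. The record fields and the function name `mean_signed_error_by_group` are assumptions for illustration, and the numbers are made up.

```python
from collections import defaultdict


def mean_signed_error_by_group(records, group_key):
    """Average (estimated - observed) within each group.

    A value far from zero suggests a systematic over- or under-estimate
    for that group rather than random noise.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        group = r[group_key]
        sums[group] += r["estimated"] - r["observed"]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up paired estimates and tollbooth observations.
records = [
    {"category": "short", "estimated": 105.0, "observed": 100.0},
    {"category": "short", "estimated": 98.0, "observed": 100.0},
    {"category": "long", "estimated": 30.0, "observed": 20.0},
    {"category": "long", "estimated": 28.0, "observed": 20.0},
]
bias = mean_signed_error_by_group(records, "category")
```

In this toy data the "long" category is overestimated on every record, which is the kind of pattern that would distort the resulting matrices.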
What would settle it
Direct comparison of the generated hourly OD matrices against independent vehicle counts or camera observations collected at multiple non-tollbooth road segments would reveal whether the estimates align with actual flows.
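Such a comparison could be summarised with standard error metrics. The sketch below computes MAE and RMSE between estimated and independently observed hourly counts at a single road segment. The counts are invented for illustration and the function names are hypothetical.

```python
import math


def _check(estimated, observed):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(estimated, observed):
    """Mean absolute error between paired estimates and observations."""
    _check(estimated, observed)
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)


def rmse(estimated, observed):
    """Root mean squared error between paired estimates and observations."""
    _check(estimated, observed)
    squared = sum((e - o) ** 2 for e, o in zip(estimated, observed))
    return math.sqrt(squared / len(observed))


# Made-up hourly counts at one non-tollbooth segment.
estimated = [120.0, 95.0, 210.0, 180.0]
observed = [110.0, 100.0, 200.0, 190.0]
segment_mae = mae(estimated, observed)
segment_rmse = rmse(estimated, observed)
```

Reporting both metrics is useful because RMSE penalises occasional large misses more heavily than MAE does.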
Original abstract
Traffic simulations, essential for planning urban transit infrastructure interventions, require vehicle-category-specific origin-destination (OD) data. Existing data sources are imperfect: sparse tollbooth sensors provide accurate vehicle counts by category, while extensive mobility data from cellular network activity captures aggregated crowd movement but lacks modal disaggregation and has systematic biases. This study develops a machine learning framework to correct and disaggregate cellular network data using sparse tollbooth counts as ground truth. The model uses temporal and spatial features to learn the complex relationship between aggregated mobility data and vehicular data. The framework infers destinations from transit routes and implements routing logic to distribute corrected flows between OD pairs. This approach is applied to a bus depot expansion in Trondheim, Norway, generating hourly OD matrices by vehicle length category. The results show how limited but accurate sensor measurements can correct extensive but aggregated mobility data to produce grounded estimates of background vehicular traffic flows. These macro-scale estimates can be refined for micro-scale analysis at desired locations. The framework provides a generalisable approach for generating origin-destination data from cellular network data. This enables downstream tasks, like detailed traffic simulations for infrastructure planning in data-scarce contexts, supporting urban planners in making informed decisions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a machine learning framework to fuse sparse but accurate tollbooth vehicle counts with extensive but aggregated cellular network mobility data. Using temporal and spatial features, the model corrects and disaggregates the cellular data to produce vehicle-category-specific origin-destination (OD) matrices. The framework includes logic to infer destinations from transit routes and distribute flows via routing. It is applied to a case study of bus depot expansion in Trondheim, Norway, to generate hourly OD matrices by vehicle length category for background traffic estimation.
Significance. If the quantitative performance is strong, this work could provide a practical, generalizable method for creating detailed vehicular OD data in urban areas with limited sensor coverage. By leveraging limited ground-truth toll data to calibrate broader mobility datasets, it addresses a key data gap for traffic simulations used in infrastructure planning. The approach's applicability to data-scarce contexts is a notable strength.
major comments (3)
- [Abstract and §3 (Model Description)] The manuscript provides no details on the machine learning model architecture, specific temporal and spatial features used, loss function, or training/validation procedure. Without these, it is impossible to assess whether the model can adequately capture biases in cellular data (e.g., demographic sampling, modal misattribution) using only the chosen features.
- [§5 (Results)] No quantitative results, error metrics (such as MAE, RMSE for counts or OD flows), validation against held-out tollbooth data, or comparisons to baselines are reported. The claim that the framework produces 'grounded estimates' lacks empirical support, which is load-bearing for the central contribution.
- [§4 (Framework)] The routing logic and destination inference from transit routes are described at a high level, but no pseudocode, algorithmic details, or sensitivity analysis to routing assumptions are provided. This could introduce unquantified errors in the OD matrix generation.
minor comments (2)
- [Abstract] The abstract mentions 'the results show' but does not preview any specific findings or metrics, which is atypical for a methods paper.
- [Notation] Clarify the exact definition of 'aggregated mobility data' versus 'vehicular data' early in the paper to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and will make revisions to enhance the technical completeness, empirical support, and transparency of the work.
Point-by-point responses
- Referee: [Abstract and §3 (Model Description)] The manuscript provides no details on the machine learning model architecture, specific temporal and spatial features used, loss function, or training/validation procedure. Without these, it is impossible to assess whether the model can adequately capture biases in cellular data (e.g., demographic sampling, modal misattribution) using only the chosen features.
  Authors: We agree that the model description in §3 is currently high-level and insufficient for full reproducibility or bias assessment. In the revised manuscript, we will expand §3 (and update the abstract if needed) to specify the model architecture, the exact temporal features (e.g., hour-of-day, weekday/weekend indicators) and spatial features (e.g., zone-level aggregates, distance to tollbooths), the loss function, and the training/validation procedure including data partitioning and any regularization techniques. This will clarify how the model uses tollbooth ground truth to correct cellular biases.
  Revision: yes
- Referee: [§5 (Results)] No quantitative results, error metrics (such as MAE, RMSE for counts or OD flows), validation against held-out tollbooth data, or comparisons to baselines are reported. The claim that the framework produces 'grounded estimates' lacks empirical support, which is load-bearing for the central contribution.
  Authors: We acknowledge that the current §5 focuses on the Trondheim case study application without quantitative metrics or validation. This is a substantive gap. In the revision, we will add error metrics (MAE, RMSE) on held-out tollbooth counts and OD flows, describe the validation split, and include baseline comparisons (e.g., uncorrected cellular data or naive disaggregation). These additions will provide empirical grounding for the estimates.
  Revision: yes
- Referee: [§4 (Framework)] The routing logic and destination inference from transit routes are described at a high level, but no pseudocode, algorithmic details, or sensitivity analysis to routing assumptions are provided. This could introduce unquantified errors in the OD matrix generation.
  Authors: We agree that the framework in §4 would benefit from greater specificity. We will add pseudocode and step-by-step algorithmic details for destination inference and flow distribution, along with a sensitivity analysis varying key routing assumptions (e.g., route choice parameters or transit route interpretations). This will help quantify and bound potential errors in the generated OD matrices.
  Revision: yes
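To make the flow-distribution step concrete, the sketch below splits each origin's corrected flow across candidate destinations in proportion to a weight. The paper's actual routing logic is not specified, so proportional allocation is an assumption, as are the weights (imagined here as, say, the number of transit routes linking two zones), the zone labels, and the function name `distribute_flow`.

```python
def distribute_flow(origin_flows, destination_weights):
    """Split each origin's corrected flow across destinations by weight.

    origin_flows: {origin: vehicles per hour}
    destination_weights: {origin: {destination: non-negative weight}}
    Returns {(origin, destination): flow}.
    """
    od = {}
    for origin, flow in origin_flows.items():
        weights = destination_weights.get(origin, {})
        total = sum(weights.values())
        if total <= 0:
            # No inferred destination for this origin: leave it unallocated.
            continue
        for destination, weight in weights.items():
            od[(origin, destination)] = flow * weight / total
    return od


# Made-up corrected flows and hypothetical route-based weights.
origin_flows = {"A": 300.0, "B": 120.0}
destination_weights = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 1, "C": 3},
}
od_matrix = distribute_flow(origin_flows, destination_weights)
```

A basic sensitivity analysis could rerun `distribute_flow` with perturbed weights and record how much each OD entry moves, since the allocation conserves each origin's total by construction.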
Circularity Check
No significant circularity; external tollbooth counts serve as independent ground truth for ML training
Full rationale
The paper trains an ML model on temporal and spatial features to correct and disaggregate cellular mobility data, explicitly using sparse tollbooth counts as external ground truth for vehicle-category-specific flows. It then applies separate routing logic derived from transit routes to produce OD matrices. No load-bearing step reduces by construction to a fitted input renamed as prediction, a self-definition, or a self-citation chain; the derivation remains self-contained against the distinct data sources and does not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
free parameters (1)
- Machine learning model parameters
axioms (2)
- domain assumption: Sparse tollbooth sensors provide accurate vehicle counts by category that can serve as ground truth.
- domain assumption: Cellular network activity captures aggregated crowd movement but contains systematic biases and lacks modal disaggregation.