Estimating Traffic Disruption Patterns with Volunteered Geographic Information
Pith reviewed 2026-05-24 23:08 UTC · model grok-4.3
The pith
Static OpenStreetMap features explain more than half the variation in traffic volume and disruptions across sampled road points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Linear regressions that treat OpenStreetMap features as predictors can explain more than half the variation in traffic disruption counts and traffic volume at 6,500 sampled points within 112 Oxfordshire regions; models built on granular point-of-interest data outperform those built on aggregate land-use categories.
What carries the argument
Linear regression with recursive feature elimination, using static OpenStreetMap attributes as predictors for traffic disruption and volume counts.
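A minimal sketch of recursive feature elimination wrapped around a linear regression, using scikit-learn (which the paper cites). The feature names, the synthetic data, and the choice to keep three features are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for an OSM feature matrix: rows are sampled road points,
# columns are hypothetical static feature counts (not the paper's features).
rng = np.random.default_rng(0)
feature_names = ["shops", "schools", "parking", "junctions", "bus_stops", "noise"]
X = rng.poisson(lam=3.0, size=(500, len(feature_names))).astype(float)

# Only the first three columns drive the synthetic target.
true_coefs = np.array([2.0, 1.5, -1.0, 0.0, 0.0, 0.0])
y = X @ true_coefs + rng.normal(scale=1.0, size=500)

# Refit repeatedly, dropping the feature with the smallest coefficient
# magnitude each round, until three features remain.
selector = RFE(LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
ranking = dict(zip(feature_names, selector.ranking_.tolist()))
print(selected)
print(ranking)
```

Selected features receive rank 1 and eliminated features receive higher ranks in reverse order of removal. Because elimination compares raw coefficient magnitudes, the predictors should share a comparable scale, as they do in this toy example.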
If this is right
- Traffic volume and disruption counts can be estimated at many locations without installing sensors or purchasing proprietary data.
- Granular point-of-interest records improve prediction accuracy over the aggregate land-use categories standard in transport planning.
- Cross-validated models demonstrate that static features alone carry substantial predictive power for network-level traffic patterns.
- Recursive feature elimination ranks the relative contribution of different land-use and road attributes to observed traffic outcomes.
Where Pith is reading between the lines
- The same static-feature approach could be tested in cities outside the UK to check how far the relationships travel.
- Adding time-of-day or weather variables on top of the OSM baseline would show how much extra variation remains to be explained.
- If the relationships generalize, planners could generate rough traffic estimates for entire networks from openly available map data.
- The method might extend to other volunteered geographic datasets beyond OpenStreetMap if similar point and line attributes are available.
Load-bearing premise
The linear relationships observed between static map features and traffic counts hold without large omitted effects from time of day, weather, or unmeasured road capacity differences.
What would settle it
Collect traffic counts and OSM features in a fresh set of regions and rerun the same regression procedure: an R-squared below 0.5 would falsify the central claim.
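The test can be sketched on synthetic data as follows. This version takes a stricter reading than the text requires: coefficients fitted on the original regions are applied unchanged to the fresh ones. The helper names, the data, and the coefficients are all hypothetical; only the 0.5 threshold comes from the claim.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept in position 0."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def out_of_sample_r2(beta, X, y):
    """R^2 = 1 - SS_res / SS_tot, evaluated on data not used for fitting."""
    A = np.column_stack([np.ones(len(X)), X])
    residuals = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

rng = np.random.default_rng(1)
coefs = np.array([1.5, 1.0, 0.5, 0.0])  # hypothetical true effects

def simulate(n_points):
    """Draw synthetic features and a linear target with unit-variance noise."""
    X = rng.normal(size=(n_points, 4))
    return X, X @ coefs + rng.normal(size=n_points)

X_original, y_original = simulate(400)  # stands in for the studied regions
X_fresh, y_fresh = simulate(300)        # stands in for newly sampled regions

beta = fit_ols(X_original, y_original)
r2_fresh = out_of_sample_r2(beta, X_fresh, y_fresh)
claim_survives = bool(r2_fresh >= 0.5)
print(round(r2_fresh, 3), claim_survives)
```

With real data, the fresh regions would need the same feature extraction and count definitions as the original sample for the comparison to be meaningful.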
Original abstract
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with static features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSM's granular point of interest data allows for better predictions than the aggregate categories typically used in studies of transportation and land use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that static features extracted from OpenStreetMap can explain more than half the observed spatial variation in traffic volume and disruption counts across 6,500 sampled points in 112 Oxfordshire regions. Linear regressions are fitted using these features as predictors, with cross-validation and recursive feature elimination employed to evaluate predictive performance and the relative importance of land-use and POI categories; granular POI data is shown to outperform aggregate categories.
Significance. If the reported explanatory power holds under fuller validation, the work demonstrates a practical, low-cost approach to traffic estimation using volunteered geographic information where direct counts are unavailable. The explicit use of cross-validation and recursive feature elimination to quantify predictive utility (rather than in-sample fit alone) is a methodological strength that supports the central claim within the studied dataset.
major comments (3)
- [Abstract and Results] Abstract and Results: the claim of explanatory power 'above 50%' is not accompanied by the actual OLS or cross-validated R² values, regression coefficients, standard errors, or RMSE; without these quantities it is impossible to judge the magnitude or stability of the reported relationships.
- [Methods and Results] Methods and Results: no diagnostics are reported for multicollinearity among the OSM predictors, spatial autocorrelation in residuals, or omitted-variable bias from time-of-day or weather effects; these checks are load-bearing for interpreting the linear-regression R² in a spatial setting.
- [Validation procedure] Validation details: the cross-validation procedure is described at a high level but lacks specification of fold structure (e.g., whether regions are held out), the precise performance metric used for recursive feature elimination, and any spatial cross-validation safeguards against leakage.
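Two of the diagnostics named in the major comments, variance inflation factors and Moran's I, can be sketched in plain NumPy. The data, the neighbour-based weight matrix, and the function names are assumptions made for illustration; the paper's own spatial weighting is not specified here.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def morans_i(values, weights):
    """Moran's I for values at n sites, given an n x n spatial weight matrix."""
    z = values - values.mean()
    return (len(values) / weights.sum()) * (z @ weights @ z) / (z @ z)

rng = np.random.default_rng(3)

# Hypothetical predictors: two independent columns, then a third column that
# is almost an exact sum of the first two.
a = rng.normal(size=1000)
b = rng.normal(size=1000)
X_independent = np.column_stack([a, b])
X_collinear = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=1000)])
vif_independent = variance_inflation_factors(X_independent)
vif_collinear = variance_inflation_factors(X_collinear)

# Ten sites on a line; each site neighbours the sites directly beside it.
n_sites = 10
idx = np.arange(n_sites)
W = (np.abs(idx[:, None] - idx[None, :]) == 1).astype(float)

smooth_residuals = idx.astype(float)                       # neighbours look alike
alternating_residuals = np.where(idx % 2 == 0, 1.0, -1.0)  # neighbours differ

i_smooth = morans_i(smooth_residuals, W)
i_alternating = morans_i(alternating_residuals, W)
print(vif_independent, vif_collinear)
print(i_smooth, i_alternating)
```

VIF values near 1 indicate little collinearity, while large values flag predictors that are nearly linear combinations of the others. A clearly positive Moran's I on regression residuals would suggest that nearby points share unexplained variation, which is the concern the referee raises.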
minor comments (1)
- [Data and Methods] Notation for the 112 regions and 6,500 points should be defined consistently when first introduced; a small table summarizing the feature set and its aggregation level would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important areas for improving the transparency and robustness of our analysis. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and additional reporting.
Point-by-point responses
- Referee: [Abstract and Results] Abstract and Results: the claim of explanatory power 'above 50%' is not accompanied by the actual OLS or cross-validated R² values, regression coefficients, standard errors, or RMSE; without these quantities it is impossible to judge the magnitude or stability of the reported relationships.
  Authors: We agree that the specific quantitative results are necessary to allow readers to evaluate the strength and stability of the reported relationships. In the revised manuscript we will add tables in the Results section that report the OLS R², cross-validated R², selected regression coefficients with standard errors, and RMSE for the primary models. These additions will directly support the claim that more than half the variation is explained.
  revision: yes
- Referee: [Methods and Results] Methods and Results: no diagnostics are reported for multicollinearity among the OSM predictors, spatial autocorrelation in residuals, or omitted-variable bias from time-of-day or weather effects; these checks are load-bearing for interpreting the linear-regression R² in a spatial setting.
  Authors: We acknowledge that these diagnostics are important for interpreting R² in a spatial context. We will add variance inflation factor (VIF) calculations to assess multicollinearity among the OSM predictors and report Moran's I statistics on the model residuals to evaluate spatial autocorrelation. For omitted-variable bias, we will add an explicit discussion noting that the models are intentionally limited to static OSM land-use and POI features; time-of-day and weather effects lie outside the volunteered geographic information scope of the study and would require external datasets. The cross-validation results still demonstrate predictive utility within the available static features.
  revision: yes
- Referee: [Validation procedure] Validation details: the cross-validation procedure is described at a high level but lacks specification of fold structure (e.g., whether regions are held out), the precise performance metric used for recursive feature elimination, and any spatial cross-validation safeguards against leakage.
  Authors: We will revise the Methods section to provide complete specification of the cross-validation procedure. This will include the number of folds, whether entire regions are held out as a spatial safeguard against leakage, the exact performance metric (R² or mean squared error) used for recursive feature elimination, and any additional steps taken to mitigate spatial dependence during validation.
  revision: yes
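Holding out entire regions during cross-validation, as raised in the last exchange, could look like the following sketch with scikit-learn's GroupKFold. The number of regions, points, and folds, and the synthetic data, are assumptions for illustration; the paper's actual fold structure is not specified in the material above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(2)
n_points, n_regions = 600, 12

# Hypothetical region label for every sampled road point.
region_ids = rng.integers(0, n_regions, size=n_points)
X = rng.normal(size=(n_points, 5))
y = X @ np.array([1.5, 1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n_points)

# GroupKFold keeps all points from one region in the same fold, so no region
# contributes to both the training and the test side of a split.
cv = GroupKFold(n_splits=4)
leaked_regions = [
    set(region_ids[train_idx]) & set(region_ids[test_idx])
    for train_idx, test_idx in cv.split(X, y, groups=region_ids)
]

scores = cross_val_score(
    LinearRegression(), X, y, groups=region_ids, cv=cv, scoring="r2"
)
print(scores, scores.mean())
```

Grouping by region addresses leakage between points inside the same region. It does not by itself remove dependence between points in adjacent regions, which would need a buffer or a coarser grouping.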
Circularity Check
No significant circularity
The paper reports an empirical OLS regression (with cross-validation and recursive feature elimination) of observed traffic volume and disruption counts on static OSM land-use and point-of-interest features across 6,500 sampled points. The central R² claim is the fraction of variance captured by these external predictors within the dataset; it is not obtained by fitting a parameter to the target variable itself or by any self-citation chain that reduces the result to the inputs by construction. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- regression coefficients for each OSM feature
axioms (1)
- domain assumption: Linear relationship between static OSM features and traffic metrics holds sufficiently for explanatory purposes
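For reference, the linear model and the variance-explained statistic that this ledger refers to can be written out as below. The notation is an assumption introduced here, not taken from the paper: y_i is the traffic count at point i, x_{ij} is the value of OSM feature j at that point, n is the number of sampled points, and p is the number of features.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

In this notation the free parameters are the coefficients \beta_0, \dots, \beta_p, and the claim that static features explain more than half the variation corresponds to R^2 > 0.5.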
Reference graph
Works this paper leans on
- [1] Department for Transport, Transport Statistics Great Britain 2016, https://bit.ly/2tsCsvq
- [2] E. I. Vlahogianni, M. G. Karlaftis, J. C. Golias, Short-term traffic forecasting: Where we are and where we're going. Transportation Research Part C: Emerging Technologies 43, 3–19 (2014)
- [3] G. McNeill, J. Bright, S. A. Hale, Estimating local commuting patterns from geolocated Twitter data. EPJ Data Science 6, 24 (2017)
- [4] M. Wegener, F. Fürst, Land-use transport interaction: State of the art, http://dx.doi.org/10.2139/ssrn.1434678 (2004)
- [5] M. Lenormand, M. Picornell, O. G. Cantú-Ros, T. Louail, R. Herranz, M. Barthelemy, E. Frías-Martínez, M. S. Miguel, J. J. Ramasco, Comparing and modelling land use organization in cities. Royal Society Open Science 2, 150449 (2015)
- [6]
- [7] Y. Liu, F. Wang, Y. Xiao, S. Gao, Urban land uses and traffic source-sink areas: Evidence from GPS-enabled taxi data in Shanghai. Landscape and Urban Planning 106, 73–87 (2012)
- [8] M. Haklay, How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design 37, 682–703 (2010)
- [9]
- [10] D. Zielstra, A. Zipf, 13th AGILE International Conference on Geographic Information Science (2010), vol. 2010
- [11] M. Helbich, C. Amelunxen, P. Neis, A. Zipf, Comparative spatial analysis of positional accuracy of OpenStreetMap and proprietary geodata. Proceedings of GI Forum, pp. 24–33 (2012)
- [12] A. Mashhadi, G. Quattrone, L. Capra, OpenStreetMap in GIScience (Springer, 2015), pp. 125–141
- [13] J. J. Arsanjani, P. Mooney, A. Zipf, A. Schauss, OpenStreetMap in GIScience (Springer, 2015), pp. 37–58
- [14] H. Senaratne, A. Mobasheri, A. L. Ali, C. Capineri, M. Haklay, A review of volunteered geographic information quality assessment methods. International Journal of Geographical Information Science 31, 139–167 (2017)
- [15]
- [16]
- [17] C. Q. Camargo, J. Bright, S. A. Hale, Diagnosing the performance of human mobility models at small spatial scales using volunteered geographic information. arXiv preprint arXiv:1905.07964 (2019)
- [18] H. Choi, H. Varian, Predicting the present with Google Trends. Economic Record 88, 2–9 (2012)
- [19] L. Wu, E. Brynjolfsson, Economic analysis of the digital economy (University of Chicago Press, 2015), pp. 89–118
- [20] A. Y. Lin, J. Cranshaw, S. Counts, Proceedings of the 2019 World Wide Web Conference (WWW19), May (2019), pp. 13–17
- [21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
- [22] N. Meinshausen, P. Bühlmann, Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 417–473 (2010)
- [23] OpenStreetMap contributors, OpenStreetMap Mapnik and CartoCSS update, https://github.com/gravitystorm/openstreetmap-carto/blob/master/CHANGELOG.md (2017)
- [24] S. Srinivasan, R. Provost, R. Steiner, Modeling the land-use correlates of vehicle-trip lengths for assessing the transportation impacts of land developments. Journal of Transport and Land Use (2013)
- [25] B. Sana, J. Castiglione, D. Cooper, D. Tischler, Using Google's Aggregated and Anonymized Trip Data to Support Freeway Corridor Management Planning in San Francisco, California. Transportation Research Record: Journal of the Transportation Research Board 2643, 65–73 (2017)
- [26] V. L. Knoop, P. B. C. van Erp, L. Leclercq, S. P. Hoogendoorn, 2018 21st International Conference on Intelligent Transportation Systems (ITSC) (2018), pp. 3832–3839
- [27] EDINA Digimap Ordnance Survey Service, OS MasterMap Topography Layer [Shape geospatial data], Scale 1, Tile: Oxfordshire, Ordnance Survey, Using: EDINA Digimap Ordnance Survey Service, https://digimap.edina.ac.uk/ (Downloaded in June 2018)
- [28] W. McKinney, et al., Proceedings of the 9th Python in Science Conference (Austin, TX, 2010), vol. 445, pp. 51–56
Acknowledgements: The authors thank Llewelyn Morgan for facilitating access to data and supporting the project. A previous version of this paper was presented at CARMA 2018: 2nd International Conference on Advanced Research Methods and Ana...