arxiv: 1907.05162 · v1 · pith:5SWMJMXOnew · submitted 2019-07-11 · 💻 cs.CY · physics.soc-ph

Estimating Traffic Disruption Patterns with Volunteered Geographic Information

Pith reviewed 2026-05-24 23:08 UTC · model grok-4.3

classification 💻 cs.CY physics.soc-ph
keywords traffic disruption · OpenStreetMap · volunteered geographic information · linear regression · land use · traffic volume · predictive modeling · Oxfordshire

The pith

Static OpenStreetMap features explain more than half the variation in traffic volume and disruptions across sampled road points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether traffic disruptions and volumes can be estimated from static geographic features drawn from OpenStreetMap alone. Linear regression models are fitted to counts of disruptions and traffic volume at 6,500 points spread across 112 regions in Oxfordshire. The models show that these fixed map attributes account for over half the observed variation without any dynamic inputs. Cross-validation and feature selection confirm that detailed point-of-interest categories outperform the broad land-use groupings commonly used in transport studies. The work therefore asks how much traffic behavior is already encoded in the static layout of streets and nearby destinations.

Core claim

Linear regressions that treat OpenStreetMap features as predictors can explain more than half the variation in traffic disruption counts and traffic volume at 6,500 sampled points within 112 Oxfordshire regions; models built on granular point-of-interest data outperform those built on aggregate land-use categories.

What carries the argument

Linear regression with recursive feature elimination, using static OpenStreetMap attributes as predictors for traffic disruption and volume counts.
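
The mechanism named above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the feature matrix, the coefficient values, and the choice to keep three features are all assumptions made for the example, and scikit-learn is used only because the paper's reference list includes it.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for OSM-derived predictors: 500 points x 8 features.
n_points, n_features = 500, 8
X = rng.normal(size=(n_points, n_features))

# Hypothetical ground truth: only features 0, 1 and 2 drive the target.
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n_points)

# RFE refits the linear model repeatedly and drops the feature with the
# weakest coefficient at each step until three features remain.
selector = RFE(LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
print("elimination ranking (1 = kept):", selector.ranking_)
```

Because the ranking is driven by coefficient size, it is only meaningful when the predictors are on comparable scales; with real OSM densities a standardisation step would normally come first.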

If this is right

  • Traffic volume and disruption counts can be estimated at many locations without installing sensors or purchasing proprietary data.
  • Granular point-of-interest records improve prediction accuracy over the aggregate land-use categories standard in transport planning.
  • Cross-validated models demonstrate that static features alone carry substantial predictive power for network-level traffic patterns.
  • Recursive feature elimination ranks the relative contribution of different land-use and road attributes to observed traffic outcomes.
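
The granular-versus-aggregate comparison in the second bullet can be illustrated with a toy experiment. Everything here is synthetic and hypothetical: six "granular" categories are summed into two "meta-categories", and the target is built so that categories inside a group have different effects. The sketch shows why aggregation can lose predictive signal; it does not reproduce the paper's numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
n_points = 600

# Hypothetical granular features: densities of six point-of-interest categories.
granular = rng.gamma(shape=2.0, scale=1.0, size=(n_points, 6))

# Hypothetical aggregate features: columns 0-2 and 3-5 summed into two groups.
aggregate = np.column_stack(
    [granular[:, :3].sum(axis=1), granular[:, 3:].sum(axis=1)]
)

# Categories within a group are given different (assumed) effects on the target.
coef = np.array([4.0, 0.5, -1.0, 2.0, -3.0, 0.0])
y = granular @ coef + rng.normal(scale=1.0, size=n_points)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2_granular = cross_val_score(
    LinearRegression(), granular, y, cv=cv, scoring="r2"
).mean()
r2_aggregate = cross_val_score(
    LinearRegression(), aggregate, y, cv=cv, scoring="r2"
).mean()

print(f"cross-validated R2, granular features:  {r2_granular:.3f}")
print(f"cross-validated R2, aggregate features: {r2_aggregate:.3f}")
```

The gap appears because summing forces every category in a group to share one coefficient, which is the kind of information loss the bullet describes.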

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same static-feature approach could be tested in cities outside the UK to check how far the relationships travel.
  • Adding time-of-day or weather variables on top of the OSM baseline would show how much extra variation remains to be explained.
  • If the relationships generalize, planners could generate rough traffic estimates for entire networks from openly available map data.
  • The method might extend to other volunteered geographic datasets beyond OpenStreetMap if similar point and line attributes are available.

Load-bearing premise

The linear relationships observed between static map features and traffic counts hold without large omitted effects from time of day, weather, or unmeasured road capacity differences.

What would settle it

Collecting traffic counts and OSM features in a fresh set of regions and finding that the R-squared falls below 0.5 after the same regression procedure would falsify the central claim.
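
For reference, the threshold in this test can be read against the standard coefficient of determination. The notation is an assumption introduced here for illustration: $y_i$ is the observed count at held-out point $i$, $\hat{y}_i$ is the prediction of a model fitted on the original regions, and $\bar{y}$ is the mean of the held-out observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\text{central claim contradicted if } R^2 < 0.5 \text{ on the fresh regions.}
```

Computed out of sample in this way, $R^2$ can be negative, so a poor transfer to new regions would show up clearly rather than being bounded at zero.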

Figures

Figures reproduced from arXiv: 1907.05162 by Chico Q. Camargo, Graham McNeill, Jonathan Bright, Scott A. Hale, Sridhar Raman.

Figure 1. As shown in the top panels (a), we first produce kernel density estimates (KDE) of every OSM category and meta-category. We then estimate the number of traffic disruptions at a given latitude and longitude using the KDEs of either the OSM meta-categories or of the OSM categories at each point. To produce the KDEs, we made use of a Gaussian kernel searched over a range of bandwidth parameters before adoptin…
Figure 1. Schematic pipeline of the linear model for the two sets of linear models in this study.
Figure 2. Clustermap showing the Pearson correlation of the distribution of different OSM
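
The kernel density step mentioned in the Figure 1 caption can be sketched as below. The caption is truncated before it states how the bandwidth was chosen, so the selection rule used here (cross-validated log-likelihood over a grid) is an assumption, as are the synthetic planar coordinates and the bandwidth range. Real latitude/longitude data would need a projection or a great-circle metric first.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)

# Synthetic planar coordinates for one hypothetical OSM category (two clusters).
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(150, 2)),
    rng.normal(loc=[2.0, 1.0], scale=0.5, size=(150, 2)),
])

# Candidate bandwidths; the range is illustrative, not taken from the paper.
bandwidths = np.logspace(-1.5, 0.5, 15)

# GridSearchCV scores each bandwidth by held-out log-likelihood, which is
# what KernelDensity.score returns.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": bandwidths},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(points)
kde = search.best_estimator_

# Evaluate the fitted density at a cluster centre and at a far-away location.
query = np.array([[0.0, 0.0], [5.0, 5.0]])
density = np.exp(kde.score_samples(query))
print("chosen bandwidth:", kde.bandwidth)
print("density at query points:", density)
```

Density values evaluated at each road point, one per category, would then form the columns of the regression's feature matrix.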

Original abstract

Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with static features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSM's granular point of interest data allows for better predictions than the aggregate categories typically used in studies of transportation and land use.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that static features extracted from OpenStreetMap can explain more than half the observed spatial variation in traffic volume and disruption counts across 6,500 sampled points in 112 Oxfordshire regions. Linear regressions are fitted using these features as predictors, with cross-validation and recursive feature elimination employed to evaluate predictive performance and the relative importance of land-use and POI categories; granular POI data is shown to outperform aggregate categories.

Significance. If the reported explanatory power holds under fuller validation, the work demonstrates a practical, low-cost approach to traffic estimation using volunteered geographic information where direct counts are unavailable. The explicit use of cross-validation and recursive feature elimination to quantify predictive utility (rather than in-sample fit alone) is a methodological strength that supports the central claim within the studied dataset.

major comments (3)
  1. [Abstract and Results] The claim that static features explain 'more than half' of the variation is not accompanied by the actual OLS or cross-validated R² values, regression coefficients, standard errors, or RMSE; without these quantities it is impossible to judge the magnitude or stability of the reported relationships.
  2. [Methods and Results] No diagnostics are reported for multicollinearity among the OSM predictors, spatial autocorrelation in the residuals, or omitted-variable bias from time-of-day or weather effects; these checks are load-bearing for interpreting the linear-regression R² in a spatial setting.
  3. [Validation procedure] The cross-validation procedure is described only at a high level: it does not specify the fold structure (e.g., whether whole regions are held out), the precise performance metric used for recursive feature elimination, or any spatial cross-validation safeguards against leakage.
minor comments (1)
  1. [Data and Methods] Notation for the 112 regions and 6,500 points should be defined consistently when first introduced; a small table summarizing the feature set and its aggregation level would improve readability.
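
The fold-structure concern in major comment 3 can be made concrete with a region-held-out split. This is a sketch under stated assumptions: the data are synthetic, the region labels and the shared per-region effect are invented for illustration, and nothing here is known to match the procedure the paper actually used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(3)

# Hypothetical layout: 20 regions with 30 sampled points each.
n_regions, points_per_region, n_features = 20, 30, 5
region_id = np.repeat(np.arange(n_regions), points_per_region)
X = rng.normal(size=(n_regions * points_per_region, n_features))

# Points in the same region share an unobserved effect, which is the kind
# of dependence that leaks across folds when points are split at random.
region_effect = rng.normal(scale=1.0, size=n_regions)[region_id]
coef = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ coef + region_effect + rng.normal(scale=0.5, size=region_id.size)

# GroupKFold keeps every region entirely in train or entirely in test.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(
    LinearRegression(), X, y, groups=region_id, cv=cv, scoring="r2"
)

# Count folds in which a region appears on both sides (should be zero).
leaky_folds = sum(
    not set(region_id[train]).isdisjoint(region_id[test])
    for train, test in cv.split(X, y, groups=region_id)
)
print("per-fold R2:", np.round(scores, 3))
print("folds with region overlap:", leaky_folds)
```

Reporting scores from a split like this alongside a plain random split would show how much of the apparent predictive power depends on neighbouring points from the same region.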

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important areas for improving the transparency and robustness of our analysis. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and additional reporting.

Point-by-point responses
  1. Referee: [Abstract and Results] The claim that static features explain 'more than half' of the variation is not accompanied by the actual OLS or cross-validated R² values, regression coefficients, standard errors, or RMSE; without these quantities it is impossible to judge the magnitude or stability of the reported relationships.

    Authors: We agree that the specific quantitative results are necessary to allow readers to evaluate the strength and stability of the reported relationships. In the revised manuscript we will add tables in the Results section that report the OLS R², cross-validated R², selected regression coefficients with standard errors, and RMSE for the primary models. These additions will directly support the claim that more than half the variation is explained. revision: yes

  2. Referee: [Methods and Results] No diagnostics are reported for multicollinearity among the OSM predictors, spatial autocorrelation in the residuals, or omitted-variable bias from time-of-day or weather effects; these checks are load-bearing for interpreting the linear-regression R² in a spatial setting.

    Authors: We acknowledge that these diagnostics are important for interpreting R² in a spatial context. We will add variance inflation factor (VIF) calculations to assess multicollinearity among the OSM predictors and report Moran's I statistics on the model residuals to evaluate spatial autocorrelation. For omitted-variable bias, we will add an explicit discussion noting that the models are intentionally limited to static OSM land-use and POI features; time-of-day and weather effects lie outside the volunteered geographic information scope of the study and would require external datasets. The cross-validation results still demonstrate predictive utility within the available static features. revision: yes

  3. Referee: [Validation procedure] The cross-validation procedure is described only at a high level: it does not specify the fold structure (e.g., whether whole regions are held out), the precise performance metric used for recursive feature elimination, or any spatial cross-validation safeguards against leakage.

    Authors: We will revise the Methods section to provide complete specification of the cross-validation procedure. This will include the number of folds, whether entire regions are held out as a spatial safeguard against leakage, the exact performance metric (R² or mean squared error) used for recursive feature elimination, and any additional steps taken to mitigate spatial dependence during validation. revision: yes
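
The two diagnostics promised in response 2 can be sketched in plain NumPy. This is an illustrative implementation under assumptions, not the authors' code: the helper names, the binary distance-band weights for Moran's I, the radius, and all data are hypothetical. The VIF uses the standard identity that the variance inflation factors are the diagonal of the inverse correlation matrix.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def morans_i(values, coords, radius):
    """Moran's I with binary weights: w_ij = 1 if 0 < distance <= radius."""
    z = values - values.mean()
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = ((dist > 0) & (dist <= radius)).astype(float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


rng = np.random.default_rng(4)
n = 300

# Two nearly collinear predictors plus one independent predictor.
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.1, size=n)
x2 = rng.normal(size=n)
vif = variance_inflation_factors(np.column_stack([x0, x1, x2]))

# Spatially smooth "residuals" versus pure noise at the same locations.
coords = rng.uniform(0, 10, size=(n, 2))
smooth_residuals = np.sin(coords[:, 0]) + rng.normal(scale=0.1, size=n)
noise_residuals = rng.normal(size=n)
i_smooth = morans_i(smooth_residuals, coords, radius=1.0)
i_noise = morans_i(noise_residuals, coords, radius=1.0)

print("VIF per predictor:", np.round(vif, 2))
print("Moran's I, smooth residuals:", round(i_smooth, 3))
print("Moran's I, noise residuals: ", round(i_noise, 3))
```

Large VIF values flag predictors whose coefficients are unstable, and a Moran's I well above zero on the residuals suggests spatial structure the regression has not captured. Both readings depend on the weighting scheme and thresholds chosen, which the revised manuscript would need to state.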

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper reports an empirical OLS regression (with cross-validation and recursive feature elimination) of observed traffic volume and disruption counts on static OSM land-use and point-of-interest features across 6,500 sampled points. The central R² claim is the fraction of variance captured by these external predictors within the dataset; it is not obtained by fitting a parameter to the target variable itself or by any self-citation chain that reduces the result to the inputs by construction. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the validity of linear regression assumptions applied to count data and the representativeness of the sampled Oxfordshire points; coefficients are fitted parameters.

free parameters (1)
  • regression coefficients for each OSM feature
    Fitted via ordinary least squares to the observed traffic counts and disruptions.
axioms (1)
  • domain assumption: Linear relationship between static OSM features and traffic metrics holds sufficiently for explanatory purposes
    Invoked by the choice of linear regression as the modeling technique.
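
As a reminder of what the single free-parameter entry covers, the ordinary least squares fit can be written as follows. The symbols are assumptions introduced here: $X$ is the matrix of OSM features at the sampled points (with an intercept column, and assumed to have full column rank), $y$ is the vector of observed counts, and $\hat{\beta}$ is the fitted coefficient vector.

```latex
\hat{\beta}
  = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
  = (X^\top X)^{-1} X^\top y
```

The closed form exists only when $X^\top X$ is invertible, which is one reason strongly collinear OSM predictors would matter for the stability of the fitted coefficients.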



Acknowledgements: The authors thank Llewelyn Morgan for facilitating access to data and supporting the project. A previous version of this paper was presented at CARMA 2018: 2nd International Conference on Advanced Research Methods and Ana...