Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation
Pith reviewed 2026-06-27 10:20 UTC · model grok-4.3
The pith
Machine learning models in traffic microsimulation generate simulated conflicts that predict real-world crash frequencies at signalised intersections without site-specific calibration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Traffic microsimulation at five real-world signalised intersections using a state-of-the-art ML-based behaviour model yields simulated conflicts that, when analysed with a two-dimensional time-to-collision metric and extreme value theory, produce crash frequency predictions aligned with real-world data. The same process with a standard rule-based model does not permit meaningful predictions. Directly using ML-generated simulated crashes for prediction also performs poorly, indicating that the ML model reproduces conflicts realistically but not crashes.
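The sketch below gives a rough illustration of the conflict-detection step by computing a two-dimensional time-to-collision (TTC) for one pair of vehicles. It treats each vehicle as a disc of fixed radius that moves at constant velocity. This is a simplifying assumption: the paper's actual 2D TTC definition is not reproduced here and may, for example, use rectangular vehicle footprints. `ttc_2d_discs` is a hypothetical helper name.

```python
import math

def ttc_2d_discs(p1, v1, p2, v2, r1, r2):
    """Time until two constant-velocity discs first touch, or math.inf if never.

    p*, v*: (x, y) position [m] and velocity [m/s]; r*: disc radius [m].
    Returns 0.0 if the discs already overlap.
    """
    # Position and velocity of vehicle 2 relative to vehicle 1.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    r = r1 + r2
    # Contact when |d + dv * t| = r, i.e. a*t^2 + b*t + c = 0.
    c = dx * dx + dy * dy - r * r
    if c <= 0.0:
        return 0.0  # already in contact
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dx * dvx + dy * dvy)
    if a == 0.0 or b >= 0.0:
        return math.inf  # no relative motion, or the gap is not closing
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf  # paths pass without touching
    return (-b - math.sqrt(disc)) / (2.0 * a)  # earliest future contact
```

Take two vehicles 50 m apart that approach head-on at 10 m/s each, with 1 m radii. The 48 m gap closes at 20 m/s, so `ttc_2d_discs((0, 0), (10, 0), (50, 0), (-10, 0), 1.0, 1.0)` returns 2.4 s. A conflict would then be flagged whenever this value falls below a TTC threshold chosen by the analyst.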
What carries the argument
The ML-based driver behaviour model that learns human driving directly from large-scale trajectory datasets and generates trajectories for conflict detection in microsimulation.
If this is right
- ML-based models support crash frequency prediction from simulated conflicts at specific locations without requiring calibration to those locations.
- Rule-based models appear to require location-specific calibration to generate usable conflict dynamics for safety prediction, although the paper presumes rather than demonstrates this.
- Current ML models can generate realistic conflicts but not yet realistic crashes in simulation.
- The approach enables proactive safety evaluation of road infrastructure designs using microsimulation.
Where Pith is reading between the lines
- Safety checks for new road layouts could be run before construction by swapping in the ML model and checking predicted crash rates.
- Extending the same ML model to networks with different speed limits or layouts would test whether the no-calibration result holds more broadly.
- Combining the conflict-based predictions with other surrogate measures might tighten the match to real crash data.
- Future refinements to the ML model that add explicit crash-generation rules could close the remaining gap between simulated and observed crashes.
Load-bearing premise
The ML model produces conflict dynamics at these five intersections that are representative enough of real human driving for extreme value theory to extrapolate accurately to crash frequencies without any location-specific adjustment.
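The extrapolation this premise relies on can be sketched as a peak-over-threshold model. The sketch makes four assumptions:

- each conflict contributes one minimum TTC;
- the TTC is negated, so crashes (TTC at or below 0) sit in the upper tail;
- a generalised Pareto distribution (GPD) is fitted to the threshold exceedances;
- the fit uses the closed-form method-of-moments estimator, for brevity.

That estimator is a swapped-in simplification. EVT crash studies more commonly use maximum likelihood, and the paper's threshold selection and model structure may differ. All names are hypothetical.

```python
import math
import statistics

def gpd_fit_moments(excesses):
    """Method-of-moments GPD fit to threshold excesses; returns (shape, scale).

    Inverts mean = scale / (1 - shape) and
    variance = scale**2 / ((1 - shape)**2 * (1 - 2 * shape)), valid for shape < 1/2.
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    if v == 0:
        raise ValueError("excesses must not all be equal")
    ratio = m * m / v
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (1.0 + ratio)
    return shape, scale

def expected_crashes_pot(min_ttcs, ttc_threshold):
    """Expected crashes (TTC <= 0) implied by a GPD fit to conflicts with TTC < threshold.

    Works on X = -TTC so that crashes sit in the upper tail at X >= 0.
    """
    u = -ttc_threshold  # threshold on the negated-TTC scale
    excesses = [-t - u for t in min_ttcs if -t > u]
    if len(excesses) < 2:
        raise ValueError("need at least two exceedances to fit the GPD")
    shape, scale = gpd_fit_moments(excesses)
    z = -u  # excess needed to reach X = 0, i.e. TTC = 0
    if abs(shape) < 1e-9:
        p_crash = math.exp(-z / scale)  # exponential limit as shape -> 0
    else:
        base = 1.0 + shape * z / scale
        # With shape < 0 the GPD has a finite upper end; beyond it the probability is 0.
        p_crash = base ** (-1.0 / shape) if base > 0.0 else 0.0
    return len(excesses) * p_crash
```

The result is the expected crash count over the same period as the input conflicts. Converting it to crashes per year means scaling by the ratio of relevant annual hours to simulated hours. That scaling is itself an assumption about how representative the simulated period is.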
What would settle it
A direct quantitative comparison of the crash frequencies extrapolated from ML-model conflicts with the recorded crashes at the five Leeds intersections, or at additional intersections, without retraining the model.
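One way to make such a site-by-site comparison concrete is to ask whether each observed crash count is plausible under the predicted mean. The sketch below computes an exact two-sided Poisson p-value by summing the probabilities of all counts no more likely than the observed one. It assumes crash counts follow a Poisson distribution with the EVT-predicted mean. Real crash counts are often overdispersed, in which case a negative binomial model would be more appropriate. `poisson_two_sided_p` is a hypothetical name.

```python
import math

def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu), computed in log space; requires mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_two_sided_p(observed, mu, tail=1e-12):
    """Exact two-sided p-value: total probability of counts no likelier than `observed`."""
    if mu <= 0:
        raise ValueError("predicted mean must be positive")
    p_obs = poisson_pmf(observed, mu)
    total, k = 0.0, 0
    upper = max(observed, int(mu)) + 1  # past this point the pmf only decreases
    while True:
        p = poisson_pmf(k, mu)
        if p <= p_obs * (1.0 + 1e-7):  # small tolerance so exact ties are counted
            total += p
        if k > upper and p < tail:
            break
        k += 1
    return min(1.0, total)
```

A small p-value at a site suggests that the prediction and the record disagree by more than Poisson noise would explain. With only a few years of crash records per intersection, most sites will be hard to reject either way.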
Original abstract
Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compares rule-based and ML-based vehicle behavior models in traffic microsimulation at five signalised intersections in Leeds, UK. Simulated trajectories are processed with a 2D Time-to-Collision surrogate to extract conflicts, which are then extrapolated to crash frequencies via Extreme Value Theory. The central claim is that ML-generated conflicts produce crash-frequency predictions aligned with observed real-world crash data, whereas rule-based conflicts do not, and that this holds without location-specific calibration of the ML model. Direct simulation of crashes with the ML model is reported to perform poorly.
Significance. If the quantitative match between ML-derived EVT predictions and observed crash counts is robust and the representativeness of the ML dynamics is independently verified, the work would demonstrate a practical route to improve surrogate-safety crash prediction without site-specific recalibration, which is a recurring bottleneck in microsimulation safety studies.
major comments (3)
- [Abstract / Results] Abstract and results section: the claim that ML conflicts 'yielded crash predictions in line with the real-world crash data' is presented without any reported quantitative metrics (predicted vs. observed frequencies, RMSE, R², confidence intervals, or sample sizes for the five intersections). This absence prevents assessment of how close the match actually is and whether it supports the headline conclusion.
- [Methods / Discussion] Methods and discussion: the central assumption that an ML model trained on large-scale but unspecified trajectory data produces conflict statistics at the Leeds sites that are sufficiently representative of real driving for EVT tail extrapolation is stated without supporting diagnostics (e.g., comparison of speed distributions, gap-acceptance rates, or conflict severity histograms between simulated and observed trajectories at the study intersections).
- [Results] Results: the observation that directly simulated ML crashes perform poorly is noted but not quantified or reconciled with the claim that the same model produces realistic conflicts; this tension bears directly on whether the conflict dynamics are causally grounded or merely coincidentally compatible with the EVT fit.
minor comments (1)
- [Abstract] The abstract states the rule-based model 'did not permit meaningful predictions, presumably due to a lack of model calibration'; this presumption should be supported by explicit evidence that the rule-based model was indeed uncalibrated for the Leeds sites.
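Calibration status is one form such "explicit evidence" could take. Microsimulation models are usually calibrated against observed flows, and UK practice commonly reports the GEH statistic for each count:

GEH = sqrt(2(M - C)^2 / (M + C))

Here M and C are the modelled and observed hourly flows. A commonly cited acceptance target is GEH < 5 on at least 85% of counts. The sketch below computes that share. It illustrates what such evidence might include; it is not the paper's procedure, and the helper names are hypothetical. Flow-level calibration of this kind also does not by itself guarantee realistic conflict dynamics.

```python
import math

def geh(modelled, observed):
    """GEH statistic for a pair of hourly flows (veh/h)."""
    total = modelled + observed
    if total == 0:
        return 0.0
    return math.sqrt(2.0 * (modelled - observed) ** 2 / total)

def share_within_geh(modelled_flows, observed_flows, limit=5.0):
    """Fraction of count locations whose GEH is below `limit`."""
    if len(modelled_flows) != len(observed_flows) or not observed_flows:
        raise ValueError("need equally long, non-empty flow lists")
    passed = sum(1 for m, c in zip(modelled_flows, observed_flows) if geh(m, c) < limit)
    return passed / len(observed_flows)
```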
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights opportunities to strengthen the quantitative support for our claims. We respond to each major comment below and will incorporate revisions to improve clarity and rigor.
Point-by-point responses
Referee: [Abstract / Results] Abstract and results section: the claim that ML conflicts 'yielded crash predictions in line with the real-world crash data' is presented without any reported quantitative metrics (predicted vs. observed frequencies, RMSE, R², confidence intervals, or sample sizes for the five intersections). This absence prevents assessment of how close the match actually is and whether it supports the headline conclusion.
Authors: We agree that the absence of explicit metrics limits evaluation of the match. In the revised manuscript, we will add a table in the results section reporting predicted crash frequencies from the ML-EVT analysis versus observed counts for each intersection, along with RMSE, R², 95% confidence intervals from the fits, and the number of simulated conflicts per site. This will allow direct assessment of alignment.
Revision: yes
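The metrics promised in this response are straightforward to compute once predicted and observed crash frequencies are tabulated per intersection. The sketch below assumes R² means the coefficient of determination about the 1:1 line (1 - SS_res/SS_tot) rather than the squared correlation. With only five sites either statistic will be unstable, so reporting them alongside the per-site confidence intervals matters. Function names are hypothetical.

```python
import math

def _check(pred, obs):
    if len(pred) != len(obs) or not obs:
        raise ValueError("need equally long, non-empty sequences")

def rmse(pred, obs):
    """Root-mean-square error between predicted and observed crash frequencies."""
    _check(pred, obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def r_squared(pred, obs):
    """Coefficient of determination 1 - SS_res / SS_tot, measured about the 1:1 line."""
    _check(pred, obs)
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Unlike a squared correlation, this R² penalises systematic over- or under-prediction, and it becomes negative when the predictions fit worse than the observed mean.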
Referee: [Methods / Discussion] Methods and discussion: the central assumption that an ML model trained on large-scale but unspecified trajectory data produces conflict statistics at the Leeds sites that are sufficiently representative of real driving for EVT tail extrapolation is stated without supporting diagnostics (e.g., comparison of speed distributions, gap-acceptance rates, or conflict severity histograms between simulated and observed trajectories at the study intersections).
Authors: The paper's core contribution is demonstrating generalization without site-specific calibration. To support the representativeness assumption, the revision will include comparisons of speed and headway distributions from simulated versus observed trajectories at the Leeds sites. We note that full observed conflict histograms are not available in our dataset, limiting some requested diagnostics, but the aggregate flow metrics and the resulting EVT-crash match provide supporting evidence.
Revision: partial
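A two-sample Kolmogorov-Smirnov statistic could summarise the speed and headway comparison proposed here. The statistic is the largest vertical gap between the two empirical distribution functions. The sketch takes plain lists of per-vehicle speeds (or headways), one from simulation and one from observation. Trajectory samples are autocorrelated, so treat the textbook critical value below as indicative rather than as a formal test. Function names are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over the pooled sample."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between the two step-function ECDFs only changes at observed values.
    for x in a + b:
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))
```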
Referee: [Results] Results: the observation that directly simulated ML crashes perform poorly is noted but not quantified or reconciled with the claim that the same model produces realistic conflicts; this tension bears directly on whether the conflict dynamics are causally grounded or merely coincidentally compatible with the EVT fit.
Authors: We will quantify the direct-crash results by reporting the (near-zero or mismatched) predicted frequencies from counting simulated crashes against observed values. The expanded discussion will reconcile this by noting that the ML model captures the body of the conflict distribution sufficiently for EVT tail extrapolation to match real crashes, while the simulated crashes remain too infrequent to serve as direct predictors; this distinction reflects current limitations in ML crash generation rather than undermining the conflict-based approach.
Revision: yes
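Simple Poisson arithmetic can check the infrequency argument in this response. Suppose a simulation reproduced a real crash rate exactly and crashes were spread evenly over the year. The expected number of crashes in the simulated period would then be the annual rate times the simulated fraction of a year. The chance of seeing even one crash follows from the Poisson zero-count probability. The inputs below are hypothetical. The sketch shows why direct counts from short simulations carry little information. It does not show that infrequency, rather than unrealistic crash mechanics, is the cause in this paper.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # average calendar year, in hours

def simulated_crash_outlook(annual_crash_rate, simulated_hours):
    """Expected crashes in the simulated period and P(at least one), assuming Poisson counts."""
    expected = annual_crash_rate * simulated_hours / HOURS_PER_YEAR
    p_at_least_one = -math.expm1(-expected)  # 1 - exp(-expected), numerically stable
    return expected, p_at_least_one
```

For example, with a hypothetical 2 crashes per year and 50 simulated hours, the expected count is about 0.011. The probability of observing any crash at all is then about 1.1%.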
Circularity Check
No significant circularity; results grounded in external real-world crash data benchmark
Full rationale
The paper compares EVT-derived crash frequency predictions from ML-simulated conflicts against independent real-world crash records at the five Leeds sites. No equations, fitted parameters, or derivations are shown that reduce the reported match to quantities defined or calibrated by the authors themselves. The ML behavior model is trained on external large-scale trajectory data, and the rule-based model is a standard reference; the central claim therefore retains independent content from the external benchmark. Any self-citations are not load-bearing for the core comparison.