Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

Carlo G. Prato; Gustav Markkula; Xian Liu

arxiv: 2606.12500 · v2 · pith:TP5TJCXUnew · submitted 2026-06-10 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 10:20 UTC · model grok-4.3

keywords: machine learning · traffic microsimulation · crash frequency prediction · surrogate safety measures · extreme value theory · driver behavior model · signalized intersections · time-to-collision

The pith

Machine learning models in traffic microsimulation generate simulated conflicts that predict real-world crash frequencies at signalized intersections without site-specific calibration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that an ML-based driver behavior model, trained on trajectory data, produces vehicle conflicts in microsimulation whose severity distribution can be extrapolated via extreme value theory to crash frequencies in line with observed crash counts at five signalised intersections in Leeds, UK. A standard rule-based model fails to support such predictions. This matters because it points to a way of assessing safety for current or planned road designs using simulation rather than waiting for actual crashes. The work also notes that directly counting crashes simulated with the ML model does not yet yield useful predictions, highlighting a remaining gap in realism.

Core claim

Traffic microsimulation at five real-world signalised intersections using a state-of-the-art ML-based behaviour model yields simulated conflicts that, when analysed with a two-dimensional time-to-collision metric and extreme value theory, produce crash frequency predictions aligned with real-world data. The same process with a standard rule-based model does not permit meaningful predictions. Directly using ML-generated simulated crashes for prediction also performs poorly, indicating that the ML model reproduces conflicts realistically but not crashes.
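As a rough illustration of the conflict-detection step, the sketch below computes a two-dimensional time-to-collision for one pair of vehicles. It is not the paper's metric, whose exact definition is not given here. It assumes each vehicle is a disc and that both hold their current velocity; the function name `ttc_2d` and the `radius_sum` parameter are hypothetical.

```python
import math


def ttc_2d(p1, v1, p2, v2, radius_sum):
    """Time until two discs first touch under constant velocity.

    Assumption (not from the paper): vehicles are discs whose radii add up
    to radius_sum, and neither vehicle changes speed or heading.
    Returns math.inf when no contact is predicted.
    """
    # Position and velocity of vehicle 2 relative to vehicle 1.
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]

    # Contact happens when |r + v*t|^2 = radius_sum^2, i.e. a*t^2 + b*t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - radius_sum ** 2

    if c <= 0.0:
        return 0.0  # discs already overlap
    if a == 0.0:
        return math.inf  # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0 or b >= 0.0:
        return math.inf  # paths miss each other, or the gap is not closing
    return (-b - math.sqrt(disc)) / (2.0 * a)  # earlier of the two roots


# Head-on example with made-up numbers: 50 m apart, each driving 10 m/s.
head_on = ttc_2d((0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (-10.0, 0.0), 2.0)
```

An interaction would then be flagged as a conflict when the returned value falls below a chosen threshold; the threshold used in the paper is not stated in the material above.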

What carries the argument

The ML-based driver behaviour model that learns human driving directly from large-scale trajectory datasets and generates trajectories for conflict detection in microsimulation.

If this is right

ML-based models support crash frequency prediction from simulated conflicts at specific locations without requiring calibration to those locations.
Rule-based models appear to need location-specific calibration before their conflict dynamics are usable for safety prediction.
Current ML models can generate realistic conflicts but not yet realistic crashes in simulation.
The approach enables proactive safety evaluation of road infrastructure designs using microsimulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety checks for new road layouts could be run before construction by swapping in the ML model and checking predicted crash rates.
Extending the same ML model to networks with different speed limits or layouts would test whether the no-calibration result holds more broadly.
Combining the conflict-based predictions with other surrogate measures might tighten the match to real crash data.
Future refinements to the ML model that add explicit crash-generation rules could close the remaining gap between simulated and observed crashes.

Load-bearing premise

The ML model produces conflict dynamics at these five intersections that are representative enough of real human driving for extreme value theory to extrapolate accurately to crash frequencies without any location-specific adjustment.

What would settle it

A quantitative comparison of the crash frequencies extrapolated from ML-model conflicts against the crashes actually recorded at the five Leeds intersections, and at additional intersections without retraining the model.

Original abstract

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.
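The Extreme Value Theory step can be sketched with a peaks-over-threshold model: negate TTC so that more severe conflicts become larger values, fit a generalized Pareto distribution (GPD) to the excesses above a threshold, and read off the probability that negated TTC reaches zero. This is a minimal sketch, not the authors' procedure. The abstract does not say which EVT variant or estimator was used; the method-of-moments fit below is swapped in only because it needs no external library, and all function names, the threshold, and the sample values are assumptions.

```python
import math
from statistics import mean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit; returns (shape xi, scale sigma).

    Only meaningful when the true shape is below 0.5 (finite variance).
    """
    if len(excesses) < 2:
        raise ValueError("need at least two excesses")
    m = mean(excesses)
    v = variance(excesses)  # sample variance
    if v == 0:
        raise ValueError("excesses must not all be equal")
    ratio = m * m / v
    # From mean = sigma / (1 - xi) and var = mean^2 / (1 - 2 * xi).
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def gpd_survival(y, xi, sigma):
    """P(excess > y) under a GPD with shape xi and scale sigma."""
    if y <= 0.0:
        return 1.0
    if abs(xi) < 1e-9:
        return math.exp(-y / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        return 0.0  # y lies beyond the finite upper endpoint (xi < 0)
    return base ** (-1.0 / xi)


def crash_probability(ttc_values, threshold):
    """Estimated probability that one interaction reaches TTC <= 0."""
    negated = [-t for t in ttc_values]
    u = -threshold
    excesses = [x - u for x in negated if x > u]
    xi, sigma = fit_gpd_moments(excesses)
    exceed_rate = len(excesses) / len(negated)
    # A crash corresponds to negated TTC >= 0, i.e. an excess of -u.
    return exceed_rate * gpd_survival(0.0 - u, xi, sigma)


# Made-up minimum-TTC values in seconds, one per interaction (not paper data).
sample_ttc = [1.9, 1.9, 1.8, 1.7, 0.2, 3.0, 4.0, 5.0]
p_crash = crash_probability(sample_ttc, threshold=2.0)
```

Turning this per-interaction probability into a crash frequency additionally needs an exposure term (interactions per unit time), which is outside this sketch.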

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

An ML driver model in microsimulation produces conflicts whose EVT fits match real crash counts at five Leeds sites, whereas conflicts from the rule-based model do not; the abstract, however, supplies no numbers or training details.

The paper's main result is that conflicts generated by an ML-based driver model in microsimulation, when analyzed with extreme value theory, produce crash frequency estimates that match real-world data at five signalized intersections in Leeds. The rule-based model, by contrast, does not yield usable predictions, apparently because it was not calibrated to those sites.

What stands out is the direct use of observed crash records as the benchmark across multiple real locations. This moves beyond purely simulated validation and tests whether the ML approach can support proactive safety analysis without needing historical crash data or site-specific tuning. The authors also note that simply counting simulated crashes from the ML model does not work well, which is a candid acknowledgment of current limitations in generating full crash events.

The setup is straightforward and the choice of 2D time-to-collision as the surrogate measure is reasonable for the context. It builds on prior work in ML behavior modeling by showing a practical downstream application.

The abstract leaves out key details that would strengthen the claim. There are no reported metrics on how closely the predictions match the observed crashes, no information on the number of simulated conflicts or crashes, and no description of the ML model's training or any cross-validation. This makes it difficult to assess the reliability of the alignment. The assumption that the ML model, trained on large but unspecified trajectory data, produces conflict patterns representative of Leeds drivers without any adjustment is central but untested in the summary provided. If the full paper has more on this, it would help; otherwise it remains a potential weakness.

Overall, the work is aimed at researchers in traffic engineering and road safety who use microsimulation for design evaluation. Someone in that area would find the comparison useful even if the quantitative evidence needs bolstering.

I would recommend sending this to peer review. The empirical grounding in real crash data makes it worth a closer look by referees who can check the methods and stats.

Referee Report

3 major / 1 minor

Summary. The manuscript compares rule-based and ML-based vehicle behavior models in traffic microsimulation at five signalised intersections in Leeds, UK. Simulated trajectories are processed with a 2D Time-to-Collision surrogate to extract conflicts, which are then extrapolated to crash frequencies via Extreme Value Theory. The central claim is that ML-generated conflicts produce crash-frequency predictions aligned with observed real-world crash data, whereas rule-based conflicts do not, and that this holds without location-specific calibration of the ML model. Direct simulation of crashes with the ML model is reported to perform poorly.

Significance. If the quantitative match between ML-derived EVT predictions and observed crash counts is robust and the representativeness of the ML dynamics is independently verified, the work would demonstrate a practical route to improve surrogate-safety crash prediction without site-specific recalibration, which is a recurring bottleneck in microsimulation safety studies.

major comments (3)

[Abstract / Results] Abstract and results section: the claim that ML conflicts 'yielded crash predictions in line with the real-world crash data' is presented without any reported quantitative metrics (predicted vs. observed frequencies, RMSE, R², confidence intervals, or sample sizes for the five intersections). This absence prevents assessment of how close the match actually is and whether it supports the headline conclusion.
[Methods / Discussion] Methods and discussion: the central assumption that an ML model trained on large-scale but unspecified trajectory data produces conflict statistics at the Leeds sites that are sufficiently representative of real driving for EVT tail extrapolation is stated without supporting diagnostics (e.g., comparison of speed distributions, gap-acceptance rates, or conflict severity histograms between simulated and observed trajectories at the study intersections).
[Results] Results: the observation that directly simulated ML crashes perform poorly is noted but not quantified or reconciled with the claim that the same model produces realistic conflicts; this tension bears directly on whether the conflict dynamics are causally grounded or merely coincidentally compatible with the EVT fit.
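The agreement metrics named in the first major comment are simple to compute once per-site predictions exist. The sketch below is a plain-Python illustration with hypothetical function names; it contains no data from the paper, and with only five sites any such statistic would carry wide uncertainty.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Can be negative when predictions are worse than the observed mean.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - ss_res / ss_tot
```

Both functions take one value per intersection, for example observed and predicted crashes per year, in the same site order.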

minor comments (1)

[Abstract] The abstract states the rule-based model 'did not permit meaningful predictions, presumably due to a lack of model calibration'; this presumption should be supported by explicit evidence that the rule-based model was indeed uncalibrated for the Leeds sites.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the quantitative support for our claims. We respond to each major comment below and will incorporate revisions to improve clarity and rigor.

Referee: [Abstract / Results] Abstract and results section: the claim that ML conflicts 'yielded crash predictions in line with the real-world crash data' is presented without any reported quantitative metrics (predicted vs. observed frequencies, RMSE, R², confidence intervals, or sample sizes for the five intersections). This absence prevents assessment of how close the match actually is and whether it supports the headline conclusion.

Authors: We agree that the absence of explicit metrics limits evaluation of the match. In the revised manuscript, we will add a table in the results section reporting predicted crash frequencies from the ML-EVT analysis versus observed counts for each intersection, along with RMSE, R², 95% confidence intervals from the fits, and the number of simulated conflicts per site. This will allow direct assessment of alignment. revision: yes
Referee: [Methods / Discussion] Methods and discussion: the central assumption that an ML model trained on large-scale but unspecified trajectory data produces conflict statistics at the Leeds sites that are sufficiently representative of real driving for EVT tail extrapolation is stated without supporting diagnostics (e.g., comparison of speed distributions, gap-acceptance rates, or conflict severity histograms between simulated and observed trajectories at the study intersections).

Authors: The paper's core contribution is demonstrating generalization without site-specific calibration. To support the representativeness assumption, the revision will include comparisons of speed and headway distributions from simulated versus observed trajectories at the Leeds sites. We note that full observed conflict histograms are not available in our dataset, limiting some requested diagnostics, but the aggregate flow metrics and the resulting EVT-crash match provide supporting evidence. revision: partial
Referee: [Results] Results: the observation that directly simulated ML crashes perform poorly is noted but not quantified or reconciled with the claim that the same model produces realistic conflicts; this tension bears directly on whether the conflict dynamics are causally grounded or merely coincidentally compatible with the EVT fit.

Authors: We will quantify the direct-crash results by reporting the (near-zero or mismatched) predicted frequencies from counting simulated crashes against observed values. The expanded discussion will reconcile this by noting that the ML model captures the body of the conflict distribution sufficiently for EVT tail extrapolation to match real crashes, while the simulated crashes remain too infrequent to serve as direct predictors; this distinction is consistent with current limitations in ML crash generation rather than undermining the conflict-based approach. revision: yes
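One way to make the distribution comparisons discussed in the second response concrete is a two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical CDFs. Neither the referee nor the authors name a specific test, so this choice, and the function name `ks_statistic`, are assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic, a value in [0, 1].

    0 means the empirical distributions coincide; 1 means they do not overlap.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Applied to, say, simulated versus observed approach speeds at one site, a small value would indicate similar distributions. Judging significance would additionally require the sample sizes, which this sketch leaves out.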

Circularity Check

0 steps flagged

No significant circularity; results grounded in external real-world crash data benchmark

The paper compares EVT-derived crash frequency predictions from ML-simulated conflicts against independent real-world crash records at the five Leeds sites. No equations, fitted parameters, or derivations are shown that reduce the reported match to quantities defined or calibrated by the authors themselves. The ML behavior model is trained on external large-scale trajectory data, and the rule-based model is a standard reference; the central claim therefore retains independent content from the external benchmark. Any self-citations are not load-bearing for the core comparison.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because only the abstract is available, the ledger is necessarily incomplete. The central claim rests on the unstated assumption that the chosen 2D TTC threshold and EVT fitting procedure are valid for converting simulated conflicts to crash frequencies, and that the ML model generalizes without site-specific training. No free parameters, axioms, or invented entities are explicitly listed in the abstract.
