TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
Pith reviewed 2026-05-09 20:50 UTC · model grok-4.3
The pith
TRACE turns NHTSA crash reports into CARLA simulations that keep the real road layouts from maps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TRACE automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by retrieving site-specific OpenStreetMap data to preserve exact road topology, leveraging Large Language Models to infer vehicles' initial state from road geometry and pre-crash maneuvers, and generating simulation trajectories from semi-structured report data, yielding a benchmark of 52 diverse accident scenarios.
What carries the argument
The TRACE pipeline combines OpenStreetMap data retrieval for exact road topology, LLM inference of initial vehicle states, and trajectory generation from report data.
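As an illustration of the first stage only, site-specific map retrieval could begin by turning a crash location into a bounding box that a map query then uses. The sketch below is an assumption, not the paper's method: the helper `crash_site_bbox`, the 200 m half-side, and the example coordinates are all hypothetical, and the summary does not say how TRACE locates or crops the map.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float


def crash_site_bbox(lat_deg: float, lon_deg: float,
                    half_side_m: float = 200.0) -> BoundingBox:
    """Square box centred on the crash coordinates (hypothetical helper).

    Uses a local spherical approximation, so it is not valid near the poles.
    """
    # Degrees of latitude spanned by half_side_m metres.
    dlat = math.degrees(half_side_m / EARTH_RADIUS_M)
    # Longitude degrees shrink with the cosine of the latitude.
    dlon = math.degrees(
        half_side_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return BoundingBox(lat_deg - dlat, lon_deg - dlon,
                       lat_deg + dlat, lon_deg + dlon)


# Hypothetical crash location, used only to exercise the helper.
bbox = crash_site_bbox(37.7749, -122.4194)
print(bbox)
```

A fixed box size is the simplest choice; a real pipeline would probably need a larger box for highway interchanges than for a residential intersection.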
If this is right
- AV developers can test against simulations drawn from actual crash sites rather than generic or invented road layouts.
- The 52 scenarios cover multiple collision types and road topologies, allowing systematic exposure of weaknesses that appear in real incidents.
- An open-source benchmark reduces the need to wait for rare real-world AV failures by recreating known ones in a repeatable simulator.
- The method supports scaling to additional reports to grow the set of testable cases over time.
Where Pith is reading between the lines
- The same reconstruction steps could be applied to crash data from other countries or databases beyond NHTSA.
- If the simulations prove reliable, patterns across the 52 cases might reveal which road features most often contribute to AV errors.
- Testing could extend to edge cases like different weather or lighting by layering those on top of the base reconstructions.
- The approach might help regulators define minimum simulation coverage requirements based on real crash distributions.
Load-bearing premise
The vehicle starting positions and paths inferred by the language models, together with the roads taken from maps, are close enough to the original crashes that AV systems will fail in the simulations for the same reasons they would fail in reality.
What would settle it
Compare the behavior and failure points of an AV system run in the TRACE simulations against detailed records or video of the original real-world crashes; large mismatches in collision timing, location, or sequence would show the reconstructions are not faithful.
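A minimal sketch of such a comparison, assuming a reference collision time and location are available for a case (for example from video). The `CollisionEvent` type, the shared local coordinate frame, and the tolerance values are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CollisionEvent:
    time_s: float  # seconds since scenario start
    x_m: float     # position in a shared local frame, metres
    y_m: float


def collision_mismatch(reference: CollisionEvent,
                       simulated: CollisionEvent) -> dict:
    """Absolute timing error and Euclidean location error."""
    return {
        "time_error_s": abs(simulated.time_s - reference.time_s),
        "position_error_m": math.hypot(simulated.x_m - reference.x_m,
                                       simulated.y_m - reference.y_m),
    }


def is_faithful(mismatch: dict, max_time_s: float = 1.0,
                max_position_m: float = 3.0) -> bool:
    """Tolerances are placeholders; suitable values are an open question."""
    return (mismatch["time_error_s"] <= max_time_s
            and mismatch["position_error_m"] <= max_position_m)


# Made-up values, used only to exercise the functions.
reference = CollisionEvent(time_s=6.0, x_m=10.0, y_m=20.0)
simulated = CollisionEvent(time_s=6.5, x_m=13.0, y_m=24.0)
mismatch = collision_mismatch(reference, simulated)
print(mismatch, is_faithful(mismatch))
```

Checking the sequence of events, which the paragraph above also mentions, would need an ordered event log rather than a single collision point.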
Original abstract
Validating Autonomous Vehicles (AVs) requires exposure to rare, safety-critical scenarios, infrequent in routine driving data. Existing benchmarks address this by generating synthetic conflicts or mapping accident descriptions to abstract road geometries, failing to capture the topological complexity of real-world crashes. We introduce TRACE, a pipeline that automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by (1) retrieving site-specific OpenStreetMap data to preserve exact road topology, (2) leveraging Large Language Models to infer vehicles' initial state from road geometry and pre-crash maneuvers, and (3) generating simulation trajectories from semi-structured report data. Using this pipeline, we curated a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers, providing a challenging open source resource for testing AV systems against real-world failures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TRACE, an automated pipeline that reconstructs NHTSA crash reports into CARLA simulations. It retrieves site-specific OpenStreetMap data to preserve exact road topologies, uses Large Language Models to infer vehicles' initial states from road geometry and pre-crash maneuvers, and generates simulation trajectories from semi-structured report data. The authors produce and release a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers as a resource for testing AV systems against real-world failures.
Significance. If the reconstructions are shown to be faithful, this would be a useful contribution to AV evaluation by supplying topology-preserving simulations derived from actual crash reports, filling a gap left by purely synthetic or abstract-geometry benchmarks and enabling more realistic exposure to safety-critical events.
major comments (2)
- [Abstract and Section 3 (Pipeline)] Abstract and pipeline description: The central claim that the CARLA simulations are 'high-fidelity' and suitable for exposing real-world AV failure modes rests on LLM-inferred initial states and generated trajectories, yet no quantitative metrics (impact speed, angle, point of contact, or post-crash motion) are reported comparing the simulations to the original NHTSA report values or any ground-truth data.
- [Section 4 (Benchmark)] Benchmark section: The curation of the 52 scenarios is presented without any error analysis, fidelity scores, or expert validation of how well the reconstructed dynamics match the source crashes; this directly affects the claim that the benchmark is 'challenging' for AV testing.
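To make the requested metrics concrete, the sketch below derives two of them, closing speed and heading difference at contact, from the two vehicles' planar velocity vectors in the simulation. It is an illustration under stated assumptions: the function names are hypothetical, the velocities are made-up values, and nothing here reflects how the authors would compute or report fidelity.

```python
import math

Vec2 = tuple[float, float]  # (vx, vy) in metres per second


def closing_speed_mps(v1: Vec2, v2: Vec2) -> float:
    """Magnitude of the relative velocity between the two vehicles."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])


def heading_difference_deg(v1: Vec2, v2: Vec2) -> float:
    """Smallest angle between the two travel directions, in [0, 180]."""
    h1 = math.atan2(v1[1], v1[0])
    h2 = math.atan2(v2[1], v2[0])
    diff = abs(math.degrees(h1 - h2)) % 360.0
    return min(diff, 360.0 - diff)


# Made-up example: one vehicle heading east, one heading north, both at 10 m/s.
eastbound: Vec2 = (10.0, 0.0)
northbound: Vec2 = (0.0, 10.0)
speed = closing_speed_mps(eastbound, northbound)
angle = heading_difference_deg(eastbound, northbound)
print(round(speed, 3), round(angle, 1))
```

Such values could be compared against a report only where the report records a corresponding figure, which the rebuttal below says is not consistently the case.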
minor comments (2)
- [Section 3] The description of how semi-structured report data is parsed into simulation parameters could include a concrete example or pseudocode for reproducibility.
- [Figures] Figure captions for scenario visualizations should explicitly note which elements (road topology, vehicle positions, trajectories) are derived from OSM versus LLM inference.
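The two minor comments could be addressed together by tagging every simulation parameter with its provenance at parse time. The sketch below is one hypothetical way to do that: the record keys (`pre_crash_maneuver`, `travel_speed_mph`), the `Param` and `Source` types, and the rule that a missing speed is deferred to LLM inference are all assumptions, and are not the NHTSA schema or the authors' parser.

```python
from dataclasses import dataclass
from enum import Enum

MPH_TO_MPS = 0.44704  # exact miles-per-hour to metres-per-second factor


class Source(Enum):
    REPORT = "report"  # read directly from the crash report
    OSM = "osm"        # derived from map data
    LLM = "llm"        # left for language-model inference


@dataclass(frozen=True)
class Param:
    name: str
    value: object
    source: Source


def parse_vehicle(record: dict) -> list[Param]:
    """Turn one vehicle record (hypothetical keys) into tagged parameters."""
    maneuver = record["pre_crash_maneuver"].strip().lower()
    params = [Param("maneuver", maneuver, Source.REPORT)]
    raw_speed = record.get("travel_speed_mph", "").strip()
    if raw_speed.isdigit():
        params.append(Param("speed_mps", int(raw_speed) * MPH_TO_MPS, Source.REPORT))
    else:
        # No usable number in the report: mark the value for inference.
        params.append(Param("speed_mps", None, Source.LLM))
    return params


def caption_note(params: list[Param]) -> str:
    """Summarise which parameters came from which source."""
    by_source: dict[str, list[str]] = {}
    for p in params:
        by_source.setdefault(p.source.value, []).append(p.name)
    return "; ".join(f"{src}: {', '.join(names)}"
                     for src, names in sorted(by_source.items()))


known = parse_vehicle({"pre_crash_maneuver": " Turning Left ", "travel_speed_mph": "35"})
unknown = parse_vehicle({"pre_crash_maneuver": "Going Straight", "travel_speed_mph": "Unknown"})
print(caption_note(known))
print(caption_note(unknown))
```

Carrying the source tag through to rendering would let a figure caption state mechanically which elements are report-derived, map-derived, or inferred.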
Simulated Author's Rebuttal
Thank you for the constructive review of our manuscript on TRACE. We address the major comments point-by-point below, acknowledging where additional validation is needed, and outline the revisions we will make.
- Referee: Abstract and Section 3 (Pipeline): The central claim that the CARLA simulations are 'high-fidelity' and suitable for exposing real-world AV failure modes rests on LLM-inferred initial states and generated trajectories, yet no quantitative metrics (impact speed, angle, point of contact, or post-crash motion) are reported comparing the simulations to the original NHTSA report values or any ground-truth data.
  Authors: We agree that the manuscript would be strengthened by quantitative validation where possible. NHTSA crash reports are primarily narrative and semi-structured, and do not consistently provide precise numerical values for impact speed, angle, point of contact, or post-crash motion across all cases. Our pipeline achieves fidelity primarily through site-specific OSM topology retrieval and LLM inference of initial states and trajectories that are consistent with the described pre-crash maneuvers and road geometry. In the revision, we will update the abstract and Section 3 to clarify these sources of fidelity, add a limitations discussion on the lack of exact dynamic matching, and include qualitative comparisons (e.g., trajectory visualizations aligned to report descriptions) for a representative subset of scenarios where partial details are available. We will also replace the term 'high-fidelity' with 'topology-preserving' in key claims.
  revision: partial
- Referee: Benchmark section: The curation of the 52 scenarios is presented without any error analysis, fidelity scores, or expert validation of how well the reconstructed dynamics match the source crashes; this directly affects the claim that the benchmark is 'challenging' for AV testing.
  Authors: We acknowledge that the current benchmark section does not include systematic error analysis or fidelity scoring. The 52 scenarios were selected to maximize diversity in collision types, road topologies, and pre-crash maneuvers drawn directly from NHTSA reports. In the revised manuscript, we will expand Section 4 with a new error analysis subsection that discusses sources of potential discrepancy (including LLM inference variability) and provides qualitative fidelity assessments via manual review and trajectory-report alignment for sampled scenarios. This will support the claim that the benchmark is challenging by explicitly linking each scenario to documented real-world failure modes.
  revision: yes
- Full quantitative metrics (e.g., exact impact speeds and angles) cannot be provided for the complete set of 52 scenarios, as the source NHTSA reports lack consistent numerical ground-truth data for these parameters.
Circularity Check
No circularity: construction pipeline with no derivations or self-referential reductions
The manuscript describes an automated pipeline that retrieves OSM topology, uses LLMs to infer initial states from report text and geometry, and generates CARLA trajectories. No equations, fitted parameters, or mathematical derivations are present in the provided text. The central output (the 52-scenario benchmark) is produced by the described steps rather than reduced to its inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises. Lack of quantitative fidelity metrics is an empirical-validation concern, not a circularity issue in any derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Large language models can reliably infer plausible vehicle initial states and pre-crash maneuvers from road geometry descriptions and semi-structured crash report text.
- domain assumption: CARLA simulations driven by OSM-derived road topology plus LLM-generated trajectories are sufficiently representative of real crashes for AV safety evaluation.