NOVA: Symbolic Regression Discovery of Interpretable Car-Following and Lane-Change Models with Driver Heterogeneity
Pith reviewed 2026-06-27 14:10 UTC · model grok-4.3
The pith
NOVA uses exhaustive symbolic search to recover a two-term acceleration model from 4.7 million driving records that improves on prior regression baselines and transfers across sites.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NOVA's deterministic engine evaluates more than ten thousand candidate algebraic structures on 4,765,788 active driving observations and isolates a compact two-term acceleration model under a forward-shifted rolling-mean target. On the intent-forecasting task this structure achieves an RMSE of 1.376 m/s² (R² = 15.57 percent) and outperforms the best recalibrated symbolic-regression baseline by 0.135 m/s². A single dominant nonlinear term survives eight independent experiments, and the discovered feature operators transfer zero-shot between sites with under three percentage points of R² loss. Embedded in a multinomial logit, the structure delivers 67.4 percent balanced accuracy on 502 held-out drivers for a three-class lane-change problem.
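The reported RMSE and R² can be cross-checked against each other. The sketch below assumes the textbook definition R² = 1 − MSE / Var(y) and assumes that NOVA and the baseline were scored on the same samples; neither assumption is confirmed by the page, so the derived numbers are a sanity check and not results from the paper.

```python
import math


def implied_target_std(rmse: float, r2: float) -> float:
    """Invert R^2 = 1 - MSE / Var(y) to get the standard deviation of the target."""
    return rmse / math.sqrt(1.0 - r2)


def r2_from_rmse(rmse: float, target_std: float) -> float:
    """R^2 of a predictor with the given RMSE on a target with the given std."""
    return 1.0 - (rmse / target_std) ** 2


# Figures quoted in the core claim.
nova_rmse, nova_r2 = 1.376, 0.1557

# Assumption: the baseline's RMSE is NOVA's RMSE plus the reported 0.135 m/s^2 gap.
baseline_rmse = nova_rmse + 0.135

target_std = implied_target_std(nova_rmse, nova_r2)
baseline_r2 = r2_from_rmse(baseline_rmse, target_std)

print(f"implied target std: {target_std:.3f} m/s^2")
print(f"implied baseline R^2: {baseline_r2:.3f}")
```

Under these assumptions the target's standard deviation comes out near 1.50 m/s² and the baseline's implied R² lands close to zero, a reminder of how little variance any model explains on this task.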
What carries the argument
NOVA, the Rust-powered deterministic search engine that enumerates and ranks more than 10,000 algebraic structures with minimal behavioral priors under two preprocessing pipelines.
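A toy version of deterministic enumerate-and-rank search may make the idea concrete. Everything in this sketch is an assumption for illustration: the term library, the feature names (`s` for gap, `v` for speed, `dv` for relative speed), the synthetic data, and the choice of Python. It is not NOVA's Rust engine, and the page does not state which two terms the paper actually selects.

```python
import itertools
import math
import random

# Hypothetical term library over gap s (m), speed v (m/s), relative speed dv (m/s).
# These forms are illustrative only; they are not the operators NOVA uses.
TERMS = {
    "1/s": lambda s, v, dv: 1.0 / s,
    "dv": lambda s, v, dv: dv,
    "dv/s": lambda s, v, dv: dv / s,
    "dv*|dv|/s": lambda s, v, dv: dv * abs(dv) / s,
    "v": lambda s, v, dv: v,
    "v*dv/s": lambda s, v, dv: v * dv / s,
}


def fit_two_terms(x1, x2, y):
    """Least-squares (a, b) for y ~ a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(p * p for p in x1)
    s22 = sum(q * q for q in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s1y = sum(p * t for p, t in zip(x1, y))
    s2y = sum(q * t for q, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:  # skip (near-)collinear pairs
        return None
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def rmse(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def search(rows, y):
    """Fit every unordered pair of terms and rank the pairs by RMSE."""
    cols = {name: [f(*row) for row in rows] for name, f in TERMS.items()}
    results = []
    # Sorted names give a fixed enumeration order, so reruns are identical.
    for n1, n2 in itertools.combinations(sorted(cols), 2):
        coef = fit_two_terms(cols[n1], cols[n2], y)
        if coef is None:
            continue
        pred = [coef[0] * p + coef[1] * q for p, q in zip(cols[n1], cols[n2])]
        results.append((rmse(pred, y), n1, n2, coef))
    results.sort(key=lambda r: (r[0], r[1], r[2]))
    return results


# Synthetic data with a known two-term ground truth (an assumption, not NGSIM).
rng = random.Random(0)
rows = [(rng.uniform(5, 60), rng.uniform(0, 30), rng.uniform(-5, 5)) for _ in range(500)]
y = [2.0 * v * dv / s + 0.3 * dv + rng.gauss(0, 0.05) for s, v, dv in rows]

ranked = search(rows, y)
best_rmse, term_a, term_b, (coef_a, coef_b) = ranked[0]
print(term_a, term_b, round(coef_a, 3), round(coef_b, 3), round(best_rmse, 3))
```

With six terms the search covers only 15 pairs; the paper's figure of more than 10,000 structures implies a much richer grammar, which this sketch does not attempt to reproduce.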
If this is right
- The two-term model can serve as a transparent, low-parameter replacement for black-box predictors inside traffic simulators and intent-forecasting modules.
- Feature operators discovered on one freeway site can be reused on another with under three percentage points of R² loss, enabling site-agnostic model deployment.
- Embedding the discovered structure inside a multinomial logit yields 67.4 percent balanced accuracy on lane-change classification for drivers never seen during training, 29.8 percentage points above existing lane-changing baselines.
- The residual-guided extension connects the selected algebraic form to an established psychophysical theory of collision avoidance.
- A single dominant nonlinear term survives repeated runs and complementary preprocessing pipelines, indicating robustness of the core functional shape.
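The lane-change figures above are balanced accuracy, which averages per-class recall and so is not inflated by a dominant class. The sketch below shows the metric on a made-up three-class example. The class names `keep`, `left`, and `right` are assumptions, since the page does not name the three classes.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, heavily imbalanced labels: most frames are lane keeping.
y_true = ["keep"] * 8 + ["left"] + ["right"]
always_keep = ["keep"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, always_keep)) / len(y_true)
chance_like = balanced_accuracy(y_true, always_keep)

# Baseline level implied by the reported figures: 67.4 minus the 29.8-point gain.
implied_baseline_pct = 67.4 - 29.8

print(plain_accuracy, round(chance_like, 3), round(implied_baseline_pct, 1))
```

A predictor that always answers `keep` scores 80 percent plain accuracy here but only one third balanced accuracy, the same level as chance on three classes. Read that way, the implied baseline of roughly 37.6 percent sits only a few points above chance.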
Where Pith is reading between the lines
- If the two-term form generalizes, traffic-flow models could be rewritten with far fewer parameters while retaining predictive power.
- The zero-shot transfer result suggests that driver heterogeneity may be captured by a small set of site-independent operators rather than site-specific calibration.
- Testing the same search procedure on naturalistic datasets recorded at higher frame rates or on urban arterials would reveal whether the recovered structures remain stable outside freeway conditions.
- Replacing the current deterministic enumeration with a budgeted stochastic search could scale the method to higher-dimensional state spaces that include lateral dynamics.
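The zero-shot transfer claim can be read as a simple protocol: fit on one site, freeze the coefficients, score on the other site, and report the R² difference in percentage points. The sketch below illustrates that protocol on synthetic one-feature data. The site generator, gains, and noise level are assumptions chosen for the illustration and say nothing about I-80 or US-101.

```python
import random


def fit_scale(x, y):
    """Least-squares coefficient for y ~ c * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def r_squared(y, pred):
    mean = sum(y) / len(y)
    sse = sum((t - p) ** 2 for t, p in zip(y, pred))
    sst = sum((t - mean) ** 2 for t in y)
    return 1.0 - sse / sst


def make_site(rng, n, gain, noise):
    """Synthetic site: same functional form, site-specific gain (an assumption)."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [gain * xi + rng.gauss(0.0, noise) for xi in x]
    return x, y


rng = random.Random(1)
x_a, y_a = make_site(rng, 2000, gain=1.0, noise=0.3)  # source site
x_b, y_b = make_site(rng, 2000, gain=0.7, noise=0.3)  # target site

coef = fit_scale(x_a, y_a)  # fitted on the source site only
r2_source = r_squared(y_a, [coef * x for x in x_a])
r2_target = r_squared(y_b, [coef * x for x in x_b])  # no refitting on the target

loss_pp = 100.0 * (r2_source - r2_target)
print(f"R2 source {r2_source:.3f}, target {r2_target:.3f}, loss {loss_pp:.1f} pp")
```

The synthetic gap is deliberately large. The paper's reported loss of under 3 pp would correspond to sites whose fitted coefficients and noise levels are much closer than in this toy setup.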
Load-bearing premise
The assumption that an exhaustive search over ten thousand algebraic expressions with almost no behavioral priors will recover the actual load-bearing structures of human driving rather than dataset-specific or preprocessing artifacts.
What would settle it
A new driving dataset collected under identical sensor conditions on which the NOVA-discovered two-term model produces higher RMSE than a simple constant-acceleration baseline or than a structure found by an independent exhaustive search on the same data.
Original abstract
We present NOVA, an autonomous symbolic regression framework that identifies interpretable car-following and lane-change structures from raw trajectory data with minimal behavioral priors. Applied to 4,765,788 active driving observations from the NGSIM I-80 and US-101 datasets, NOVA's deterministic Rust-powered search engine evaluates over 10,000 candidate algebraic structures and identifies a compact two-term acceleration model under a forward-shifted rolling-mean prediction target. Evaluated under two complementary preprocessing pipelines, NOVA achieves RMSE = 1.376 m/s² (R² = 15.57%) on the intent-forecasting benchmark, outperforming the best recalibrated symbolic-regression baseline (SR-LLM, PNAS 2025) by 0.135 m/s² in RMSE under an identical evaluation protocol. Across eight independent experiments, a single dominant nonlinear term emerges as a robust backbone of human car-following; a residual-guided extension further links the selected structure to an established psychophysical theory of collision avoidance. The discovered feature operators transfer zero-shot between freeway sites with under 3 pp R² loss. Extended to lane-change modelling within a multinomial logit framework, NOVA achieves 67.4% balanced accuracy under strict vehicle-ID holdout on 502 unseen drivers, surpassing existing lane-changing baselines by +29.8 percentage points on a three-class problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NOVA, a symbolic regression framework for discovering interpretable car-following and lane-change models from raw trajectory data using minimal behavioral priors. Applied to over 4.7 million observations from NGSIM I-80 and US-101, it identifies a compact two-term acceleration model with RMSE = 1.376 m/s² (R² = 15.57%) that outperforms a recalibrated symbolic regression baseline by 0.135 m/s², demonstrates zero-shot transfer between sites with less than 3 percentage point R² loss, links the structure to psychophysical theory, and extends to lane-change modeling achieving 67.4% balanced accuracy under strict holdout, surpassing baselines by 29.8 percentage points.
Significance. If the results hold and the models generalize beyond the specific dataset and target construction, this would represent a significant advance in data-driven discovery of interpretable driver models, offering compact structures that capture heterogeneity and connect to established theory. The deterministic search over 10,000 structures and use of a large-scale real-world dataset are notable strengths that support reproducibility and reduce reliance on hand-crafted priors.
major comments (2)
- [§4 (Car-following results)] The reported zero-shot transfer between I-80 and US-101 with under 3 pp R² loss tests stability within the NGSIM collection but does not address external validity, as both sites use the same sensor setup and extraction pipeline. This is load-bearing for the claim that the search recovers general load-bearing structures of human driving rather than NGSIM-specific artifacts.
- [§3.2 (Prediction target)] The forward-shifted rolling-mean target is used for the reported performance; without an ablation study using the raw acceleration signal as target, it is unclear whether the discovered two-term model reflects underlying dynamics or alignment with the smoothed target. This directly impacts the interpretation of the RMSE improvement over baselines.
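The second major comment turns on how the prediction target is built. One plausible reading of a forward-shifted rolling-mean target is the mean of future acceleration over a fixed window that starts some steps ahead. The sketch below implements that reading; the definition, the `shift` and `window` values, and the function name are assumptions, since the page does not give the paper's exact construction.

```python
import random
import statistics


def forward_rolling_mean(acc, shift, window):
    """target[t] = mean(acc[t + shift : t + shift + window]).

    Entries whose window would run past the end of the series are None.
    Assumed to be applied per vehicle trajectory, never across vehicles.
    """
    n = len(acc)
    out = []
    for t in range(n):
        lo, hi = t + shift, t + shift + window
        out.append(sum(acc[lo:hi]) / window if hi <= n else None)
    return out


# Small worked example: shift by 1 step, average over 3 steps.
small = forward_rolling_mean([0, 1, 2, 3, 4, 5], shift=1, window=3)

# Smoothing white noise shrinks the variance that a model is scored against.
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(1000)]
smooth = [x for x in forward_rolling_mean(raw, shift=5, window=10) if x is not None]

var_raw = statistics.pvariance(raw)
var_smooth = statistics.pvariance(smooth)
print(small, round(var_raw, 3), round(var_smooth, 3))
```

Because the smoothed series carries much less variance than the raw one, RMSE and R² measured against it are not directly comparable with figures computed on instantaneous acceleration, which is the gap the requested ablation would close.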
minor comments (1)
- Clarify in the main text what the 'eight independent experiments' consist of, as mentioned in the abstract, to facilitate replication.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, indicating whether revisions to the manuscript are planned.
- Referee: [§4 (Car-following results)] The reported zero-shot transfer between I-80 and US-101 with under 3 pp R² loss tests stability within the NGSIM collection but does not address external validity, as both sites use the same sensor setup and extraction pipeline. This is load-bearing for the claim that the search recovers general load-bearing structures of human driving rather than NGSIM-specific artifacts.
  Authors: We agree that both sites belong to the NGSIM collection and share the same sensor and extraction pipeline. The two locations nevertheless differ in geometry, traffic composition, and observed driver behaviors, so the zero-shot result still provides evidence that the recovered structures are not artifacts of a single site's local conditions. We have added a dedicated limitations paragraph in Section 5 acknowledging that broader external validation on datasets collected with different sensors or in different regions remains necessary to fully substantiate claims of generality.
  Revision: partial
- Referee: [§3.2 (Prediction target)] The forward-shifted rolling-mean target is used for the reported performance; without an ablation study using the raw acceleration signal as target, it is unclear whether the discovered two-term model reflects underlying dynamics or alignment with the smoothed target. This directly impacts the interpretation of the RMSE improvement over baselines.
  Authors: The forward-shifted rolling-mean target is chosen deliberately to match the intent-forecasting benchmark, which aims to predict the driver's intended acceleration rather than instantaneous noisy measurements. All baselines, including the recalibrated SR-LLM, were evaluated under exactly the same target and preprocessing pipeline, so the reported 0.135 m/s² RMSE advantage remains a fair comparison within the stated task. We stand by this design choice and do not plan to alter the primary results.
  Revision: no
Circularity Check
No significant circularity; derivation is data-driven enumeration without self-referential reduction
The paper's core chain consists of exhaustive deterministic enumeration of >10k algebraic structures on raw NGSIM trajectory observations, followed by selection of a compact model and post-hoc residual-guided linkage to psychophysical theory. No equations reduce the discovered acceleration terms or performance metrics to fitted parameters by construction, no self-citations bear load on uniqueness or ansatz choices, and the lane-change extension uses vehicle-ID holdout on unseen drivers. The reported metrics arise from direct evaluation against baselines under stated protocols rather than tautological renaming or input-output equivalence.
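The vehicle-ID holdout mentioned above means that every record from a given driver lands on exactly one side of the split. A minimal sketch of such a grouped split follows; the record layout, the `vehicle_id` key, and the 20 percent test fraction are assumptions for illustration, not the paper's protocol.

```python
import random


def split_by_vehicle(records, test_fraction, seed=0):
    """Split records so that no vehicle_id appears in both train and test."""
    ids = sorted({r["vehicle_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ids)  # deterministic for a fixed seed
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in records if r["vehicle_id"] not in test_ids]
    test = [r for r in records if r["vehicle_id"] in test_ids]
    return train, test


# Hypothetical records: 10 vehicles with 5 frames each.
records = [{"vehicle_id": vid, "frame": f} for vid in range(10) for f in range(5)]
train, test = split_by_vehicle(records, test_fraction=0.2)

train_ids = {r["vehicle_id"] for r in train}
test_ids = {r["vehicle_id"] for r in test}
print(sorted(test_ids), len(train), len(test))
```

Splitting by frame instead of by vehicle would let near-duplicate neighbouring frames from the same driver leak across the boundary, which is why the grouped form is the stricter test.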
Reference graph
Works this paper leans on
-
[1]
Discovering car-following dynamics from trajectory data through deep learning. URL:https://arxiv.org/abs/2408.00251, arXiv:2408.00251. Bando, M., Hasebe, K., Nakayama, A., Shibata, A., Sugiyama, Y.,
-
[2]
URL:https://link.aps.org/doi/10.1103/PhysRevE.51.1035, doi:10.1103/PhysRevE.51.1035. Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.,
-
[3]
A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset. Transportation Research Part B: Methodological 105, 362–377. URL:https://ideas.repec.org/a/eee/transb/v105y2017icp362-377.html, doi:10.1016/j.trb.2017.09.018. Denos C. Gazis, R.H., Rothery, R.W.,
-
[4]
Nonlinear follow-the-leader models of traffic flow. Oper. Res. 9, 545–567. URL:https://doi.org/10.1287/opre.9.4.545, doi:10.1287/opre.9.4.545. Deo, N., Trivedi, M.M.,
-
[5]
Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE Press. pp. 1179–1184. URL:https://doi.org/10.1109/IVS.2018.8500493, doi:10.1109/IVS.2018.8500493. Durbin, J., Watson, G.S.,
-
[6]
doi:10.1016/j.ijtst.2019.05
-
[7]
AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge. arXiv:2504.01538. Farhi, N., Haj-Salem, H., Khoshyaran, M., Lebacque, J.P., Salvarani, F., Schnetzler, B., de Vuyst, F.,
-
[8]
SR-LLM: An incremental symbolic regression framework driven by LLM-based retrieval-augmented generation. Proceedings of the National Academy of Sciences 122, e2516995122. URL:https://www.pnas.org/doi/abs/10.1073/pnas.2516995122, doi:10.1073/pnas.2516995122, arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.2516995122. Helbing, D.,
-
[9]
Traffic and related self-driven many-particle systems. Rev. Mod. Phys. 73, 1067–1141. URL:https://link.aps.org/doi/10.1103/RevModPhys.73.1067, doi:10.1103/RevModPhys.73.1067. Kesting, A., Treiber, M., Helbing, D.,
-
[10]
General lane-changing model mobil for car-following models. Transportation Research Record 1999, 86–94. URL:https://doi.org/10.3141/1999-10, doi:10.3141/1999-10, arXiv:https://doi.org/10.3141/1999-10. Koza, J.R.,
-
[11]
URL:https://api.elsevier.com/content/article/PII:S0968090X24004418?httpAccept=text/xml, doi:10.1016/j.trc.2024.104920, arXiv:https://orbilu.uni.lu/handle/10993/63380. Manti, S., Mohammadian, S., Treiber, M., Lucantonio, A.,
-
[12]
Estimating acceleration and lane-changing dynamics from next generation simulation trajectory data. Transportation Research Record 2088, 90–101. URL:https://doi.org/10.3141/2088-10, doi:10.3141/2088-10, arXiv:https://doi.org/10.3141/2088-10. Treiber, M., Hennecke, A., Helbing, D.,
-
[13]
Congested traffic states in empirical observations and microscopic simulations. Phys. Rev. E 62, 1805–1824. URL:https://link.aps.org/doi/10.1103/PhysRevE.62.1805, doi:10.1103/PhysRevE.62.1805. Treiber, M., Kesting, A., Thiemann, C.,
-
[14]
AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6(16), eaay2631 (2020). URL:https://www.science.org/doi/abs/10.1126/sciadv.aay2631, doi:10.1126/sciadv.aay2631, arXiv:https://www.science.org/doi/pdf/10.1126/sciadv.aay2631. U.S., F.H.A.,
-
[15]
IDM-Follower: A model-informed deep learning method for long-sequence car-following trajectory prediction. URL:https://arxiv.org/abs/2210.10965, arXiv:2210.10965. Zhang, Y., Talebpour, A.,
-
[16]
Characterizing human–automated vehicle interactions: An investigation into car-following behavior. Transportation Research Record 2678, 812–826. URL:https://doi.org/10.1177/03611981231192999, doi:10.1177/03611981231192999, arXiv:https://doi.org/10.1177/03611981231192999. Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.,
-
[17]
URL:https://www.mdpi.com/1999-4893/17/2/68, doi:10.3390/a17020068.