Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models
Pith reviewed 2026-05-18 15:59 UTC · model grok-4.3
The pith
MambaAttention outperforms TabPFN and MambaNet at classifying severe injuries in electric vehicle crashes through attention-based feature reweighting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting on the filtered Texas EV crash dataset, whereas TabPFN demonstrated strong generalization across severity levels.
What carries the argument
MambaAttention, which uses attention to reweight tabular features for improved classification of the minority severe-injury class.
If this is right
- Intersection relation, speed limit, and automatic emergency braking emerge as top predictors that safety programs can target.
- Deep tabular architectures can support data-driven interventions to reduce severe outcomes in EV collisions.
- Attention mechanisms in sequence models improve minority-class detection in imbalanced tabular safety data.
Where Pith is reading between the lines
- The same pipeline could be applied to non-EV or multi-state crash datasets to test consistency of the top predictors.
- Embedding these predictions into real-time vehicle or infrastructure systems might allow earlier safety alerts.
- Removing the resampling step and retraining on raw class distributions would reveal how much the reported gains depend on synthetic balancing.
Load-bearing premise
The filtered Texas EV crash records are representative of broader EV crashes and SMOTEENN resampling preserves the original relationships between features and severe-injury labels.
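To make the second half of this premise concrete, here is a minimal sketch of what SMOTE followed by Edited Nearest Neighbors does to a dataset. It is a simplified stdlib illustration, not the imbalanced-learn implementation and not the paper's configuration. The toy points, the function names, the majority-vote cleaning rule, and the assumption of purely numeric features are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def nearest(idx, X, k, pool):
    """Indices of the k rows in `pool` closest to X[idx], excluding idx."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Append n_new synthetic minority rows, each interpolated between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    pool = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(n_new):
        i = rng.choice(pool)
        j = rng.choice(nearest(i, X, k, pool))
        t = rng.random()  # position along the segment between the two rows
        X_out.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop every row whose label disagrees with the majority vote of its
    k nearest neighbours (simplified cleaning rule, assumed for this sketch)."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in nearest(i, X, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: a majority cluster near the origin, three minority rows near
# (5, 5), and one majority-labelled row sitting inside the minority region.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5.5, 5.5], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_smote, y_smote = smote(X, y, minority=1, n_new=4, k=2, seed=0)
X_clean, y_clean = enn(X_smote, y_smote, k=3)
```

In this toy case the synthetic rows all lie between existing minority rows, and the cleaning step removes the majority-labelled row at (5.5, 5.5). Both steps change the joint distribution of features and labels, which is why the premise needs checking rather than assuming.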
What would settle it
Testing the three models on an independent set of EV crash records from another state or recent year without any resampling and measuring whether MambaAttention still leads on the severe-injury class.
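The check described above rests on two mechanics: scoring on records that were never resampled, and reporting metrics for the severe-injury class alone instead of overall accuracy. The sketch below illustrates both with the standard library. The class labels, the 90/10 class mix, the 20% test fraction, and the toy predictions are assumptions for illustration, not values from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split at test_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def class_prf(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical class mix: 90 non-severe records, 10 severe records.
labels = ["non_severe"] * 90 + ["severe"] * 10
train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=0)
# Any resampling would be fitted on the train_idx rows only, so the
# test_idx rows keep their raw class distribution for scoring.

# Toy predictions, scored for the severe class only.
y_true = ["severe", "severe", "non_severe", "non_severe", "non_severe"]
y_pred = ["severe", "non_severe", "severe", "non_severe", "non_severe"]
precision, recall, f1 = class_prf(y_true, y_pred, positive="severe")
```

Comparing the three models on the `severe` row of such a report, computed on untouched test records, is the comparison this section asks for.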
Original abstract
This study presents a deep tabular learning framework for predicting crash severity in electric vehicle (EV) collisions using real-world crash data from Texas (2017-2023). After filtering for electric-only vehicles, 23,301 EV-involved crash records were analyzed. Feature importance techniques using XGBoost and Random Forest identified intersection relation, first harmful event, person age, crash speed limit, and day of week as the top predictors, along with advanced safety features like automatic emergency braking. To address class imbalance, Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors (SMOTEENN) resampling was applied. Three state-of-the-art deep tabular models, TabPFN, MambaNet, and MambaAttention, were benchmarked for severity prediction. While TabPFN demonstrated strong generalization, MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting. The findings highlight the potential of deep tabular architectures for improving crash severity prediction and enabling data-driven safety interventions in EV crash contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a deep tabular learning framework for predicting crash severity in electric vehicle collisions using 23,301 filtered Texas crash records (2017-2023). It applies XGBoost and Random Forest for feature importance (highlighting intersection relation, first harmful event, person age, crash speed limit, day of week, and advanced safety features), uses SMOTEENN to address class imbalance, and benchmarks TabPFN, MambaNet, and MambaAttention, claiming superior severe-injury classification performance for MambaAttention due to attention-based feature reweighting.
Significance. If the performance claims are substantiated with quantitative metrics, ablations, and validation details, the work could illustrate the applicability of state-of-the-art tabular deep learning models (including Mamba variants) to imbalanced, safety-critical transportation datasets and support data-driven EV safety interventions.
major comments (3)
- [Abstract] The central claim that MambaAttention 'achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting' is unsupported: the abstract (and by extension the manuscript) supplies no numeric metrics, confidence intervals, ablation results, train-test split details, or hyperparameter search information to ground the benchmarking results.
- [Results] The attribution of MambaAttention's superiority specifically to attention-based feature reweighting lacks any ablation (e.g., MambaAttention without the attention component, or MambaNet augmented with attention) or internal analysis (attention weights versus tree-based feature importance) that would isolate the mechanism from other differences in state-space modeling, capacity, or optimization.
- [Data and Methods] The representativeness of the filtered Texas EV-only records and the claim that SMOTEENN does not distort the relationships between features and severe-injury labels are load-bearing for generalizability, but neither is backed by a sensitivity analysis of the resampling parameters or by external validation.
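One common way to supply the confidence intervals requested in the first comment is a percentile bootstrap over the held-out records. The sketch below is a generic stdlib illustration under assumed names and toy labels. It is not the authors' evaluation code, and the choice of 1,000 resamples and a 95% interval is an assumption.

```python
import random


def f1_for(y_true, y_pred, positive):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, positive, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the class-specific F1 score."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample test records with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            f1_for([y_true[i] for i in idx], [y_pred[i] for i in idx], positive)
        )
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy held-out labels and predictions (made up for illustration).
y_true = ["severe"] * 10 + ["other"] * 30
y_pred = ["severe"] * 7 + ["other"] * 3 + ["severe"] * 2 + ["other"] * 28

point = f1_for(y_true, y_pred, positive="severe")
lo, hi = bootstrap_ci(y_true, y_pred, positive="severe")
```

With few severe cases in the test set, an interval of this kind is typically wide, which is the reason a bare point estimate is weak support for a ranking between models.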
minor comments (2)
- [Abstract] The statement that advanced safety features like automatic emergency braking are among the top predictors should include their specific importance scores or ranking positions from the XGBoost/Random Forest analysis.
- [General] The exact architectural differences between MambaNet and MambaAttention (e.g., how attention is integrated) and the precise implementation of TabPFN fine-tuning should be detailed for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the specific revisions planned to strengthen the quantitative support, mechanistic analysis, and robustness checks.
- Referee: [Abstract] The central claim that MambaAttention 'achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting' is unsupported: the abstract (and by extension the manuscript) supplies no numeric metrics, confidence intervals, ablation results, train-test split details, or hyperparameter search information to ground the benchmarking results.
Authors: We agree the abstract should foreground key quantitative results. The Results section already reports model performance via F1-score, precision, and recall on the severe-injury class, together with an 80/20 stratified train-test split and grid-search hyperparameter details. We will revise the abstract to include the concrete metrics (e.g., MambaAttention F1 on severe cases versus baselines) and a concise statement of the validation protocol.
Revision: yes
- Referee: [Results] The attribution of MambaAttention's superiority specifically to attention-based feature reweighting lacks any ablation (e.g., MambaAttention without the attention component, or MambaNet augmented with attention) or internal analysis (attention weights versus tree-based feature importance) that would isolate the mechanism from other differences in state-space modeling, capacity, or optimization.
Authors: We acknowledge that the current text infers the benefit of attention from model architecture and overall results without isolating experiments. We will add an ablation that removes the attention module from MambaAttention and compares it directly to MambaNet, plus a side-by-side comparison of learned attention weights against the XGBoost/Random Forest feature importances. These analyses will appear in a new subsection of the revised Results.
Revision: yes
- Referee: [Data and Methods] The representativeness of the filtered Texas EV-only records and the claim that SMOTEENN does not distort the relationships between features and severe-injury labels are load-bearing for generalizability, but neither is backed by a sensitivity analysis of the resampling parameters or by external validation.
Authors: The 23,301 records constitute the complete filtered Texas EV crash population for 2017-2023; we will state this explicitly and note the geographic scope as a limitation. We will add a sensitivity study varying SMOTEENN sampling ratios and nearest-neighbor counts, reporting effects on both feature distributions and downstream F1 scores. External validation on additional state datasets is not feasible with currently available data and will be listed as future work in the Discussion.
Revision: partial
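The side-by-side comparison of attention weights and tree-based importances promised in the second response can be summarized with a rank correlation. The sketch below computes Spearman's rho with the standard library. The feature names follow the abstract, but every numeric value is made up for illustration, and nothing here reflects results from the paper.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(ranks(a), ranks(b))


features = ["intersection relation", "first harmful event", "person age",
            "crash speed limit", "day of week"]
# Hypothetical scores, one per feature, in the same order as `features`.
attention_weight = [0.30, 0.25, 0.20, 0.15, 0.10]
tree_importance = [0.28, 0.22, 0.24, 0.16, 0.10]

rho = spearman(attention_weight, tree_importance)
```

A rho near 1 would indicate that the attention module ranks features much as the tree models do. On its own that would not show that attention causes the performance gain, which is what the planned ablation is meant to isolate.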
Circularity Check
No circularity: empirical benchmarking on external crash records
The manuscript is a standard empirical study that filters real Texas EV crash records (2017-2023), applies SMOTEENN resampling, extracts feature importance via independent tree models (XGBoost, Random Forest), and benchmarks three off-the-shelf deep tabular architectures (TabPFN, MambaNet, MambaAttention) on held-out data. Reported performance numbers are direct measurements of model predictions against ground-truth severity labels; no equations, fitted parameters, or self-citations are used to define or derive those numbers. The interpretive claim that MambaAttention superiority stems from attention-based reweighting is an after-the-fact explanation of benchmark deltas rather than a mathematical reduction to the paper's own inputs. The work therefore contains no load-bearing step that collapses to self-definition, fitted-input-as-prediction, or self-citation chains.
Axiom & Free-Parameter Ledger
free parameters (2)
- SMOTEENN resampling parameters
- MambaAttention hyperparameters
axioms (1)
- domain assumption: Texas crash records from 2017-2023, after electric-vehicle filtering, are representative of future EV crashes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast productio...