arxiv: 2604.11394 · v1 · submitted 2026-04-13 · 💻 cs.CR

Recognition: unknown

Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics

Asma Al-Dahmani , Abdulla Bin Safwan , Mohammad Obeidat , Belal Alsinglawi

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:42 UTC · model grok-4.3

classification 💻 cs.CR

keywords IoT intrusion detectionTabPFNfoundation modelssmart citiesTON IoT datasethybrid pipelinethreat screeningensemble classifiers

0 comments

The pith

TabPFNv2.5 screens IoT threats forty times faster than Random Forest at 97 percent binary accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates TabPFNv2.5, a transformer-based foundation model for tabular data, on the TON IoT dataset for intrusion detection. It shows the model infers threats much quicker than ensemble classifiers like Random Forest while holding similar accuracy on binary tasks. The authors propose a hybrid pipeline that uses the fast model for initial screening and slower ensembles for detailed follow-up classification. This approach targets the need for quick responses in smart city IoT security operations. The work also finds that scanning attacks are harder to detect and that cross-device results depend on feature similarity.

Core claim

TabPFNv2.5 achieves 40 times faster inference than Random Forest while maintaining 97 percent binary classification accuracy on the TON IoT dataset. The authors propose a hybrid pipeline in which TabPFNv2.5 performs rapid threat screening while ensemble models handle detailed classification. This combination supports time-sensitive IoT security operations in smart cities.

What carries the argument

TabPFNv2.5, the transformer-based foundation model for tabular data that carries out the initial rapid threat screening before passing flagged cases to ensemble classifiers.

If this is right

Real-time forensic triage becomes feasible in IoT networks because of the sharply reduced inference time.
Detailed classification can be reserved for cases flagged by the fast initial model.
Smart city security operations can triage incidents more efficiently by balancing speed and precision.
Detection performance varies with device feature similarity, requiring careful feature selection for cross-device use.
Scanning attacks need special handling because they show the lowest F1 scores among tested threats.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Placing the fast model on edge hardware could cut latency even further for live monitoring.
The hybrid idea might extend to other high-volume detection tasks such as network anomaly monitoring.
Adapting the foundation model for direct multi-class output could simplify the pipeline and reduce handoff errors.
Validation against traffic from operational smart city networks would test whether the gains survive real variability.

Load-bearing premise

The TON IoT dataset and its feature distributions represent real-world smart-city IoT traffic, so the reported accuracy and speed gains hold under live deployment with unseen devices and attack variants.

What would settle it

A test on live smart-city IoT traffic from new devices showing inference speed gains well below 40 times or binary accuracy below 97 percent would disprove the central performance claims.

Figures

Figures reproduced from arXiv: 2604.11394 by Abdulla Bin Safwan, Asma Al-Dahmani, Belal Alsinglawi, Mohammad Obeidat.

read the original abstract

Security operations in smart cities demand detection systems that balance accuracy with response time. While ensemble methods like Random Forest achieve high accuracy, their computational overhead impedes real-time forensic triage. We present the first systematic evaluation of TabPFNv2.5, a transformer-based foundation model, against traditional ensemble classifiers for IoT intrusion detection. Using the TON IoT dataset, we demonstrate that TabPFNv2.5 achieves 40 faster inference than Random Forest while maintaining 97% binary classification accuracy. We propose a hybrid pipeline in which TabPFNv2.5 performs rapid threat screening, while ensemble models handle detailed classification. Our analysis reveals that scanning attacks remain the hardest to detect (F1: 69.8%) and cross-device generalization depends critically on feature similarity. These findings establish foundation models as viable components for time-sensitive IoT security operations

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TabPFNv2.5 runs faster than Random Forest on TON IoT data for binary screening but the numbers rest on narrow experiments with no generalization checks.

read the letter

The key point is that TabPFNv2.5 can screen IoT threats much faster than Random Forest on the TON IoT dataset while keeping binary accuracy around 97 percent, and the authors suggest pairing it with ensembles for full classification. What the paper does is take an existing tabular foundation model and test it head to head on intrusion detection data. This is new as a direct comparison for this application. They report concrete numbers and flag that scanning attacks are harder to catch, with F1 at 69.8 percent, plus that feature similarity drives cross-device results. The hybrid pipeline is a reasonable practical suggestion for time-sensitive smart city security. The weak parts are the lack of supporting details in the experiments. No error bars appear on the accuracy or speed figures, and there is little on how the train and test splits were done or exactly how the 40 times factor was timed. More importantly, all results stay within the TON IoT dataset. The paper notes that generalization hinges on feature similarity, but without tests on distribution shifts, new attack types, or actual live traffic, it is hard to know if the speed and accuracy will hold up in real deployments with unseen devices. This paper is for engineers and researchers working on IoT security systems who need faster detection methods. Someone looking to implement quick triage in operational settings might find the numbers and pipeline useful to experiment with. It is not breaking new ground in methods or theory. I would send it to peer review. The core comparison is straightforward on public data, so referees can assess the claims and ask for the missing experimental rigor and generalization checks. It is worth the time if the authors can fill in those gaps.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates TabPFNv2.5, a transformer-based tabular foundation model, for IoT intrusion detection on the TON IoT dataset. It claims this is the first systematic comparison against ensemble methods, that TabPFNv2.5 delivers 40x faster inference than Random Forest while retaining 97% binary classification accuracy, proposes a hybrid pipeline using the foundation model for rapid threat screening and ensembles for detailed classification, and reports that scanning attacks are hardest to detect (F1 69.8%) with cross-device generalization depending on feature similarity.

Significance. If the empirical claims are substantiated with proper controls, the work could meaningfully advance real-time IoT forensics in smart cities by demonstrating that foundation models can reduce inference latency without sacrificing accuracy on standard datasets. The hybrid pipeline idea and explicit identification of scanning-attack weaknesses provide concrete directions for practitioners. No machine-checked proofs or parameter-free derivations are present, but the direct empirical focus on a public dataset is a modest strength.

major comments (2)

[Abstract and Results] Abstract and Results: The claims of 40x faster inference and 97% binary accuracy are presented without error bars, train-test split details, hardware/platform specifications for timing, or statistical significance tests. Only Random Forest is used as a baseline; this is load-bearing for the central speed/accuracy comparison and the hybrid-pipeline proposal.
[Discussion/Evaluation] Discussion/Evaluation: The manuscript acknowledges that scanning attacks yield only 69.8% F1 and that cross-device generalization 'depends critically on feature similarity,' yet provides no quantitative distribution-shift experiments, OOD attack variants, or live-traffic traces. This directly undermines the claim that the hybrid pipeline is viable for real smart-city deployments with unseen devices and evolving attacks.

minor comments (1)

[Abstract] Abstract: '40 faster inference' is grammatically imprecise and should be rephrased as '40 times faster inference' or '40× faster inference' for clarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our empirical claims and evaluation. We address each major comment below with clarifications and revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract and Results] Abstract and Results: The claims of 40x faster inference and 97% binary accuracy are presented without error bars, train-test split details, hardware/platform specifications for timing, or statistical significance tests. Only Random Forest is used as a baseline; this is load-bearing for the central speed/accuracy comparison and the hybrid-pipeline proposal.

Authors: We agree these supporting details are necessary to substantiate the claims. In the revised manuscript we add error bars from repeated runs with varied seeds, specify the train-test split used on TON IoT, report the exact hardware and software platform for all CPU-based timing measurements, and include statistical significance tests (paired t-tests) confirming the speed difference. Although Random Forest is the highlighted baseline for timing due to its prevalence in IoT IDS work, our experiments also covered XGBoost and other ensembles; we now present a full comparison table to better ground the hybrid-pipeline proposal. revision: yes
Referee: [Discussion/Evaluation] Discussion/Evaluation: The manuscript acknowledges that scanning attacks yield only 69.8% F1 and that cross-device generalization 'depends critically on feature similarity,' yet provides no quantitative distribution-shift experiments, OOD attack variants, or live-traffic traces. This directly undermines the claim that the hybrid pipeline is viable for real smart-city deployments with unseen devices and evolving attacks.

Authors: We recognize the need for stronger evidence on generalization. Using TON IoT's multi-device structure we expand the analysis with quantitative feature-similarity metrics (e.g., pairwise correlation and overlap statistics) between device subsets and their correlation with observed F1 drops; we also add a per-attack breakdown for scanning attacks. However, the public dataset contains neither live-traffic traces nor additional OOD variants, so those experiments cannot be performed. We have added an explicit limitations section stating that the hybrid pipeline is offered as an efficient screening component within dataset constraints, recommending periodic retraining for evolving threats. revision: partial

standing simulated objections not resolved

Absence of live-traffic traces and OOD attack variants in the TON IoT dataset, preventing quantitative experiments on those specific aspects

Circularity Check

0 steps flagged

No circularity: empirical evaluation on public dataset

full rationale

The paper conducts a direct empirical comparison of TabPFNv2.5 against Random Forest and other ensembles on the TON IoT dataset, reporting inference speed and accuracy metrics without any mathematical derivations, equations, or parameter-fitting steps. No load-bearing claims reduce by construction to self-defined quantities, self-citations, or renamed inputs; the hybrid pipeline proposal follows from observed performance numbers rather than any internal redefinition. This is a standard experimental study whose central results are falsifiable against the public dataset and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical benchmark study with no mathematical derivations, new theoretical entities, or fitted parameters described in the abstract. All claims rest on experimental results whose details are not supplied.

pith-pipeline@v0.9.0 · 5457 in / 1236 out tokens · 28004 ms · 2026-05-10T15:42:55.748920+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 11 canonical work pages · 2 internal anchors

[1]

Sustainable Cities and Society72, 103041 (2021)

Ashraf, J., Keshk, M., Moustafa, N., Abdel-Basset, M., Khurshid, H., Bakhshi, A.D., Mostafa, R.R.: IoTBoT-IDS: A novel statistical learning-enabled botnet de- tection framework for protecting networks of smart cities. Sustainable Cities and Society72, 103041 (2021). doi:10.1016/j.scs.2021.103041

work page doi:10.1016/j.scs.2021.103041 2021
[2]

IEEE Transactions on Intelligent Transportation Systems23(3), 2523–2537 (2022)

Abdel-Basset, M., Moustafa, N., Hawash, H., Razzak, I., Sallam, K.M., Elkomy, O.M.: Federated intrusion detection in blockchain-based smart transportation sys- tems. IEEE Transactions on Intelligent Transportation Systems23(3), 2523–2537 (2022). doi:10.1109/TITS.2021.3119968

work page doi:10.1109/tits.2021.3119968 2022
[3]

Cybersecurity4, 18 (2021)

Khraisat, A., Alazab, A.: A critical review of intrusion detection systems in the Internet of Things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges. Cybersecurity4, 18 (2021). doi:10.1186/s42400- 021-00077-7

work page doi:10.1186/s42400- 2021
[4]

In: IEEE International Conference on Dis- tributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), pp

Alhowaide, A., Alsmadi, I., Alsinglawi, B.: Ensemble-based cyber intrusion detec- tion for robust smart city protection. In: IEEE International Conference on Dis- tributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), pp. 124–129 (2024)

2024
[5]

Journal of Net- work and Systems Management29, 1–18 (2021)

Al-Omari, M., Rawashdeh, M., Qutaishat, F., Alshira’H, M., Ababneh, N.: An intelligent tree-based intrusion detection model for cyber security. Journal of Net- work and Systems Management29, 1–18 (2021). doi:10.1007/s10922-021-09591-y

work page doi:10.1007/s10922-021-09591-y 2021
[6]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

Hollmann, N., M¨ uller, S., Eggensperger, K., Hutter, F.: TabPFN: A trans- former that solves small tabular classification problems in a second. In: Interna- tional Conference on Learning Representations (ICLR) (2023). arXiv:2207.01848. doi:10.48550/arXiv.2207.01848

work page internal anchor Pith review doi:10.48550/arxiv.2207.01848 2023
[7]

A closer look at tabpfn v2: Understanding its strengths and extending its capabilities.arXiv preprint arXiv:2502.17361, 2025

Ye, H.-J., Liu, S.-Y., Chao, W.-L.: A closer look at TabPFN v2: Under- standing its strengths and extending its capabilities. arXiv:2502.17361 (2025). doi:10.48550/arXiv.2502.17361

work page doi:10.48550/arxiv.2502.17361 2025
[8]

IEEE Internet of Things Journal9(13), 11604–11613 (2022)

Khan, I.A., Moustafa, N., Pi, D., Sallam, K.M., Zomaya, A.Y., Li, B.: A new explainable deep learning framework for cyber threat discovery in industrial IoT networks. IEEE Internet of Things Journal9(13), 11604–11613 (2022). doi:10.1109/JIOT.2021.3130156

work page doi:10.1109/jiot.2021.3130156 2022
[9]

Future Internet16(3), 67 (2024)

Scalise, P., Boeding, M., Hempel, M., Sharif, H., Delloiacovo, J., Reed, J.: A system- atic survey on 5G and 6G security considerations, challenges, trends, and research areas. Future Internet16(3), 67 (2024). doi:10.3390/fi16030067

work page doi:10.3390/fi16030067 2024
[10]

Grinsztajn, E

Grinsztajn, L., Oyallon, E., Varoquaux, G.: Why do tree-based models still outperform deep learning on tabular data? arXiv:2207.08815 (2022). doi:10.48550/arXiv.2207.08815

work page doi:10.48550/arxiv.2207.08815 2022
[11]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Grinsztajn, L. et al.: TabPFN-2.5: Advancing the state of the art in tabular foun- dation models. arXiv:2511.08667 (2025). doi:10.48550/arXiv.2511.08667

work page internal anchor Pith review doi:10.48550/arxiv.2511.08667 2025
[12]

IEEE Access8, 165130–165150 (2020)

Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A., Anwar, A.: TON IoT telemetry dataset: A new generation dataset of IoT and IIoT for data- driven intrusion detection systems. IEEE Access8, 165130–165150 (2020). doi:10.1109/ACCESS.2020.3022862

work page doi:10.1109/access.2020.3022862 2020