arxiv: 2604.05458 · v1 · submitted 2026-04-07 · 💻 cs.CR · cs.AI

MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

Pith reviewed 2026-05-10 19:41 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords multi-agent systems · retrieval augmented generation · intrusion detection · IoT security · explainable AI · large language models · continual learning · network security

The pith

A multi-agent RAG framework with a self-building Experience Library enables explainable and continually improving intrusion detection for IoT networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MA-IDS to overcome key limits of existing network intrusion detection in IoT environments. Signature-based systems miss zero-day attacks and modified variants, while machine learning models typically lack clear explanations for decisions and struggle with resource constraints and diverse protocols. MA-IDS deploys two collaborating agents around a persistent library of human-readable rules: one agent retrieves relevant past error rules before classifying traffic, and the other analyzes mistakes to generate and store new rules for future use. This external knowledge accumulation lets the underlying language model reason without internal changes. If the approach holds, it would deliver high detection performance together with built-in explanations and ongoing adaptation on standard benchmarks.

Core claim

MA-IDS pairs a Traffic Classification Agent that retrieves stored error rules before each inference with an Error Analysis Agent that converts misclassifications into new human-readable detection rules and adds them to the Experience Library. The library is maintained in a FAISS vector database so that future classifications can draw on accumulated experience. On the NF-BoT-IoT and NF-ToN-IoT datasets the system records Macro F1-Scores of 89.75% and 85.22%, gains of more than 72 and 80 percentage points over zero-shot baselines, while remaining competitive with SVM classifiers and supplying a rule-level explanation for every decision.
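The error-to-rule loop described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: both agents are LLM-driven in the paper, whereas here `classify` and `analyze_error` are hypothetical deterministic stubs, the rule format `"<proto> port <port> => <label>"` is invented for the example, and a plain list stands in for the FAISS-backed library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperienceLibrary:
    """Plain list standing in for the paper's FAISS-backed rule store."""
    rules: List[str] = field(default_factory=list)

    def retrieve(self, flow: dict) -> List[str]:
        # Hypothetical retrieval: keep rules that mention the flow's protocol.
        return [r for r in self.rules if flow["proto"] in r]

    def add(self, rule: str) -> None:
        if rule not in self.rules:
            self.rules.append(rule)


def classify(flow: dict, rules: List[str]) -> str:
    """Stub for the Traffic Classification Agent (an LLM call in the paper)."""
    for rule in rules:
        head, label = rule.split(" => ")
        if head == f"{flow['proto']} port {flow['port']}":
            return label
    return "Benign"  # default guess when no retrieved rule applies


def analyze_error(flow: dict, true_label: str) -> str:
    """Stub for the Error Analysis Agent: turn a mistake into a readable rule."""
    return f"{flow['proto']} port {flow['port']} => {true_label}"


def run_pass(samples, library: ExperienceLibrary) -> int:
    """Classify every sample, store a rule per mistake, return the error count."""
    errors = 0
    for flow, true_label in samples:
        predicted = classify(flow, library.retrieve(flow))
        if predicted != true_label:
            errors += 1
            library.add(analyze_error(flow, true_label))
    return errors


# Toy, made-up flows; not taken from NF-BoT-IoT or NF-ToN-IoT.
samples = [
    ({"proto": "tcp", "port": 23}, "DDoS"),
    ({"proto": "udp", "port": 53}, "Benign"),
    ({"proto": "tcp", "port": 23}, "DDoS"),
]
library = ExperienceLibrary()
first_pass_errors = run_pass(samples, library)
second_pass_errors = run_pass(samples, library)
```

On this toy data the first pass makes one mistake and stores one rule, and the second pass makes none. The language model stub itself never changes; only the library does, which is the property the core claim relies on.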

What carries the argument

The Experience Library, a FAISS vector database that stores and retrieves human-readable detection rules generated from classification errors to guide future inferences without modifying the language model.
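To make the retrieval step concrete, here is a rough sketch of nearest-rule lookup by vector similarity. It is a stand-in under stated assumptions: the paper uses a FAISS index, while this example does a linear scan in pure Python so it runs without dependencies; the bag-of-words `embed` function is a toy, since the paper's embedding model is not named in this summary; and the rule texts are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption, not the paper's encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class RuleStore:
    """Linear-scan store; a FAISS index would replace the sort in `search`."""

    def __init__(self) -> None:
        self._entries = []  # list of (vector, rule text)

    def add(self, rule: str) -> None:
        self._entries.append((embed(rule), rule))

    def search(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [rule for _, rule in ranked[:k]]


store = RuleStore()
store.add("many short tcp flows to port 23 suggest a telnet brute force")
store.add("large udp bursts to port 53 suggest dns amplification")
store.add("steady https traffic to known hosts is usually benign")

top = store.search("short tcp flows hitting port 23", k=1)
```

The retrieved rule text would then be placed in the classification prompt, so the explanation for a decision is the rule that was retrieved for it.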

If this is right

  • The system supplies rule-level explanations for every classification decision.
  • Continual learning occurs through external rule accumulation rather than retraining or fine-tuning the language model.
  • Detection performance reaches levels competitive with SVM on established IoT intrusion benchmarks.
  • The method addresses zero-day attacks and protocol heterogeneity through retrieval-augmented reasoning instead of fixed signatures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • External rule storage could reduce the frequency of full model retraining in changing network environments.
  • The same error-to-rule loop might transfer to other adaptive detection tasks such as anomaly monitoring in industrial control systems.
  • Growth of the library may eventually require conflict-resolution mechanisms if rules begin to overlap or contradict one another.

Load-bearing premise

The Error Analysis Agent can reliably translate misclassifications into accurate, non-conflicting human-readable rules that measurably improve subsequent classifications when stored and retrieved.

What would settle it

Running repeated classification rounds on the same NF-BoT-IoT or NF-ToN-IoT data and checking whether the F1 score stops rising or falls once the library has accumulated many rules.
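One way to operationalize this check is a simple plateau detector over per-round Macro F1 scores. The sketch below is illustrative only: the function name, the `tolerance` and `patience` parameters, and the score sequence are all hypothetical and do not come from the paper.

```python
def first_plateau(scores, tolerance=0.5, patience=2):
    """Return the index of the first round of a stall, or None.

    A stall is `patience` consecutive rounds that fail to beat the best
    score so far by more than `tolerance` (same units as `scores`).
    """
    best = scores[0]
    stale = 0
    for i, score in enumerate(scores[1:], start=1):
        if score > best + tolerance:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i - patience + 1
    return None


# Hypothetical Macro F1 (percent) after each library-building round.
hypothetical_scores = [40.0, 62.0, 75.0, 75.2, 74.8, 74.9]
stall_round = first_plateau(hypothetical_scores)
still_rising = first_plateau([10.0, 20.0, 30.0])
```

A stall or decline that coincides with a large rule count would point toward rule conflicts or retrieval noise, though separating those causes would need the rule-level metrics the referee report asks for.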

Figures

Figures reproduced from arXiv: 2604.05458 by Ayesha S. Dina, Luis G. Jaimes, Md Shamimul Islam.

Figure 1: Dual-Phase Agentic Workflow for MA-IDS. Phase 1 (Top) illustrates the offline library-building loop, where an Error …
Figure 2: Macro-averaged Precision, Recall, and F1-Score for the Zero-Shot Baseline and MA-IDS on (a) NF-BoT-IoT and (b) …
Figure 3: Macro F1-score over cumulative samples during library construction for MA-IDS with and without the Experience …
Original abstract

Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These challenges become even more severe in IoT environments because of resource constraints and heterogeneous protocols. To address these issues, we propose MA-IDS, a Multi-Agent Intrusion Detection System that combines Large Language Models (LLMs) with Retrieval Augmented Generation (RAG) for reasoning-driven intrusion detection. The proposed framework grounds LLM reasoning through a persistent, self-building Experience Library. Two specialized agents collaborate through a FAISS-based vector database: a Traffic Classification Agent that retrieves past error rules before each inference, and an Error Analysis Agent that converts misclassifications into human-readable detection rules stored for future retrieval, enabling continual learning through external knowledge accumulation, without modifying the underlying language model. Evaluated on NF-BoT-IoT and NF-ToN-IoT benchmark datasets, MA-IDS achieves Macro F1-Scores of 89.75% and 85.22%, improving over zero-shot baselines of 17% and 4.96% by more than 72 and 80 percentage points. These results are competitive with SVM while providing rule-level explanations for every classification decision, demonstrating that retrieval-augmented reasoning offers a principled path toward explainable, self-improving intrusion detection for IoT networks.
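For readers unfamiliar with the headline metric, the following sketch shows how a Macro F1 score is computed (the unweighted mean of per-class F1) and checks the percentage-point gains quoted in the abstract. The toy label lists are invented for illustration; only the four percentages in `reported` come from the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels (made up): per-class F1 is 2/3, 2/3 and 1, so the mean is 7/9.
toy_true = ["Benign", "Benign", "DDoS", "DoS"]
toy_pred = ["Benign", "DDoS", "DDoS", "DoS"]
toy_score = macro_f1(toy_true, toy_pred)

# Figures quoted in the abstract, in percent: (MA-IDS, zero-shot baseline).
reported = {"NF-BoT-IoT": (89.75, 17.0), "NF-ToN-IoT": (85.22, 4.96)}
gains = {
    name: round(ma_ids - zero_shot, 2)
    for name, (ma_ids, zero_shot) in reported.items()
}
# gains: 72.75 and 80.26 percentage points, matching "more than 72 and 80".
```

Because Macro F1 weights every class equally, a rare attack class that is never detected pulls the score down as much as a common one, which is one plausible reason a zero-shot baseline can score very low on imbalanced traffic data.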

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MA-IDS, a multi-agent RAG-based framework for IoT intrusion detection. A Traffic Classification Agent retrieves rules from a persistent Experience Library before inference, while an Error Analysis Agent converts misclassifications into human-readable rules stored in a FAISS vector database for continual learning without LLM fine-tuning. On the NF-BoT-IoT and NF-ToN-IoT datasets, it reports Macro F1 scores of 89.75% and 85.22%, improving over zero-shot baselines (17% and 4.96%) by more than 72 and 80 percentage points. The results are described as competitive with SVM, and every classification comes with a rule-level explanation.

Significance. If the empirical claims hold after proper validation, the work demonstrates a practical route to explainable, self-improving NIDS for heterogeneous IoT settings by externalizing knowledge via RAG rather than model updates. The multi-agent separation of classification and error-to-rule synthesis is a clear architectural contribution, though its value depends on unshown evidence that the generated rules are accurate, non-conflicting, and causally responsible for the reported gains.

major comments (3)
  1. [Experimental Evaluation] Experimental Evaluation section: the headline Macro F1 improvements (89.75% / 85.22%) are presented without any description of train/test splits, cross-validation procedure, baseline SVM implementation details, or statistical significance tests. These omissions make it impossible to assess whether the numbers support the central claim that the Experience Library drives the gains.
  2. [Error Analysis Agent] Error Analysis Agent and Experience Library description: no quantitative metrics (accuracy, conflict rate, retrieval hit rate, or ablation) are supplied for the rules synthesized from misclassifications. Without such evidence, the assertion that the library enables reliable continual learning remains unverified and load-bearing for the performance narrative.
  3. [Results] Results and Ablation Studies: the manuscript contains no ablation that isolates the contribution of the Experience Library (e.g., performance with vs. without retrieval). The large delta from zero-shot baselines could stem from prompt engineering, dataset characteristics, or other unstated factors rather than the claimed RAG mechanism.
minor comments (2)
  1. [Framework Architecture] Notation for the FAISS vector database and rule retrieval process could be clarified with a diagram or pseudocode to improve reproducibility.
  2. [Abstract] The abstract and introduction repeat the same performance numbers; a single consolidated table comparing all baselines (zero-shot, SVM, MA-IDS) would reduce redundancy.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. We agree that the experimental presentation requires strengthening and will revise the manuscript to include the requested details, metrics, and analyses.

point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental Evaluation section: the headline Macro F1 improvements (89.75% / 85.22%) are presented without any description of train/test splits, cross-validation procedure, baseline SVM implementation details, or statistical significance tests. These omissions make it impossible to assess whether the numbers support the central claim that the Experience Library drives the gains.

    Authors: We acknowledge this gap in the current presentation of results. In the revised manuscript, we will expand the Experimental Evaluation section to explicitly describe the train/test splits for NF-BoT-IoT and NF-ToN-IoT (including any temporal or stratified partitioning), the cross-validation procedure (e.g., 5-fold), full implementation details for the SVM baseline (hyperparameters, kernel, feature extraction), and statistical significance tests (such as McNemar's test or paired t-tests with p-values) comparing MA-IDS to baselines. These additions will enable proper assessment of result robustness.
    revision: yes

  2. Referee: [Error Analysis Agent] Error Analysis Agent and Experience Library description: no quantitative metrics (accuracy, conflict rate, retrieval hit rate, or ablation) are supplied for the rules synthesized from misclassifications. Without such evidence, the assertion that the library enables reliable continual learning remains unverified and load-bearing for the performance narrative.

    Authors: We agree that quantitative validation of the synthesized rules is necessary to support claims of reliable continual learning. In the revision, we will report metrics including rule accuracy on validation samples, conflict detection/resolution rates in the library, retrieval hit rates during inference, and an analysis of rule accumulation effects. This will provide direct evidence for the Error Analysis Agent's contribution without relying solely on end-to-end performance.
    revision: yes

  3. Referee: [Results] Results and Ablation Studies: the manuscript contains no ablation that isolates the contribution of the Experience Library (e.g., performance with vs. without retrieval). The large delta from zero-shot baselines could stem from prompt engineering, dataset characteristics, or other unstated factors rather than the claimed RAG mechanism.

    Authors: This is a fair critique. We will add a dedicated ablation study in the revised Results section, comparing full MA-IDS (with Experience Library retrieval) against a no-retrieval variant using the same prompts and agents. We will also document the exact prompts used in zero-shot and full settings to rule out confounding factors. This ablation will directly isolate the RAG mechanism's role in the observed gains.
    revision: yes
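The first response mentions McNemar's test as one candidate significance check. As a hedged illustration of what such a comparison involves, the sketch below computes the exact two-sided McNemar p-value from paired predictions of two classifiers on the same test samples. The helper names and the toy counts are hypothetical; nothing here reflects results from the paper.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples only A got right (b) and only B got right (c)."""
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from the discordant pair counts.

    Under the null hypothesis each discordant pair favors either model
    with probability 1/2, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy counts (made up): A alone is right on 10 samples, B alone on 2.
p_value = mcnemar_exact(10, 2)  # 158 / 4096, roughly 0.0386

# Toy paired predictions to show how the counts are derived.
truth = ["atk", "atk", "ok", "ok"]
model_a = ["atk", "atk", "ok", "atk"]
model_b = ["atk", "ok", "ok", "atk"]
b, c = discordant_counts(truth, model_a, model_b)
```

The test uses only the samples on which the two models disagree, so it suits the with-retrieval versus no-retrieval ablation promised in the third response, where both variants are scored on identical test flows.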

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on public benchmarks

The paper reports Macro F1 scores of 89.75% and 85.22% on NF-BoT-IoT and NF-ToN-IoT as measured outcomes of running the MA-IDS framework, compared to zero-shot baselines. No equations, fitted parameters, or self-referential derivations are present in the provided text; the Experience Library and Error Analysis Agent are architectural components whose contribution is asserted via evaluation rather than proven by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review prevents full ledger construction; the framework introduces the Experience Library as a core component whose effectiveness is asserted via reported scores but lacks independent evidence or parameter details.

invented entities (1)
  • Experience Library · no independent evidence
    purpose: Persistent, self-building store of human-readable detection rules derived from classification errors to enable continual learning
    Central to the continual-learning claim but introduced without external validation beyond the two reported F1 scores.

pith-pipeline@v0.9.0 · 5574 in / 1338 out tokens · 75690 ms · 2026-05-10T19:41:46.753143+00:00 · methodology
