MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library
Pith reviewed 2026-05-10 19:41 UTC · model grok-4.3
The pith
A multi-agent RAG framework with a self-building Experience Library enables explainable and continually improving intrusion detection for IoT networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MA-IDS pairs a Traffic Classification Agent that retrieves stored error rules before each inference with an Error Analysis Agent that converts misclassifications into new human-readable detection rules and adds them to the Experience Library. The library is maintained in a FAISS vector database so that future classifications can draw on accumulated experience. On the NF-BoT-IoT and NF-ToN-IoT datasets the system records Macro F1-Scores of 89.75% and 85.22%, gains of more than 72 and 80 percentage points over zero-shot baselines, while remaining competitive with SVM classifiers and supplying a rule-level explanation for every decision.
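The retrieve, classify, analyze, store cycle described above can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: `ExperienceLoop`, `toy_retrieve`, `toy_classify`, and `toy_analyze` are hypothetical names, the two agents are replaced by trivial stub functions instead of LLM calls, and the sketch assumes a ground-truth label is available at each step so that a misclassification can be detected.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperienceLoop:
    # classify(flow, rules) -> label; stands in for the Traffic Classification Agent
    classify: Callable[[dict, List[str]], str]
    # analyze(flow, predicted, actual) -> rule text; stands in for the Error Analysis Agent
    analyze: Callable[[dict, str, str], str]
    # retrieve(flow, rules) -> relevant rules; stands in for the vector search
    retrieve: Callable[[dict, List[str]], List[str]]
    rules: List[str] = field(default_factory=list)

    def step(self, flow: dict, actual: str) -> str:
        relevant = self.retrieve(flow, self.rules)   # look up stored rules first
        predicted = self.classify(flow, relevant)    # then classify with them
        if predicted != actual:                      # on error, store a new rule
            self.rules.append(self.analyze(flow, predicted, actual))
        return predicted


# Toy stubs (assumptions for illustration, not the paper's agents).
def toy_retrieve(flow, rules):
    key = f"port {flow['dst_port']}"
    return [r for r in rules if key in r]


def toy_classify(flow, rules):
    # With no guidance, guess "Benign"; otherwise follow the latest rule's label.
    if not rules:
        return "Benign"
    return rules[-1].rsplit("=> ", 1)[1]


def toy_analyze(flow, predicted, actual):
    return f"port {flow['dst_port']} misread as {predicted} => {actual}"


loop = ExperienceLoop(toy_classify, toy_analyze, toy_retrieve)
flow = {"dst_port": 23}
first = loop.step(flow, "DDoS")    # no rules yet, so the stub guesses wrong
second = loop.step(flow, "DDoS")   # the stored rule now steers the prediction
```

The point of the sketch is that the classifier itself never changes between the two calls; only the external rule list grows.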
What carries the argument
The Experience Library, a FAISS vector database that stores and retrieves human-readable detection rules generated from classification errors to guide future inferences without modifying the language model.
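To make the store-and-retrieve role of the library concrete, the sketch below uses a brute-force cosine search over bag-of-words vectors as a stand-in for FAISS. Everything here is an assumption for illustration: the class name `ExperienceLibrary`, the `embed` function, and the example rules are hypothetical, and the embedding model and index type actually used by the paper are not stated in this summary.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical stand-in embedding: bag-of-words token counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ExperienceLibrary:
    """Brute-force nearest-rule lookup; a real system would delegate to a vector index."""

    def __init__(self):
        self._entries = []  # list of (vector, rule text)

    def add(self, rule):
        self._entries.append((embed(rule), rule))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [rule for _, rule in ranked[:k]]


library = ExperienceLibrary()
library.add("many short tcp flows to port 23 indicate ddos")
library.add("large outbound udp transfer on port 53 indicates theft")
top = library.search("tcp flow to port 23", k=1)
```

Because the rules are stored as plain text alongside their vectors, whatever is retrieved can be shown to an analyst as the explanation for a decision.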
If this is right
- The system supplies rule-level explanations for every classification decision.
- Continual learning occurs through external rule accumulation rather than retraining or fine-tuning the language model.
- Detection performance reaches levels competitive with SVM on established IoT intrusion benchmarks.
- The method addresses zero-day attacks and protocol heterogeneity through retrieval-augmented reasoning instead of fixed signatures.
Where Pith is reading between the lines
- External rule storage could reduce the frequency of full model retraining in changing network environments.
- The same error-to-rule loop might transfer to other adaptive detection tasks such as anomaly monitoring in industrial control systems.
- Growth of the library may eventually require conflict-resolution mechanisms if rules begin to overlap or contradict one another.
Load-bearing premise
The Error Analysis Agent can reliably translate misclassifications into accurate, non-conflicting human-readable rules that measurably improve subsequent classifications when stored and retrieved.
What would settle it
Running repeated classification rounds on the same NF-BoT-IoT or NF-ToN-IoT data and checking whether the F1 score stops rising or falls once the library has accumulated many rules.
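One way to operationalize this check is to compare the latest round's Macro F1 with the best earlier round. The helper below is a hypothetical sketch: the function name `library_trend`, the 0.5-point tolerance, and the example scores are made up for illustration and are not results from the paper.

```python
def library_trend(f1_by_round, tolerance=0.5):
    """Label the last round against the best earlier round (F1 in percent)."""
    if len(f1_by_round) < 2:
        raise ValueError("need at least two rounds")
    best_earlier = max(f1_by_round[:-1])
    delta = f1_by_round[-1] - best_earlier
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "plateau"


# Illustrative, made-up score sequences.
still_improving = library_trend([40.0, 70.0])
saturated = library_trend([40.0, 70.0, 85.0, 85.2])
degraded = library_trend([40.0, 70.0, 85.0, 80.0])
```

A "falling" label after many rounds would be the signal that accumulated rules are starting to interfere with one another.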
Original abstract
Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These challenges become even more severe in IoT environments because of resource constraints and heterogeneous protocols. To address these issues, we propose MA-IDS, a Multi-Agent Intrusion Detection System that combines Large Language Models (LLMs) with Retrieval Augmented Generation (RAG) for reasoning-driven intrusion detection. The proposed framework grounds LLM reasoning through a persistent, self-building Experience Library. Two specialized agents collaborate through a FAISS-based vector database: a Traffic Classification Agent that retrieves past error rules before each inference, and an Error Analysis Agent that converts misclassifications into human-readable detection rules stored for future retrieval, enabling continual learning through external knowledge accumulation, without modifying the underlying language model. Evaluated on NF-BoT-IoT and NF-ToN-IoT benchmark datasets, MA-IDS achieves Macro F1-Scores of 89.75% and 85.22%, improving over zero-shot baselines of 17% and 4.96% by more than 72 and 80 percentage points. These results are competitive with SVM while providing rule-level explanations for every classification decision, demonstrating that retrieval-augmented reasoning offers a principled path toward explainable, self-improving intrusion detection for IoT networks.
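The percentage-point gains quoted in the abstract follow directly from the reported scores, and Macro F1 itself is the unweighted mean of per-class F1. The sketch below checks the arithmetic and shows the metric; the per-class counts in `example_counts` are hypothetical and chosen only to illustrate how a weak minority class pulls the macro average down.

```python
# Scores as reported in the abstract (percent).
results = {
    "NF-BoT-IoT": {"ma_ids": 89.75, "zero_shot": 17.0},
    "NF-ToN-IoT": {"ma_ids": 85.22, "zero_shot": 4.96},
}
gains = {name: round(r["ma_ids"] - r["zero_shot"], 2) for name, r in results.items()}


def macro_f1(per_class):
    """per_class maps label -> (tp, fp, fn); returns the unweighted mean F1."""
    scores = []
    for tp, fp, fn in per_class.values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical counts: a strong majority class and a weak minority class.
example_counts = {"Benign": (90, 10, 10), "DDoS": (5, 5, 15)}
example_score = macro_f1(example_counts)
```

The gains come out to 72.75 and 80.26 percentage points, consistent with the "more than 72 and 80" wording.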
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MA-IDS, a multi-agent RAG-based framework for IoT intrusion detection. A Traffic Classification Agent retrieves rules from a persistent Experience Library before inference, while an Error Analysis Agent converts misclassifications into human-readable rules stored in a FAISS vector database for continual learning without LLM fine-tuning. On the NF-BoT-IoT and NF-ToN-IoT datasets, it reports Macro F1 scores of 89.75% and 85.22%, improving over zero-shot baselines (17% and 4.96%) by more than 72 and 80 percentage points and remaining competitive with SVM, while supplying rule-level explanations.
Significance. If the empirical claims hold after proper validation, the work demonstrates a practical route to explainable, self-improving NIDS for heterogeneous IoT settings by externalizing knowledge via RAG rather than model updates. The multi-agent separation of classification and error-to-rule synthesis is a clear architectural contribution, though its value depends on unshown evidence that the generated rules are accurate, non-conflicting, and causally responsible for the reported gains.
major comments (3)
- [Experimental Evaluation] Experimental Evaluation section: the headline Macro F1 improvements (89.75% / 85.22%) are presented without any description of train/test splits, cross-validation procedure, baseline SVM implementation details, or statistical significance tests. These omissions make it impossible to assess whether the numbers support the central claim that the Experience Library drives the gains.
- [Error Analysis Agent] Error Analysis Agent and Experience Library description: no quantitative metrics (accuracy, conflict rate, retrieval hit rate, or ablation) are supplied for the rules synthesized from misclassifications. Without such evidence, the assertion that the library enables reliable continual learning remains unverified and load-bearing for the performance narrative.
- [Results] Results and Ablation Studies: the manuscript contains no ablation that isolates the contribution of the Experience Library (e.g., performance with vs. without retrieval). The large delta from zero-shot baselines could stem from prompt engineering, dataset characteristics, or other unstated factors rather than the claimed RAG mechanism.
minor comments (2)
- [Framework Architecture] Notation for the FAISS vector database and rule retrieval process could be clarified with a diagram or pseudocode to improve reproducibility.
- [Abstract] The abstract and introduction repeat the same performance numbers; a single consolidated table comparing all baselines (zero-shot, SVM, MA-IDS) would reduce redundancy.
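The second major comment asks for rule-level metrics such as retrieval hit rate and conflict rate. The paper's definitions are not given here, so the sketch below adopts simple working definitions as assumptions: a "hit" is any inference for which at least one rule was retrieved, and a "conflict" is a pair of rules with the same condition but different labels. The function names and sample data are hypothetical.

```python
from itertools import combinations


def retrieval_hit_rate(retrieved_per_query):
    """Fraction of inferences for which at least one rule was retrieved."""
    if not retrieved_per_query:
        return 0.0
    hits = sum(1 for rules in retrieved_per_query if rules)
    return hits / len(retrieved_per_query)


def conflict_rate(rules):
    """rules: list of (condition, label). Fraction of rule pairs that conflict."""
    pairs = list(combinations(rules, 2))
    if not pairs:
        return 0.0
    conflicts = sum(
        1 for (c1, l1), (c2, l2) in pairs if c1 == c2 and l1 != l2
    )
    return conflicts / len(pairs)


# Made-up sample data for illustration.
retrieved = [["r1"], [], ["r1", "r2"], []]
rules = [
    ("port 23 burst", "DDoS"),
    ("port 23 burst", "Benign"),
    ("port 53 large", "Theft"),
]
hit_rate = retrieval_hit_rate(retrieved)
conflicts = conflict_rate(rules)
```

Exact string matching on conditions is the crudest possible conflict test; free-text rules written by an LLM would need a semantic comparison, which is outside this sketch.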
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. We agree that the experimental presentation requires strengthening and will revise the manuscript to include the requested details, metrics, and analyses.
Point-by-point responses
- Referee: [Experimental Evaluation] Experimental Evaluation section: the headline Macro F1 improvements (89.75% / 85.22%) are presented without any description of train/test splits, cross-validation procedure, baseline SVM implementation details, or statistical significance tests. These omissions make it impossible to assess whether the numbers support the central claim that the Experience Library drives the gains.
  Authors: We acknowledge this gap in the current presentation of results. In the revised manuscript, we will expand the Experimental Evaluation section to explicitly describe the train/test splits for NF-BoT-IoT and NF-ToN-IoT (including any temporal or stratified partitioning), the cross-validation procedure (e.g., 5-fold), full implementation details for the SVM baseline (hyperparameters, kernel, feature extraction), and statistical significance tests (such as McNemar's test or paired t-tests with p-values) comparing MA-IDS to baselines. These additions will enable proper assessment of result robustness.
  Revision: yes
- Referee: [Error Analysis Agent] Error Analysis Agent and Experience Library description: no quantitative metrics (accuracy, conflict rate, retrieval hit rate, or ablation) are supplied for the rules synthesized from misclassifications. Without such evidence, the assertion that the library enables reliable continual learning remains unverified and load-bearing for the performance narrative.
  Authors: We agree that quantitative validation of the synthesized rules is necessary to support claims of reliable continual learning. In the revision, we will report metrics including rule accuracy on validation samples, conflict detection/resolution rates in the library, retrieval hit rates during inference, and an analysis of rule accumulation effects. This will provide direct evidence for the Error Analysis Agent's contribution without relying solely on end-to-end performance.
  Revision: yes
- Referee: [Results] Results and Ablation Studies: the manuscript contains no ablation that isolates the contribution of the Experience Library (e.g., performance with vs. without retrieval). The large delta from zero-shot baselines could stem from prompt engineering, dataset characteristics, or other unstated factors rather than the claimed RAG mechanism.
  Authors: This is a fair critique. We will add a dedicated ablation study in the revised Results section, comparing full MA-IDS (with Experience Library retrieval) against a no-retrieval variant using the same prompts and agents. We will also document the exact prompts used in zero-shot and full settings to rule out confounding factors. This ablation will directly isolate the RAG mechanism's role in the observed gains.
  Revision: yes
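The ablation and the significance test mentioned in the responses can be combined: run the full system and a no-retrieval variant on the same test flows, then apply McNemar's test to the paired predictions. The sketch below implements the exact (binomial) two-sided form using only the standard library. The function name `mcnemar_exact` and the ten-sample predictions are hypothetical illustrations, not data from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (only_a_correct, only_b_correct, p_value).
    """
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(1 for t, a, b in triples if a == t and b != t)
    only_b = sum(1 for t, a, b in triples if a != t and b == t)
    n = only_a + only_b          # discordant pairs carry all the evidence
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # Under H0 each discordant pair favors either system with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Made-up paired predictions for ten test flows.
y_true = ["A"] * 10
full_system = ["A"] * 10                 # with Experience Library retrieval
no_retrieval = ["A"] * 2 + ["B"] * 8     # same prompts, retrieval disabled
only_full, only_ablated, p_value = mcnemar_exact(y_true, full_system, no_retrieval)
```

With eight discordant pairs all favoring the full system, the exact p-value is 2/256, about 0.0078. Samples that both variants classify correctly or both get wrong do not affect the test, which is why it suits a with-versus-without comparison on a shared test set.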
Circularity Check
No circularity: results are direct empirical measurements on public benchmarks
The paper reports Macro F1 scores of 89.75% and 85.22% on NF-BoT-IoT and NF-ToN-IoT as measured outcomes of running the MA-IDS framework, compared to zero-shot baselines. No equations, fitted parameters, or self-referential derivations are present in the provided text; the Experience Library and Error Analysis Agent are architectural components whose contribution is asserted via evaluation rather than proven by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
invented entities (1)
- Experience Library: no independent evidence