pith.

arxiv: 2511.18240 · v2 · pith:KLBZ67EV · submitted 2025-11-23 · 💻 cs.CR

Carbon-Aware Intrusion Detection: A Comparative Study of Supervised and Unsupervised DRL for Sustainable IoT Edge Gateways

Pith reviewed 2026-05-21 19:10 UTC · model grok-4.3

classification 💻 cs.CR
keywords: intrusion detection, deep reinforcement learning, IoT edge gateways, carbon-aware, DDoS detection, supervised learning, label-free learning, sustainable computing

The pith

Two DRL-based IDS for IoT edge gateways report 94% (supervised) and 98% (label-free, offline evaluation) detection accuracy using a carbon-aware multi-objective reward.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops DeepEdgeIDS, a label-free Autoencoder-DRL hybrid, and AutoDRL-IDS, a supervised LSTM-DRL model, to detect DDoS attacks on resource-constrained IoT edge gateways. It introduces a carbon-aware multi-objective reward formulation that supports supervised optimization in one model and label-free online learning in the other. This setup enables real-time intrusion detection while factoring in energy use and carbon impact, addressing the limits of static signatures and labeled-data dependence in traditional systems. A sympathetic reader would care because IoT networks continue to expand and require both effective security and lower environmental costs.

Core claim

The paper claims that a carbon-aware multi-objective reward formulation in deep reinforcement learning enables AutoDRL-IDS to reach 94% detection accuracy with labeled data and DeepEdgeIDS to reach 98% offline evaluation accuracy through label-free anomaly detection plus online mitigation feedback, supporting sustainable real-time IDS operation in dynamic IoT networks.
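To make "label-free anomaly detection" concrete, here is a minimal sketch of the general technique autoencoder-based detectors rely on: fit a reconstruction model on benign traffic only, score each sample by its reconstruction error, and flag scores above a threshold learned from benign data alone. It assumes a linear autoencoder (equivalent to PCA via SVD) in place of DeepEdgeIDS's actual network, synthetic feature vectors in place of real traffic features, and a 99th-percentile threshold; all function names and parameters are hypothetical, not the paper's.

```python
import numpy as np


def fit_linear_autoencoder(benign: np.ndarray, k: int):
    """Fit a k-dimensional linear autoencoder (equivalent to PCA) on benign samples only."""
    mean = benign.mean(axis=0)
    # Rows of vt are the principal directions of the centred benign data.
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Per-sample mean squared reconstruction error, used as the anomaly score."""
    z = (x - mean) @ components.T      # encode into the k-dim latent space
    x_hat = z @ components + mean      # decode back to feature space
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(benign_scores: np.ndarray, percentile: float = 99.0) -> float:
    """Label-free threshold: a high percentile of the scores on benign traffic."""
    return float(np.percentile(benign_scores, percentile))


# Hypothetical stand-in for traffic features: benign samples lie near a 2-D subspace of 5-D space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
benign = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
attacks = 10.0 * rng.normal(size=(20, 5))   # off-subspace, high-magnitude samples

mean, components = fit_linear_autoencoder(benign, k=2)
threshold = fit_threshold(reconstruction_error(benign, mean, components))
attack_flags = reconstruction_error(attacks, mean, components) > threshold
benign_flags = reconstruction_error(benign, mean, components) > threshold
```

No attack labels are used anywhere in fitting; labels would only be needed to measure accuracy afterwards, which is why offline evaluation accuracy is reported separately from the label-free training.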

What carries the argument

The carbon-aware multi-objective reward formulation that supports supervised reward optimization for AutoDRL-IDS and label-free online reward learning for DeepEdgeIDS.
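The reward itself is not reproduced on this page, so the following is only a minimal sketch of what a scalarized carbon-aware multi-objective reward can look like: a weighted detection term minus weighted energy and carbon penalties. The weights, the per-step units (joules, grams of CO2), and the ±1 detection payoff are illustrative assumptions, not the authors' formulation; they play the role of the tunable objective weights listed in the free-parameter ledger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical objective weights; the paper's actual values are not given here."""
    w_detect: float = 1.0
    w_energy: float = 0.01   # penalty per joule consumed in the step
    w_carbon: float = 0.1    # penalty per gram of CO2 attributed to the step


def carbon_aware_reward(predicted_attack: bool, is_attack: bool,
                        energy_j: float, carbon_g: float,
                        w: RewardWeights = RewardWeights()) -> float:
    """Weighted-sum reward: +1 for a correct decision, -1 otherwise,
    minus energy and carbon penalties."""
    detection_term = 1.0 if predicted_attack == is_attack else -1.0
    return (w.w_detect * detection_term
            - w.w_energy * energy_j
            - w.w_carbon * carbon_g)


# A correct detection that costs 2 J and 0.5 g CO2.
r = carbon_aware_reward(True, True, energy_j=2.0, carbon_g=0.5)
```

With the default weights this gives 1.0 - 0.02 - 0.05 = 0.93. A weighted sum is the simplest scalarization, and each weight setting can lead to a different learned policy, which is why sensitivity results over the weights matter.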

If this is right

  • AutoDRL-IDS offers a path to 94% accurate detection wherever labeled attack data is available.
  • DeepEdgeIDS shows that label-free anomaly detection plus online feedback can reach 98% offline evaluation accuracy in edge-gateway experiments.
  • Both models support real-time IDS that accounts for energy efficiency and carbon impact in dynamic networks.
  • Theoretical analysis combined with gateway experiments confirms the feasibility of the dual-objective approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reward structure could be tested on other edge security tasks such as malware detection or access control.
  • Deployment in real IoT testbeds with live traffic would reveal whether the offline 98% accuracy holds under variable loads.
  • Pairing the models with low-power hardware accelerators might yield further measurable drops in carbon cost.

Load-bearing premise

The custom carbon-aware multi-objective reward function can be optimized to improve both detection performance and sustainability metrics simultaneously without one objective degrading the other in practice.

What would settle it

An experiment where lowering the carbon component of the reward consistently drops detection accuracy below 90% would falsify the claim that both objectives improve together.

Figures

Figures reproduced from arXiv: 2511.18240 by Amin Nikanjam, Foutse Khomh, Kawser Wazed Nafi, Martine Bellaiche, Omar Abdul-Wahab, Saeid Jamshidi, Samira Keivanpour.

Figure 1: Overview of the proposed DeepEdgeIDS architecture for DDoS detection.
… where $x(t)$ is the network state vector and $u(t) \in A$ is the mitigation control input. The optimal policy $\pi^*$ minimizes the cumulative cost $J(\pi) = \int_0^T \ldots$
Figure 2: Reward convergence for different ϵ values in DeepEdgeIDS.
… confirming that temporal smoothness and sustainability constraints jointly regulate stability …
Figure 3: Reward convergence for different ϵ values in AutoDRL-IDS.
3) Operational Impact: Table II translates convergence stability into measurable performance metrics, including detection probability, missed packets per hour (M = 50 pps), and false alerts per 100 predictions. Both DRL-based IDS sustain detection probabilities exceeding 90% with insignificant computational overhead, achieving bounded regret under …
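"Missed packets per hour" is not defined on this page. One natural reading, assumed here rather than taken from the paper, is that with attack traffic arriving at M = 50 packets per second and a per-packet detection probability P_d, the expected number of undetected attack packets in an hour is (1 - P_d) · M · 3600. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def missed_packets_per_hour(detection_probability: float, attack_rate_pps: float = 50.0) -> float:
    """Expected undetected attack packets per hour at a constant attack rate (assumed definition)."""
    if not 0.0 <= detection_probability <= 1.0:
        raise ValueError("detection_probability must lie in [0, 1]")
    return (1.0 - detection_probability) * attack_rate_pps * SECONDS_PER_HOUR


# At the 90% detection level mentioned in the text.
print(round(missed_packets_per_hour(0.90)))
```

Under this reading, each percentage point of detection probability is worth 0.01 × 50 × 3600 = 1,800 packets per hour, so even a 90% detector lets roughly 18,000 attack packets through each hour at M = 50 pps.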
Figure 4: IoT edge testbed architecture for evaluating AutoDRL-IDS and DeepEdgeIDS under real-time zero-day DDoS attacks.
TABLE III: Performance Comparison of AutoDRL-IDS and DeepEdgeIDS with Existing Models.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| AutoDRL-IDS (Proposed) | 94.0 | 91.7 | 92.0 | 91.3 |
| DeepEdgeIDS (Proposed) | 98.0 | 92.4 | 97.6 | 94.9 |
| Baseline: RL-Based HBOS [23] | 94.0 | 94.0 | 94.0 | 91.0 |
| Baseline: DDAD-SOEL [24] | 99.34 | … | … | … |
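For a single class, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The sketch below applies that formula to the precision and recall columns of Table III. DeepEdgeIDS's reported F1 matches it (94.9); the AutoDRL-IDS and RL-Based HBOS rows do not (about 91.8 and 94.0 versus the reported 91.3 and 91.0). That is not necessarily an error: if the table averages per-class scores (macro or support-weighted), the averaged F1 need not equal the harmonic mean of the averaged precision and recall, and because the harmonic mean is concave it can only be lower or equal, which is the direction seen here. The averaging scheme is not stated on this page.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as printed in Table III.
table_iii = {
    "AutoDRL-IDS": (91.7, 92.0, 91.3),
    "DeepEdgeIDS": (92.4, 97.6, 94.9),
    "RL-Based HBOS [23]": (94.0, 94.0, 91.0),
}

for model, (p, r, reported) in table_iii.items():
    print(f"{model}: harmonic mean {f1_from_precision_recall(p, r):.1f}, reported {reported}")
```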
Figure 5: System monitor log output during normal and DDoS attack scenarios showing performance metrics of AutoDRL-IDS and DeepEdgeIDS.
TABLE IV: ANOVA: Detection Probability Comparison Between DeepEdgeIDS and AutoDRL-IDS on Edge Gateways.

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F Statistic | P-value |
|---|---|---|---|---|---|
| Between Groups | 1 | 0.3154 | 0.3154 | 67.89 | <0.05 |
| Within Groups | 98 | 0.2046 | 0.0021 | – | – |
| Total | 99 | 0.5200 | – | – | – |
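In a one-way ANOVA, each mean square is its sum of squares divided by its degrees of freedom, and the F statistic is MS_between / MS_within. Recomputing from the sums of squares printed in Table IV gives MS_within ≈ 0.00209 and F ≈ 151, not the printed 67.89, while the degrees of freedom and the total (0.3154 + 0.2046 = 0.5200) are internally consistent. The sketch below only redoes this arithmetic from the tabulated values; it has no access to the underlying measurements, so the original table is worth checking before relying on the reported F.

```python
def one_way_anova_f(ss_between: float, df_between: int, ss_within: float, df_within: int) -> float:
    """F statistic for a one-way ANOVA from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


# Values as printed in Table IV (two groups, 100 observations in total).
f_stat = one_way_anova_f(ss_between=0.3154, df_between=1, ss_within=0.2046, df_within=98)
print(f"F = {f_stat:.2f}")
```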
Figure 6: DDoS detection probability comparison between AutoDRL-IDS and DeepEdgeIDS under DDoS attacks on Edge Gateways.
Figure 8: Response time comparison between AutoDRL-IDS and DeepEdgeIDS on Edge Gateway.
… demonstrating faster detection. The observed decreasing trend in both models reflects the convergence of the DRL policy, where agents progressively reduce exploratory overhead and transition toward optimized state–action mappings. As operation stabilizes, inference delays diminish due to cache warming, streamlined feature extrac…
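The caption text attributes the falling response time to agents that "progressively reduce exploratory overhead". A common way DRL agents do this is ε-greedy action selection with a decaying ε. Whether the ε in Figures 2 and 3 is this exploration rate is an assumption here, and the schedule parameters (`eps_start`, `eps_end`, `decay_steps`) and the action names are hypothetical.

```python
import random


def linear_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                   decay_steps: int = 10_000) -> float:
    """Linearly anneal the exploration rate from eps_start to eps_end over decay_steps."""
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise exploit the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical mitigation actions for a gateway agent and their estimated values.
ACTIONS = ["allow", "rate_limit", "block"]
rng = random.Random(0)
q = [0.1, 0.4, 0.9]
late_epsilon = linear_epsilon(20_000)   # exploration has fully decayed by now
action = ACTIONS[epsilon_greedy(q, late_epsilon, rng)]
```

Early in training almost every decision is random exploration; once ε has decayed, the agent mostly takes the greedy action, which is consistent with (though not proof of) the lower steady-state overhead described above.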
Figure 10: Comparison of energy consumption for AutoDRL-IDS and DeepEdgeIDS on Edge Gateways.
… consumes more energy than AutoDRL-IDS, with the difference being statistically significant but not large in practice. The higher consumption in DeepEdgeIDS stems from its adaptive reinforcement updates, which require more computation during real-time mitigation. However, this small energy increase is balanced by its highe…
Figure 9
Figure 11: Comparison of carbon emissions for AutoDRL-IDS and DeepEdgeIDS on Edge Gateways.
… benefiting from a static policy framework that reduces energy consumption over time. Furthermore, DeepEdgeIDS achieves adaptability and detection responsiveness at the cost of slightly higher carbon emissions, whereas AutoDRL-IDS offers a greener and more predictable energy profile, making it suitable for edge.
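Carbon figures such as those in Figure 11 are commonly derived from measured energy: operational emissions equal the energy consumed (in kWh) multiplied by the carbon intensity of the electricity supply (in gCO2e per kWh). The page does not state which intensity factor the authors used, so the 400 gCO2e/kWh default below is an illustrative assumption, as is the function name.

```python
JOULES_PER_KWH = 3_600_000


def operational_carbon_g(energy_joules: float, intensity_g_per_kwh: float = 400.0) -> float:
    """Operational emissions (gCO2e) = energy (kWh) x grid carbon intensity (gCO2e/kWh)."""
    energy_kwh = energy_joules / JOULES_PER_KWH
    return energy_kwh * intensity_g_per_kwh


# A gateway drawing 5 W for one hour uses 18 kJ, i.e. 0.005 kWh.
hourly_carbon = operational_carbon_g(5.0 * 3600)
```

At the assumed intensity that hour costs about 2 g CO2e. Because carbon is a fixed multiple of energy under a single intensity factor, the energy and carbon comparisons can only rank the two IDS differently if the intensity varies over time or location.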
Figure 13: Memory usage comparison between AutoDRL-IDS and DeepEdgeIDS on Edge Gateways.
… in memory usage between the two DRL-based IDS is not statistically significant (F = 2.18, P > 0.05). This demonstrates that both DRL-based IDS maintain comparable and efficient memory management suitable for the edge gateway. Although DeepEdgeIDS requires more memory to support its …
Original abstract

The rapid expansion of the Internet of Things (IoT) has intensified cybersecurity challenges, particularly in mitigating Distributed Denial-of-Service (DDoS) attacks at the network edge. Traditional Intrusion Detection Systems (IDSs) face significant limitations, including poor adaptability to evolving and zero-day attacks, reliance on static signatures and labeled datasets, and inefficiency on resource-constrained edge gateways. Moreover, most existing DRL-based IDS studies overlook sustainability factors such as energy efficiency and carbon impact. To address these challenges, this paper proposes two novel Deep Reinforcement Learning (DRL)-based IDS: DeepEdgeIDS, a label-free Autoencoder-DRL hybrid, and AutoDRL-IDS, a supervised LSTM-DRL model. Both DRL-based IDS are validated through theoretical analysis and experimental evaluation on edge gateways. Results demonstrate that AutoDRL-IDS achieves 94% detection accuracy using labeled data, while DeepEdgeIDS attains 98% offline evaluation accuracy through label-free anomaly detection and online mitigation feedback. This study introduces a carbon-aware, multi-objective reward formulation that supports supervised reward optimization for AutoDRL-IDS and label-free online reward learning for DeepEdgeIDS, enabling sustainable real-time IDS operation in dynamic IoT networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes two DRL-based intrusion detection systems for IoT edge gateways facing DDoS attacks: AutoDRL-IDS, a supervised LSTM-DRL model reporting 94% detection accuracy with labeled data, and DeepEdgeIDS, a label-free Autoencoder-DRL hybrid reporting 98% offline accuracy via anomaly detection and online mitigation. Both incorporate a custom carbon-aware multi-objective reward formulation intended to jointly optimize detection performance and sustainability metrics such as energy efficiency and carbon impact, with validation via theoretical analysis and experiments on edge gateways.

Significance. If the central performance claims and non-conflicting objective improvements hold under rigorous controls, the work would contribute to the emerging intersection of sustainable computing and adaptive cybersecurity for resource-constrained IoT. The comparative supervised versus label-free DRL framing, combined with explicit carbon awareness in the reward, could inform practical edge deployments where both security and environmental constraints matter. The emphasis on online mitigation feedback and real-time operation addresses acknowledged limitations of static signature-based IDS.

major comments (2)
  1. [Abstract and experimental evaluation] Abstract and experimental evaluation sections: the central claims of 94% and 98% detection accuracy are presented without any description of the underlying dataset(s), attack traffic generation method, baseline algorithms (e.g., standard supervised ML classifiers or other DRL-IDS), number of independent runs, or statistical measures such as standard deviation or confidence intervals. These omissions directly undermine assessment of whether the reported figures support the superiority or sustainability claims.
  2. [Reward formulation] Reward formulation section: the multi-objective carbon-aware reward is asserted to enable simultaneous gains in detection accuracy and sustainability without degradation, yet no explicit trade-off curves, Pareto analysis, or sensitivity results on the objective weights are supplied. This leaves the weakest assumption—that the custom reward can be optimized without one objective harming the other—unsupported by concrete evidence in the reported experiments.
minor comments (2)
  1. [Methodology] Notation for the carbon impact term and energy consumption metric should be defined consistently when first introduced and cross-referenced to the reward equation to avoid ambiguity for readers unfamiliar with carbon-aware RL.
  2. [Discussion] The manuscript would benefit from a dedicated limitations subsection discussing potential overfitting to the chosen edge-gateway hardware or sensitivity to hyperparameter choices in the DRL agents.
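The referee's second major comment asks for trade-off curves and Pareto analysis over the objective weights. A minimal sketch of the Pareto step, assuming each weight setting has already been evaluated to an (accuracy, carbon) pair: keep only the settings that no other setting beats on both objectives at once. The class, function names, and sweep values below are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class Evaluation(NamedTuple):
    carbon_weight: float   # hypothetical weight on the carbon term of the reward
    accuracy: float        # higher is better
    carbon_g: float        # lower is better


def dominates(a: Evaluation, b: Evaluation) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.carbon_g <= b.carbon_g
    strictly_better = a.accuracy > b.accuracy or a.carbon_g < b.carbon_g
    return no_worse and strictly_better


def pareto_front(evals: list[Evaluation]) -> list[Evaluation]:
    """Return the non-dominated evaluations, sorted by carbon weight."""
    front = [e for e in evals if not any(dominates(o, e) for o in evals)]
    return sorted(front, key=lambda e: e.carbon_weight)


# Placeholder sweep results (not measured values).
sweep = [
    Evaluation(0.0, 0.97, 3.0),
    Evaluation(0.1, 0.96, 2.2),
    Evaluation(0.5, 0.95, 2.4),   # dominated by the 0.1 setting
    Evaluation(1.0, 0.91, 1.5),
]
for e in pareto_front(sweep):
    print(e)
```

If the front collapses to a single point over the swept weights, the two objectives are not in tension there; a sloped front is exactly the trade-off the referee asks the authors to report.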

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects for improving clarity and rigor. We address each major comment below and indicate the changes we will make in the revised version.

Point-by-point responses
  1. Referee: [Abstract and experimental evaluation] Abstract and experimental evaluation sections: the central claims of 94% and 98% detection accuracy are presented without any description of the underlying dataset(s), attack traffic generation method, baseline algorithms (e.g., standard supervised ML classifiers or other DRL-IDS), number of independent runs, or statistical measures such as standard deviation or confidence intervals. These omissions directly undermine assessment of whether the reported figures support the superiority or sustainability claims.

    Authors: We agree that these details are necessary for a complete evaluation of the reported accuracy figures. In the revised manuscript, we will expand both the abstract and the experimental evaluation section to explicitly describe the dataset(s) employed, the DDoS attack traffic generation approach, the baseline algorithms used for comparison (including standard supervised ML classifiers and other DRL-IDS methods), the number of independent runs, and statistical measures such as standard deviations and confidence intervals. This will provide the necessary context to assess the performance and sustainability claims. revision: yes

  2. Referee: [Reward formulation] Reward formulation section: the multi-objective carbon-aware reward is asserted to enable simultaneous gains in detection accuracy and sustainability without degradation, yet no explicit trade-off curves, Pareto analysis, or sensitivity results on the objective weights are supplied. This leaves the weakest assumption—that the custom reward can be optimized without one objective harming the other—unsupported by concrete evidence in the reported experiments.

    Authors: We acknowledge that additional evidence is required to substantiate the joint optimization claim. We will add to the revised manuscript explicit trade-off curves, Pareto analysis, and sensitivity results with respect to the objective weights. These additions will demonstrate that the carbon-aware multi-objective reward supports simultaneous improvements in detection performance and sustainability metrics without one objective degrading the other. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces a custom carbon-aware multi-objective reward formulation to support both supervised and label-free DRL training for IDS on IoT edge gateways. Reported accuracies (94% for AutoDRL-IDS, 98% for DeepEdgeIDS) are presented as outcomes of experimental evaluation rather than direct algebraic consequences of the reward definition itself. No equations or steps in the abstract reduce the performance metrics to parameters fitted on the identical evaluation data, nor do they rely on self-citation chains or imported uniqueness theorems for the central claims. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Claims rest on standard DRL learning assumptions plus a custom reward function whose balance parameters are not shown to be derived from first principles or external benchmarks.

free parameters (1)
  • objective weights in carbon-aware reward
    Tunable scalars balancing detection accuracy against energy/carbon terms; introduced to enable the multi-objective optimization described.
axioms (1)
  • domain assumption: Reinforcement learning agents can converge to effective policies for real-time intrusion mitigation through environment interaction and reward feedback.
    Invoked by the use of DRL for both supervised and label-free IDS operation.

pith-pipeline@v0.9.0 · 5786 in / 1351 out tokens · 60730 ms · 2026-05-21T19:10:03.608967+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings:
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

  1. [1]

    Internet of things: Security and solutions survey,

    P. K. Sadhu, V. P. Yanambaka, and A. Abdelgawad, “Internet of things: Security and solutions survey,” Sensors, vol. 22, no. 19, p. 7433, 2022

  2. [2]

    Security and internet of things: benefits, challenges, and future perspectives,

    H. Taherdoost, “Security and internet of things: benefits, challenges, and future perspectives,” Electronics, vol. 12, no. 8, p. 1901, 2023

  3. [3]

    Internet of things in industry: Research profiling, application, challenges and opportunities—a review,

    K. Wójcicki, M. Biegańska, B. Paliwoda, and J. Górna, “Internet of things in industry: Research profiling, application, challenges and opportunities—a review,” Energies, vol. 15, no. 5, p. 1806, 2022

  4. [4]

    The internet of things for logistics: Perspectives, application review, and challenges,

    H. Tran-Dang, N. Krommenacker, P. Charpentier, and D.-S. Kim, “The internet of things for logistics: Perspectives, application review, and challenges,” IETE Technical Review, vol. 39, no. 1, pp. 93–121, 2022

  5. [5]

    Blockchain based solutions to mitigate distributed denial of service (ddos) attacks in the internet of things (iot): A survey,

    Z. Shah, I. Ullah, H. Li, A. Levula, and K. Khurshid, “Blockchain based solutions to mitigate distributed denial of service (ddos) attacks in the internet of things (iot): A survey,” Sensors, vol. 22, no. 3, p. 1094, 2022

  6. [6]

    Enhanced method of ann based model for detection of ddos attacks on multimedia internet of things,

    R. Gopi, V. Sathiyamoorthi, S. Selvakumar, R. Manikandan, P. Chatterjee, N. Jhanjhi, and A. K. Luhach, “Enhanced method of ann based model for detection of ddos attacks on multimedia internet of things,” Multimedia Tools and Applications, pp. 1–19, 2022

  7. [7]

    Collaborative prediction and detection of ddos attacks in edge computing: A deep learning-based approach with distributed sdn,

    H. Zhou, Y. Zheng, X. Jia, and J. Shu, “Collaborative prediction and detection of ddos attacks in edge computing: A deep learning-based approach with distributed sdn,” Computer Networks, vol. 225, p. 109642, 2023

  8. [8]

    The internet of things security: A survey encompassing unexplored areas and new insights,

    A. E. Omolara, A. Alabdulatif, O. I. Abiodun, M. Alawida, A. Alabdulatif, H. Arshad et al., “The internet of things security: A survey encompassing unexplored areas and new insights,” Computers & Security, vol. 112, p. 102494, 2022

  9. [9]

    Towards detection of ddos attacks in iot with optimal features selection,

    P. Kumari, A. K. Jain, Y. Pal, K. Singh, and A. Singh, “Towards detection of ddos attacks in iot with optimal features selection,” Wireless Personal Communications, vol. 137, no. 2, pp. 951–976, 2024

  10. [10]

    Internet of things intrusion detection systems: a comprehensive review and future directions,

    A. Heidari and M. A. Jabraeil Jamali, “Internet of things intrusion detection systems: a comprehensive review and future directions,” Cluster Computing, vol. 26, no. 6, pp. 3753–3780, 2023

  11. [11]

    Explainable intrusion detection for cyber defences in the internet of things: Opportunities and solutions,

    N. Moustafa, N. Koroniotis, M. Keshk, A. Y. Zomaya, and Z. Tari, “Explainable intrusion detection for cyber defences in the internet of things: Opportunities and solutions,” IEEE Communications Surveys & Tutorials, vol. 25, no. 3, pp. 1775–1807, 2023

  12. [12]

    A survey on iot intrusion detection: Federated learning, game theory, social psychology, and explainable ai as future directions,

    S. Arisdakessian, O. A. Wahab, A. Mourad, H. Otrok, and M. Guizani, “A survey on iot intrusion detection: Federated learning, game theory, social psychology, and explainable ai as future directions,” IEEE Internet of Things Journal, vol. 10, no. 5, pp. 4059–4092, 2022

  13. [13]

    Intrusion detection system for industrial internet of things based on deep reinforcement learning,

    S. Tharewal, M. W. Ashfaque, S. S. Banu, P. Uma, S. M. Hassen, and M. Shabaz, “Intrusion detection system for industrial internet of things based on deep reinforcement learning,” Wireless Communications and Mobile Computing, vol. 2022, no. 1, p. 9023719, 2022

  14. [14]

    Security defense strategy algorithm for internet of things based on deep reinforcement learning,

    X. Feng, J. Han, R. Zhang, S. Xu, and H. Xia, “Security defense strategy algorithm for internet of things based on deep reinforcement learning,” High-Confidence Computing, vol. 4, no. 1, p. 100167, 2024

  15. [15]

    Deep reinforcement learning for intrusion detection in internet of things: Best practices, lessons learnt, and open challenges,

    A. Rizzardi, S. Sicari, A. C. Porisini et al., “Deep reinforcement learning for intrusion detection in internet of things: Best practices, lessons learnt, and open challenges,” Computer Networks, vol. 236, p. 110016, 2023

  16. [16]

    Review and analysis of recent advances in intelligent network softwarization for the internet of things,

    M. A. Zormati, H. Lakhlef, and S. Ouni, “Review and analysis of recent advances in intelligent network softwarization for the internet of things,” Computer Networks, p. 110215, 2024

  17. [17]

    A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations,

    Z. Zhao, L. Alzubaidi, J. Zhang, Y. Duan, and Y. Gu, “A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations,” Expert Systems with Applications, vol. 242, p. 122807, 2024

  18. [18]

    Unsupervised learning,

    M. J. Neuer, “Unsupervised learning,” in Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications. Springer, 2024, pp. 141–172

  19. [19]

    Machine learning in real-time internet of things (iot) systems: A survey,

    J. Bian, A. Al Arafat, H. Xiong, J. Li, L. Li, H. Chen, J. Wang, D. Dou, and Z. Guo, “Machine learning in real-time internet of things (iot) systems: A survey,” IEEE Internet of Things Journal, vol. 9, no. 11, pp. 8364–8386, 2022

  20. [20]

    Unsupervised deep learning for iot time series,

    Y. Liu, Y. Zhou, K. Yang, and X. Wang, “Unsupervised deep learning for iot time series,” IEEE Internet of Things Journal, vol. 10, no. 16, pp. 14285–14306, 2023

  21. [21]

    Exploring machine learning solutions for overcoming challenges in iot-based wireless sensor network routing: a comprehensive review,

    R. Priyadarshi, “Exploring machine learning solutions for overcoming challenges in iot-based wireless sensor network routing: a comprehensive review,” Wireless Networks, pp. 1–27, 2024

  22. [22]

    Comparative review of supervised vs. unsupervised learning in cloud security applications,

    N. S. Kharbanda, “Comparative review of supervised vs. unsupervised learning in cloud security applications,” 2024

  23. [23]

    A collaborative stealthy ddos detection method based on reinforcement learning at the edge of internet of things,

    Y. Feng, W. Zhang, S. Yin, H. Tang, Y. Xiang, and Y. Zhang, “A collaborative stealthy ddos detection method based on reinforcement learning at the edge of internet of things,” IEEE Internet of Things Journal, vol. 10, no. 20, pp. 17934–17948, 2023

  24. [24]

    Enhancing ddos attack detection using snake optimizer with ensemble learning on internet of things environment,

    M. Aljebreen, H. A. Mengash, M. A. Arasi, S. S. Aljameel, A. S. Salama, and M. A. Hamza, “Enhancing ddos attack detection using snake optimizer with ensemble learning on internet of things environment,” IEEE Access, 2023

  25. [25]

    Early intrusion detection system using honeypot for industrial control networks,

    A. Pashaei, M. E. Akbari, M. Z. Lighvan, and A. Charmin, “Early intrusion detection system using honeypot for industrial control networks,” Results in Engineering, vol. 16, p. 100576, 2022

  26. [26]

    Real-time ddos flooding attack detection in intelligent transportation systems,

    H. Karthikeyan and G. Usha, “Real-time ddos flooding attack detection in intelligent transportation systems,” Computers and Electrical Engineering, vol. 101, p. 107995, 2022

  27. [27]

    Bellman operator convergence enhancements in reinforcement learning algorithms,

    D. K. Kadurha, D. J. L. Moutouo, and Y. U. Gaba, “Bellman operator convergence enhancements in reinforcement learning algorithms,” arXiv preprint arXiv:2505.14564, 2025

  28. [28]

    Analysis of variance (anova),

    L. Ståhle, S. Wold et al., “Analysis of variance (anova),” Chemometrics and intelligent laboratory systems, vol. 6, no. 4, pp. 259–272, 1989

  29. [29]

    Ddos attack detection in internet of things using recurrent neural network,

    O. Yousuf and R. N. Mir, “Ddos attack detection in internet of things using recurrent neural network,” Computers and Electrical Engineering, vol. 101, p. 108034, 2022

  30. [30]

    Performance analysis of entropy variation-based detection of ddos attacks in iot,

    N. Pandey and P. K. Mishra, “Performance analysis of entropy variation-based detection of ddos attacks in iot,” Internet of Things, vol. 23, p. 100812, 2023

  31. [31]

    A big data analytics for ddos attack detection using optimized ensemble framework in internet of things,

    I. Ahmad, Z. Wan, and A. Ahmad, “A big data analytics for ddos attack detection using optimized ensemble framework in internet of things,” Internet of Things, vol. 23, p. 100825, 2023

  32. [32]

    Robust detection of unknown dos/ddos attacks in iot networks using a hybrid learning model,

    X.-H. Nguyen and K.-H. Le, “Robust detection of unknown dos/ddos attacks in iot networks using a hybrid learning model,” Internet of Things, vol. 23, p. 100851, 2023

  33. [33]

    A deep cnn-based framework for distributed denial of services (ddos) attack detection in internet of things (iot),

    B. B. Gupta, A. Gaurav, V. Arya, and P. Kim, “A deep cnn-based framework for distributed denial of services (ddos) attack detection in internet of things (iot),” in Proceedings of the 2023 international conference on research in adaptive and convergent systems, 2023, pp. 1–6

  34. [34]

    Ieee p2668-compliant multi-layer iot-ddos defense system using deep reinforcement learning,

    Y. Liu, K.-F. Tsang, C. K. Wu, Y. Wei, H. Wang, and H. Zhu, “Ieee p2668-compliant multi-layer iot-ddos defense system using deep reinforcement learning,” IEEE Transactions on Consumer Electronics, vol. 69, no. 1, pp. 49–64, 2022

  35. [35]

    Federated reinforcement learning based intrusion detection system using dynamic attention mechanism,

    S. Vadigi, K. Sethi, D. Mohanty, S. P. Das, and P. Bera, “Federated reinforcement learning based intrusion detection system using dynamic attention mechanism,” Journal of Information Security and Applications, vol. 78, p. 103608, 2023

  36. [36]

    Ambient intelligence approach: Internet of things based decision performance analysis for intrusion detection,

    T. Ramana, M. Thirunavukkarasan, A. S. Mohammed, G. G. Devarajan, and S. M. Nagarajan, “Ambient intelligence approach: Internet of things based decision performance analysis for intrusion detection,” Computer Communications, vol. 195, pp. 315–322, 2022

  37. [37]

    Decision model of intrusion response based on markov game in fog computing environment,

    X. Ma, Y. Li, and Y. Gao, “Decision model of intrusion response based on markov game in fog computing environment,” Wireless Networks, vol. 29, no. 8, pp. 3383–3392, 2023

  38. [38]

    Anti-attack scheme for edge devices based on deep reinforcement learning,

    R. Zhang, H. Xia, C. Liu, R.-b. Jiang, and X.-g. Cheng, “Anti-attack scheme for edge devices based on deep reinforcement learning,” Wireless Communications and Mobile Computing, vol. 2021, no. 1, p. 6619715, 2021

  39. [39]

    Malbot-drl: Malware botnet detection using deep reinforcement learning in iot networks,

    M. Al-Fawa’reh, J. Abu-Khalaf, P. Szewczyk, and J. J. Kang, “Malbot-drl: Malware botnet detection using deep reinforcement learning in iot networks,” IEEE Internet of Things Journal, 2023

  40. [40]

    Dual-objective reinforcement learning with novel hamilton-jacobi-bellman formulations,

    W. Sharpless, D. Hirsch, S. Tonkens, N. Shinde, and S. Herbert, “Dual-objective reinforcement learning with novel hamilton-jacobi-bellman formulations,” arXiv preprint arXiv:2506.16016, 2025

  41. [41]

    M. A. Vasfi and B. S. Ghahfarokhi, “Channel-hopping sequence generation for blind rendezvous in cognitive radio-enabled internet of vehicles: A multi-agent twin delayed deep deterministic policy gradient-based method,” Computer Communications, p. 108318, 2025

  42. [42]

    Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks,

    S. Bi, L. Huang, H. Wang, and Y.-J. A. Zhang, “Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7519–7537, 2021

  43. [43]

    Real-time data collection and trajectory scheduling using a drl–lagrangian framework in multiple uavs collaborative communication systems,

    S. Wang and Z. Luo, “Real-time data collection and trajectory scheduling using a drl–lagrangian framework in multiple uavs collaborative communication systems,” Remote Sensing, vol. 16, no. 23, p. 4378, 2024

  44. [44]

    Policy learning with constraints in model-free reinforcement learning: A survey,

    Y. Liu, A. Halev, and X. Liu, “Policy learning with constraints in model-free reinforcement learning: A survey,” in The 30th international joint conference on artificial intelligence (ijcai), 2021

  45. [45]

    Towards pareto-optimal energy management in integrated energy systems: A multi-agent and multi-objective deep reinforcement learning approach,

    J. Dou, X. Wang, Z. Liu, Q. Sun, X. Wang, and J. He, “Towards pareto-optimal energy management in integrated energy systems: A multi-agent and multi-objective deep reinforcement learning approach,” International Journal of Electrical Power & Energy Systems, vol. 159, p. 110022, 2024

  46. [46]

    Improve robustness of reinforcement learning against observation perturbations via l∞ lipschitz policy networks,

    B. Nie, J. Ji, Y. Fu, and Y. Gao, “Improve robustness of reinforcement learning against observation perturbations via l∞ lipschitz policy networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 13, 2024, pp. 14457–14465

  47. [47]

    Multi-agent reinforcement learning-based distributed cooperative voltage control,

    W. Xu and A. Kamsin, “Multi-agent reinforcement learning-based distributed cooperative voltage control,” in 2025 6th International Conference on Electrical Technology and Automatic Control (ICETAC). IEEE, 2025, pp. 474–477

  48. [48]

    K. Ryu and W. Kim, “Multi-objective optimization of energy saving and throughput in heterogeneous networks using deep reinforcement learning,” Sensors, vol. 21, no. 23, p. 7925, 2021.

  49. [49]

    N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, “Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset,” Future Generation Computer Systems, vol. 100, pp. 779–796, 2019.

  50. [50]

    G. Seo, S. Yoon, J. Song, E. Srivastava, and E. Hwang, “Label-free fault detection scheme for inverters of PV systems: Deep reinforcement learning-based dynamic threshold,” Applied Sciences, vol. 13, no. 4, p. 2470, 2023.

  51. [51]

    Z. Long, H. Yan, G. Shen, X. Zhang, H. He, and L. Cheng, “A transformer-based network intrusion detection approach for cloud security,” Journal of Cloud Computing, vol. 13, no. 1, p. 5, 2024.

  52. [52]

    Z. Jiang, J. Li, Q. Hu, W. Meng, W. Pedrycz, and Z. Su, “Scalable graph-aware edge representation learning for wireless IoT intrusion detection,” IEEE Internet of Things Journal, vol. 11, no. 16, pp. 26955–26969, 2024.

  53. [53]

    F. Ullah, S. Ullah, G. Srivastava, and J. Lin, “IDS-INT: Intrusion detection system using transformer-based transfer learning for imbalanced network traffic,” Digital Communications and Networks, vol. 10, no. 1, pp. 190–204, 2024.

  54. [54]

    V. Yarlagadda, S. Kumar, R. Anumandla, S. Charan, R. Vennapusa, and C. Wholesale, “Harnessing Kali Linux for advanced penetration testing and cybersecurity threat mitigation,” J. Comput. Digit. Technol., Apr. 2024.

  55. [55]

    G. O. Anyanwu, C. I. Nwakanma, J.-M. Lee, and D.-S. Kim, “Optimization of RBF-SVM kernel using grid search algorithm for DDoS attack detection in SDN-based VANET,” IEEE Internet of Things Journal, vol. 10, no. 10, pp. 8477–8490, 2022.

  56. [56]

    L. Zou, W. Ren, W. Zhang, L. Ding, and S. Li, “Sampling complexity of TD and PPO in RKHS,” arXiv preprint arXiv:2509.24991, 2025.

  57. [57]

    Y. Peng, L. Zhang, and Z. Zhang, “Statistical efficiency of distributional temporal difference learning and Freedman’s inequality in Hilbert spaces,” arXiv preprint arXiv:2403.05811, 2024.

X. APPENDIX OVERVIEW

This appendix provides the mathematical derivations underpinning the proposed dual-solution DRL-based IDS, DeepEdgeIDS, and AutoDRL-IDS. We f...

Sample Bound. For a reproducing kernel Hilbert space [57] $\mathcal{H}_k$:

$$|Q_T - Q^\star| \le \tilde{O}\left(\sqrt{\frac{\log\det\left(I + \frac{1}{\sigma^{2}} K_T\right)}{T}}\right), \qquad R^{(D)}_{\mathrm{dyn}}(T) \le \tilde{O}\left(\sqrt{ST}\right).$$

DeepEdgeIDS exhibits provable stability, diffusion-regularized contraction, convex sustainability coupling, and sublinear regret with bounded carbon dynamics.

XII. AUTODRL-IDS

A. LSTM Encoding and Supervised-DRL Coupling...
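The sample bound is stated only up to $\tilde{O}(\cdot)$, which hides constants and logarithmic factors. Its data-dependent core is $\sqrt{\log\det(I + \sigma^{-2} K_T)/T}$, where $K_T$ is the $T \times T$ Gram matrix of the visited inputs. The sketch below is an illustration only and is not the authors' code. It evaluates that term for a hypothetical one-dimensional input, an assumed squared-exponential kernel with lengthscale 1, and an assumed noise level $\sigma = 0.5$. The log-determinant is computed through a Cholesky factorization, which is valid because $I + \sigma^{-2} K_T$ is symmetric positive definite.

```python
import math
import random


def rbf_kernel(x: float, y: float, lengthscale: float = 1.0) -> float:
    """Assumed squared-exponential kernel k(x, y) = exp(-(x - y)^2 / (2 * lengthscale^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def log_det_spd(a: list[list[float]]) -> float:
    """log det of a symmetric positive-definite matrix via Cholesky: 2 * sum(log L_ii)."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def information_gain(xs: list[float], sigma: float = 0.5, lengthscale: float = 1.0) -> float:
    """log det(I + sigma^{-2} K_T), where K_T is the Gram matrix over the T inputs in xs."""
    t = len(xs)
    m = [
        [(1.0 if i == j else 0.0) + rbf_kernel(xs[i], xs[j], lengthscale) / sigma ** 2
         for j in range(t)]
        for i in range(t)
    ]
    return log_det_spd(m)


def sample_bound_term(xs: list[float], sigma: float = 0.5, lengthscale: float = 1.0) -> float:
    """Term inside the O~(.) of the sample bound: sqrt(log det(I + sigma^{-2} K_T) / T)."""
    return math.sqrt(information_gain(xs, sigma, lengthscale) / len(xs))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; inputs are hypothetical, drawn uniformly from [0, 5]
    for t in (10, 50, 200):
        xs = [rng.uniform(0.0, 5.0) for _ in range(t)]
        print(t, round(sample_bound_term(xs), 4))
```

With a smooth kernel on a bounded input range, the log-determinant grows much more slowly than $T$. The printed term therefore shrinks as samples accumulate, which is the behavior the bound expresses. The numbers are not the bound itself, because the constants and log factors hidden in $\tilde{O}(\cdot)$ are omitted.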