Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense
Pith reviewed 2026-05-25 12:20 UTC · model grok-4.3
The pith
Strategic learning enables three cyber defense schemes to converge to optimal policies under progressive uncertainty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the three defense schemes, facing parameter uncertainty, payoff uncertainty, and environmental uncertainty in turn, each use a distinct strategic learning scheme, yet all three schemes share an identical feedback structure of sensation, estimation, and actions. Through this structure the most rewarding policies are reinforced repeatedly, allowing them to converge autonomously and adaptively to the optimal policies even when the defender lacks complete knowledge of the attacker.
What carries the argument
The shared feedback structure of sensation, estimation, and actions that reinforces rewarding policies until convergence in each of the three learning schemes.
If this is right
- Defensive deception can detect and counter attacks while handling parameter uncertainty.
- Moving target defense can adapt dynamically under payoff uncertainty through feedback.
- Honeypot engagement can manage environmental uncertainty by learning from interactions.
- All schemes increase attacker costs while gathering threat information without full prior knowledge.
- The security benefits of the defenses can be weighed against their implementation costs.
Where Pith is reading between the lines
- The common feedback structure may allow the same learning pattern to be reused in other security settings that involve partial information about opponents.
- Convergence under uncertainty could support defenses that update in real time as attacker tactics evolve.
- The quantified security-cost tradeoff might guide decisions on when to deploy active measures versus static ones.
Load-bearing premise
The learning schemes can estimate the unknowns and reduce uncertainty enough for the policies to converge to the optimal ones.
What would settle it
A simulation or deployment test in which policies under one of the three schemes fail to converge when uncertainty stays high despite repeated application of the sensation-estimation-action feedback.
Figures
read the original abstract
The increasing instances of advanced attacks call for a new defense paradigm that is active, autonomous, and adaptive, named as the \texttt{`3A'} defense paradigm. This chapter introduces three defense schemes that actively interact with attackers to increase the attack cost and gather threat information, i.e., defensive deception for detection and counter-deception, feedback-driven Moving Target Defense (MTD), and adaptive honeypot engagement. Due to the cyber deception, external noise, and the absent knowledge of the other players' behaviors and goals, these schemes possess three progressive levels of information restrictions, i.e., from the parameter uncertainty, the payoff uncertainty, to the environmental uncertainty. To estimate the unknown and reduce uncertainty, we adopt three different strategic learning schemes that fit the associated information restrictions. All three learning schemes share the same feedback structure of sensation, estimation, and actions so that the most rewarding policies get reinforced and converge to the optimal ones in autonomous and adaptive fashions. This work aims to shed lights on proactive defense strategies, lay a solid foundation for strategic learning under incomplete information, and quantify the tradeoff between the security and costs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a '3A' (active, adaptive, autonomous) cyber defense paradigm and presents three schemes—defensive deception for detection/counter-deception, feedback-driven Moving Target Defense (MTD), and adaptive honeypot engagement—under progressive uncertainty levels (parameter, payoff, environmental). It adopts three strategic learning schemes that share a sensation-estimation-action feedback structure to reinforce rewarding policies and achieve convergence to optimal policies in an autonomous, adaptive manner, with the goal of increasing attack costs, gathering threat information, and quantifying security-cost tradeoffs.
Significance. If the convergence claims under incomplete information are rigorously established, the work would provide a conceptual foundation for proactive defense strategies and strategic learning in cyber settings with uncertainty, potentially influencing research on adaptive MTD and deception-based defenses.
major comments (1)
- [Abstract] Abstract: The central claim that the three learning schemes 'converge to the optimal ones' via the shared feedback structure is asserted without any derivations, equations, convergence proofs, simulation results, or validation steps; this is load-bearing for the paper's contribution on strategic learning under uncertainty.
Simulated Author's Rebuttal
We thank the referee for their review and the opportunity to clarify the manuscript. The concern regarding the abstract's assertion of convergence is addressed point-by-point below. The full paper provides the supporting analysis as summarized in the abstract.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the three learning schemes 'converge to the optimal ones' via the shared feedback structure is asserted without any derivations, equations, convergence proofs, simulation results, or validation steps; this is load-bearing for the paper's contribution on strategic learning under uncertainty.
Authors: The abstract is a concise summary of the paper's contributions and does not contain the full technical details, which is standard. The three strategic learning schemes are developed in detail in Sections 3 (defensive deception under parameter uncertainty), 4 (feedback-driven MTD under payoff uncertainty), and 5 (adaptive honeypot engagement under environmental uncertainty). Each section derives the sensation-estimation-action feedback loop, presents the specific learning algorithm (e.g., Q-learning variants or regret-matching), provides convergence proofs to Nash equilibria or optimal policies under the respective uncertainty model using stochastic approximation or Lyapunov analysis, and includes equations for value estimation and policy updates. Section 6 presents simulation results across multiple scenarios validating convergence rates, attack cost increases, and security-cost tradeoffs. If the referee prefers, we can revise the abstract to explicitly reference these sections. revision: partial
Circularity Check
No significant circularity identified
full rationale
The paper describes three defense schemes under progressive uncertainty levels and adopts corresponding strategic learning schemes that share a sensation-estimation-action feedback loop, with the claim that rewarding policies are reinforced to converge to optima. This is a high-level assertion aligned with standard reinforcement learning under incomplete information. No equations, fitted parameters presented as predictions, or load-bearing self-citations are exhibited in the text that would reduce the central claim to its inputs by construction. The derivation chain remains self-contained against external benchmarks such as RL convergence results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Strategic learning schemes can estimate unknown parameters, payoffs, or environments in cyber defense interactions and thereby reduce uncertainty sufficiently for policy convergence.
Reference graph
Works this paper leans on
-
[1]
Verizon 2019 data breach investigations report,
“Verizon 2019 data breach investigations report,” 2019
work page 2019
-
[2]
Combatting cyber risks in the supply chain,
D. Shackleford, “Combatting cyber risks in the supply chain,”SANS. org, 2015
work page 2015
-
[3]
Adaptive strategic cyber defense for advanced persistent threats in criticalinfrastructurenetworks,
L. Huang and Q. Zhu, “Adaptive strategic cyber defense for advanced persistent threats in criticalinfrastructurenetworks,” ACMSIGMETRICSPerformanceEvaluationReview ,vol.46, no. 2, pp. 52–56, 2019
work page 2019
-
[4]
——, “Analysis and computation of adaptive defense strategies against advanced persistent threatsforcyber-physicalsystems,”in InternationalConferenceonDecisionandGameTheory for Security. Springer, 2018, pp. 205–226
work page 2018
-
[5]
L. Huang and Q. Zhu, “A Dynamic Games Approach to Proactive Defense Strategies against AdvancedPersistentThreatsinCyber-PhysicalSystems,” arXive-prints,p.arXiv:1906.09687, Jun 2019
-
[6]
Game-theoretic approach to feedback-driven multi-stage moving target defense,
Q. Zhu and T. Başar, “Game-theoretic approach to feedback-driven multi-stage moving target defense,” inInternational Conference on Decision and Game Theory for Security. Springer, 2013, pp. 246–263. 22 Linan Huang and Quanyan Zhu
work page 2013
-
[7]
Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes,
L. Huang and Q. Zhu, “Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes,”arXiv e-prints, p. arXiv:1906.12182, Jun 2019
-
[8]
A Game-Theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy
J. Pawlick, E. Colbert, and Q. Zhu, “A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy,”arXiv preprint arXiv:1712.05441, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[9]
Integrating cyber-d&d into adversary modeling for active cyber defense,
F. J. Stech, K. E. Heckman, and B. E. Strom, “Integrating cyber-d&d into adversary modeling for active cyber defense,” inCyber deception. Springer, 2016, pp. 1–22
work page 2016
-
[10]
Activecyberdefensewithdenialanddeception:Acyber-wargameexperiment,
K. E. Heckman, M. J. Walsh, F. J. Stech, T. A. O’boyle, S. R. DiCato, and A. F. Herber, “Activecyberdefensewithdenialanddeception:Acyber-wargameexperiment,” computers& security, vol. 37, pp. 72–77, 2013
work page 2013
-
[11]
R-locker:Thwartingran- somware action through a honeyfile-based approach,
J.Gómez-Hernández,L.Álvarez-González,andP.García-Teodoro,“R-locker:Thwartingran- somware action through a honeyfile-based approach,”Computers & Security, vol. 73, pp. 389–398, 2018
work page 2018
-
[12]
Changing the game: The art of deceiving sophisticated attackers,
N. Virvilis, B. Vanautgaerden, and O. S. Serrano, “Changing the game: The art of deceiving sophisticated attackers,” in2014 6th International Conference On Cyber Conflict (CyCon 2014). IEEE, 2014, pp. 87–97
work page 2014
-
[13]
Modelingandanalysisofleakydeceptionusingsignaling games with evidence,
J.Pawlick,E.Colbert,andQ.Zhu,“Modelingandanalysisofleakydeceptionusingsignaling games with evidence,”IEEE Transactions on Information Forensics and Security, 2018
work page 2018
-
[14]
SpringerScience&BusinessMedia,2011,vol.54
S.Jajodia,A.K.Ghosh,V.Swarup,C.Wang,andX.S.Wang, Movingtargetdefense:creating asymmetricuncertaintyforcyberthreats . SpringerScience&BusinessMedia,2011,vol.54
work page 2011
-
[15]
Countering code-injection attacks with instruction-set randomization,
G. S. Kc, A. D. Keromytis, and V. Prevelakis, “Countering code-injection attacks with instruction-set randomization,” inProceedings of the 10th ACM conference on Computer and communications security. ACM, 2003, pp. 272–280
work page 2003
-
[16]
Deceptive routing in relay networks,
A. Clark, Q. Zhu, R. Poovendran, and T. Başar, “Deceptive routing in relay networks,” in International Conference on Decision and Game Theory for Security. Springer, 2012, pp. 171–185
work page 2012
-
[17]
Markov modeling of moving target defense games,
H. Maleki, S. Valizadeh, W. Koch, A. Bestavros, and M. van Dijk, “Markov modeling of moving target defense games,” inProceedings of the 2016 ACM Workshop on Moving Target Defense. ACM, 2016, pp. 81–92
work page 2016
-
[18]
A methodology for intelligent honeypot deployment and active engagement of attackers,
C. R. Hecker, “A methodology for intelligent honeypot deployment and active engagement of attackers,” Ph.D. dissertation, 2012
work page 2012
-
[19]
Deceptive attack and defense game in honeypot-enablednetworksfortheinternetofthings,
Q. D. La, T. Q. Quek, J. Lee, S. Jin, and H. Zhu, “Deceptive attack and defense game in honeypot-enablednetworksfortheinternetofthings,” IEEEInternetofThingsJournal ,vol.3, no. 6, pp. 1025–1035, 2016
work page 2016
-
[20]
Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats
J. Pawlick, T. T. H. Nguyen, and Q. Zhu, “Optimal timing in dynamic and robust attacker engagement during advanced persistent threats,”CoRR, vol. abs/1707.08031, 2017. [Online]. Available: http://arxiv.org/abs/1707.08031
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[21]
A Stackelberg game perspective on the conflict between machine learning and data obfuscation,
J. Pawlick and Q. Zhu, “A Stackelberg game perspective on the conflict between machine learning and data obfuscation,” in Information Forensics and Security (WIFS), 2016 IEEE International Workshop on . IEEE, 2016, pp. 1–6. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7823893/
-
[22]
Deployment and exploitation of deceptive honeybots in social networks,
Q. Zhu, A. Clark, R. Poovendran, and T. Basar, “Deployment and exploitation of deceptive honeybots in social networks,” inDecision and Control (CDC), 2013 IEEE 52nd Annual Conference on. IEEE, 2013, pp. 212–219
work page 2013
-
[23]
Hybrid learning in stochastic games and its applications in network security,
Q. Zhu, H. Tembine, and T. Basar, “Hybrid learning in stochastic games and its applications in network security,”Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, pp. 305–329, 2013
work page 2013
-
[24]
Interference aware routing game for cognitive radio multi-hop networks,
Q. Zhu, Z. Yuan, J. B. Song, Z. Han, and T. Başar, “Interference aware routing game for cognitive radio multi-hop networks,”Selected Areas in Communications, IEEE Journal on, vol. 30, no. 10, pp. 2006–2015, 2012
work page 2006
-
[25]
Q.Zhu,L.Bushnell,andT.Basar,“Game-theoreticanalysisofnodecaptureandcloningattack with multiple attackers in wireless sensor networks,” inDecision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 3404–3411
work page 2012
-
[26]
Q. Zhu, A. Clark, R. Poovendran, and T. Başar, “Deceptive routing games,” inDecision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 2704–2711. Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense 23
work page 2012
-
[27]
A stochastic game model for jamming in multi-channel cognitive radio systems
Q. Zhu, H. Li, Z. Han, and T. Basar, “A stochastic game model for jamming in multi-channel cognitive radio systems.” inICC, 2010, pp. 1–6
work page 2010
-
[28]
Secure and practical output feedback control for cloud-enabled cyber- physical systems,
Z. Xu and Q. Zhu, “Secure and practical output feedback control for cloud-enabled cyber- physical systems,” inCommunications and Network Security (CNS), 2017 IEEE Conference on. IEEE, 2017, pp. 416–420
work page 2017
-
[29]
——, “A Game-Theoretic Approach to Secure Control of Communication-Based Train Control Systems Under Jamming Attacks,” inProceedings of the 1st International Workshop on Safe Control of Connected and Autonomous Vehicles. ACM, 2017, pp. 27–34. [Online]. Available: http://dl.acm.org/citation.cfm?id=3055381
work page 2017
-
[30]
Cross-layer secure cyber-physical control system design for networked 3d printers,
——, “Cross-layer secure cyber-physical control system design for networked 3d printers,” in American Control Conference (ACC), 2016. IEEE, 2016, pp. 1191–1196. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7525079/
-
[31]
Modeling, analysis, and mitigation of dynamic botnet formation in wireless iot networks,
M. J. Farooq and Q. Zhu, “Modeling, analysis, and mitigation of dynamic botnet formation in wireless iot networks,”IEEE Transactions on Information Forensics and Security, 2019
work page 2019
-
[32]
A cyber-physical game framework for secure and resilient multi-agent autonomous systems,
Z. Xu and Q. Zhu, “A cyber-physical game framework for secure and resilient multi-agent autonomous systems,” inDecision and Control (CDC), 2015 IEEE 54th Annual Conference on. IEEE, 2015, pp. 5156–5161
work page 2015
-
[33]
Alarge-scalemarkovgameapproachtodynamicprotectionof interdependent infrastructure networks,
L.Huang,J.Chen,andQ.Zhu,“Alarge-scalemarkovgameapproachtodynamicprotectionof interdependent infrastructure networks,” inInternational Conference on Decision and Game Theory for Security. Springer, 2017, pp. 357–376
work page 2017
-
[34]
Adynamicgameanalysisanddesignofinfrastructurenetwork protection and recovery,
J.Chen,C.Touati,andQ.Zhu,“Adynamicgameanalysisanddesignofinfrastructurenetwork protection and recovery,”ACM SIGMETRICS Performance Evaluation Review, vol. 45, no. 2, p. 128, 2017
work page 2017
-
[35]
A hybrid stochastic game for secure control of cyber-physical systems,
F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, “A hybrid stochastic game for secure control of cyber-physical systems,”Automatica, vol. 93, pp. 55–63, 2018
work page 2018
-
[36]
Resilient control of cyber-physical systems againstdenial-of-serviceattacks,
Y. Yuan, Q. Zhu, F. Sun, Q. Wang, and T. Basar, “Resilient control of cyber-physical systems againstdenial-of-serviceattacks,”in ResilientControlSystems(ISRCS),20136thInternational Symposium on. IEEE, 2013, pp. 54–59
work page 2013
-
[37]
S. Rass and Q. Zhu, “GADAPT: A Sequential Game-Theoretic Framework for Designing Defense-in-Depth Strategies Against Advanced Persistent Threats,” inDecision and Game TheoryforSecurity ,ser.LectureNotesinComputerScience,Q.Zhu,T.Alpcan,E.Panaousis, M. Tambe, and W. Casey, Eds. Cham: Springer International Publishing, 2016, vol. 9996, pp. 314–326
work page 2016
-
[38]
Dynamicinterferenceminimizationrouting game for on-demand cognitive pilot channel,
Q.Zhu,Z.Yuan,J.B.Song,Z.Han,andT.Basar,“Dynamicinterferenceminimizationrouting game for on-demand cognitive pilot channel,” inGlobal Telecommunications Conference (GLOBECOM 2010), 2010 IEEE. IEEE, 2010, pp. 1–6
work page 2010
-
[39]
Strategic defense against deceptive civilian gps spoofing of unmanned aerial vehicles,
T. Zhang and Q. Zhu, “Strategic defense against deceptive civilian gps spoofing of unmanned aerial vehicles,” inInternational Conference on Decision and Game Theory for Security. Springer, 2017, pp. 213–233
work page 2017
-
[40]
L. Huang and Q. Zhu, “Analysis and computation of adaptive defense strategies against ad- vancedpersistentthreatsforcyber-physicalsystems,”in InternationalConferenceonDecision and Game Theory for Security, 2018
work page 2018
-
[41]
Adaptivestrategiccyberdefenseforadvancedpersistentthreatsincriticalinfrastructure networks,
——,“Adaptivestrategiccyberdefenseforadvancedpersistentthreatsincriticalinfrastructure networks,” inACM SIGMETRICS Performance Evaluation Review, 2018
work page 2018
-
[42]
Flip the cloud: Cyber-physical signaling games in the presenceofadvancedpersistentthreats,
J. Pawlick, S. Farhang, and Q. Zhu, “Flip the cloud: Cyber-physical signaling games in the presenceofadvancedpersistentthreats,”in DecisionandGameTheoryforSecurity . Springer, 2015, pp. 289–308
work page 2015
-
[43]
A dynamic bayesian security game framework for strategic defense mechanism design,
S. Farhang, M. H. Manshaei, M. N. Esfahani, and Q. Zhu, “A dynamic bayesian security game framework for strategic defense mechanism design,” inDecision and Game Theory for Security. Springer, 2014, pp. 319–328
work page 2014
-
[44]
Dynamicpolicy-basedidsconfiguration,
Q.ZhuandT.Başar,“Dynamicpolicy-basedidsconfiguration,”in DecisionandControl,2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on. IEEE, 2009, pp. 8600–8605
work page 2009
-
[45]
Networksecurityconfigurations:Anonzero-sumstochastic gameapproach,
Q.Zhu,H.Tembine,andT.Basar,“Networksecurityconfigurations:Anonzero-sumstochastic gameapproach,”in AmericanControlConference(ACC),2010 . IEEE,2010,pp.1059–1064. 24 Linan Huang and Quanyan Zhu
work page 2010
-
[46]
Heterogeneouslearninginzero-sumstochasticgameswith incomplete information,
Q.Zhu,H.Tembine,andT.Başar,“Heterogeneouslearninginzero-sumstochasticgameswith incomplete information,” in49th IEEE conference on decision and control (CDC). IEEE, 2010, pp. 219–224
work page 2010
-
[47]
J. Chen and Q. Zhu, “Security as a Service for Cloud-Enabled Internet of Controlled Things under Advanced Persistent Threats: A Contract Design Approach,” IEEE Transactions on Information Forensics and Security , 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7954676/
-
[48]
ABi-LevelGameApproachtoAttack-AwareCyberInsurance ofComputerNetworks,
R.Zhang,Q.Zhu,andY.Hayel,“ABi-LevelGameApproachtoAttack-AwareCyberInsurance ofComputerNetworks,” IEEEJournalonSelectedAreasinCommunications ,vol.35,no.3,pp. 779–794, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7859343/
-
[49]
Attack-aware cyber insurance of interdependent computer networks,
R. Zhang and Q. Zhu, “Attack-aware cyber insurance of interdependent computer networks,” 2016
work page 2016
-
[50]
W. A. Casey, Q. Zhu, J. A. Morales, and B. Mishra, “Compliance control: Managed vulnera- bility surface in social-technological systems via signaling games,” inProceedings of the 7th ACM CCS International Workshop on Managing Insider Security Threats. ACM, 2015, pp. 53–62
work page 2015
-
[51]
Attack-aware cyber insurance for risk sharing in computer networks,
Y. Hayel and Q. Zhu, “Attack-aware cyber insurance for risk sharing in computer networks,” in Decision and Game Theory for Security. Springer, 2015, pp. 22–34
work page 2015
-
[52]
Epidemic protection over heterogeneous networks using evolutionary poisson games,
——, “Epidemic protection over heterogeneous networks using evolutionary poisson games,” IEEETransactionsonInformationForensicsandSecurity ,vol.12,no.8,pp.1786–1800,2017
work page 2017
-
[53]
Guidex:Agame-theoreticincentive-basedmech- anismforintrusiondetectionnetworks,
Q.Zhu,C.Fung,R.Boutaba,andT.Başar,“Guidex:Agame-theoreticincentive-basedmech- anismforintrusiondetectionnetworks,” SelectedAreasinCommunications,IEEEJournalon , vol. 30, no. 11, pp. 2220–2230, 2012
work page 2012
-
[54]
Tragedy of anticommons in digital right management of medical records
Q. Zhu, C. A. Gunter, and T. Basar, “Tragedy of anticommons in digital right management of medical records.” inHealthSec, 2012
work page 2012
-
[55]
Agame-theoreticalapproachtoincentivedesignin collaborative intrusion detection networks,
Q.Zhu,C.Fung,R.Boutaba,andT.Başar,“Agame-theoreticalapproachtoincentivedesignin collaborative intrusion detection networks,” inGame Theory for Networks, 2009. GameNets’
work page 2009
- [56]
-
[57]
A game theoretic investigation of deception in network security,
T. E. Carroll and D. Grosu, “A game theoretic investigation of deception in network security,” Security and Commun. Nets., vol. 4, no. 10, pp. 1162–1172, 2011
work page 2011
-
[58]
A Stackelberg game perspective on the conflict between machine learninganddataobfuscation,
J. Pawlick and Q. Zhu, “A Stackelberg game perspective on the conflict between machine learninganddataobfuscation,” IEEEIntl.WorkshoponInform.ForensicsandSecurity ,2016
work page 2016
-
[59]
DynamicdifferentialprivacyforADMM-baseddistributedclassification learning,
T.ZhangandQ.Zhu,“DynamicdifferentialprivacyforADMM-baseddistributedclassification learning,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 172–187, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7563366/
-
[60]
S.Farhang,Y.Hayel,andQ.Zhu,“Phy-layerlocationprivacy-preservingaccesspointselection mechanism in next-generation wireless networks,” inCommunications and Network Security (CNS), 2015 IEEE Conference on. IEEE, 2015, pp. 263–271
work page 2015
-
[61]
Distributedprivacy-preservingcollaborativeintrusiondetectionsystems for vanets,
T.ZhangandQ.Zhu,“Distributedprivacy-preservingcollaborativeintrusiondetectionsystems for vanets,”IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 1, pp. 148–161, 2018
work page 2018
-
[62]
gPath: A game-theoretic path selection algorithm to protect tor’s anonymity,
N. Zhang, W. Yu, X. Fu, and S. K. Das, “gPath: A game-theoretic path selection algorithm to protect tor’s anonymity,” inDecision and Game Theory for Security. Springer, 2010, pp. 58–71
work page 2010
-
[63]
Security games with unknown adversarial strategies,
A. Garnaev, M. Baykal-Gursoy, and H. V. Poor, “Security games with unknown adversarial strategies,”IEEE transactions on cybernetics, vol. 46, no. 10, pp. 2291–2299, 2015
work page 2015
-
[64]
Distributed strategic learning with application to network security,
Q. Zhu, H. Tembine, and T. Başar, “Distributed strategic learning with application to network security,” inProceedings of the 2011 American Control Conference. IEEE, 2011, pp. 4057– 4062
work page 2011
-
[65]
Multi-agentreinforcementlearningforintrusiondetection:Acase study and evaluation,
A.ServinandD.Kudenko,“Multi-agentreinforcementlearningforintrusiondetection:Acase study and evaluation,” inGerman Conference on Multiagent System Technologies. Springer, 2008, pp. 159–170
work page 2008
-
[66]
P.M.DjurićandY.Wang,“Distributedbayesianlearninginmultiagentsystems:Improvingour understanding of its capabilities and limitations,”IEEE Signal Processing Magazine, vol. 29, no. 2, pp. 65–76, 2012. Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense 25
work page 2012
-
[67]
Coordination in multiagent reinforcement learning: a bayesianapproach,
G. Chalkiadakis and C. Boutilier, “Coordination in multiagent reinforcement learning: a bayesianapproach,”in ProceedingsofthesecondinternationaljointconferenceonAutonomous agents and multiagent systems. ACM, 2003, pp. 709–716
work page 2003
-
[68]
Distributedreinforcementlearningforpowerlimitedmany-core system performance optimization,
Z.ChenandD.Marculescu,“Distributedreinforcementlearningforpowerlimitedmany-core system performance optimization,” inProceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 2015, pp. 1521–1526
work page 2015
-
[69]
Games with incomplete information played by “bayesian
J. C. Harsanyi, “Games with incomplete information played by “bayesian” players, i–iii part i. the basic model,”Management science, vol. 14, no. 3, pp. 159–182, 1967
work page 1967
-
[70]
Transfer learning for reinforcement learning domains: A survey,
M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1633–1685, 2009
work page 2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.