Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning

Aron Laszka; Chao Yan; Liang Tong; Ning Zhang; Yevgeniy Vorobeychik

arXiv: 1906.08805 · v1 · pith:ZRQEEXS6new · submitted 2019-06-20 · cs.CR · cs.AI · cs.GT

Liang Tong, Aron Laszka, Chao Yan, Ning Zhang, Yevgeniy Vorobeychik

Pith reviewed 2026-05-25 19:27 UTC · model grok-4.3

keywords: alert prioritization, adversarial reinforcement learning, game theory, security, fraud detection, intrusion detection, double oracle, stochastic policy

The pith

Modeling alert prioritization as a game against a state-aware adaptive attacker and solving it with adversarial reinforcement learning produces a robust stochastic defender policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that alert prioritization can be framed as a repeated game in which the attacker observes the full detection state and selects attacks to maximize impact against the current policy. Neural reinforcement learning computes approximate best responses for each side, which are then fed into a double-oracle loop to reach an approximate equilibrium. The equilibrium strategy is a stochastic policy that tells the defender which alerts to investigate at each state. If correct, this policy remains effective even when attackers adapt dynamically, unlike static scores or heuristics that attackers can learn to evade.

Core claim

The central claim is that the interaction between defender and attacker can be captured in a game-theoretic model, after which an adversarial reinforcement learning procedure—neural RL best-response oracles inside a double-oracle loop—yields an approximate Nash equilibrium whose defender component is a robust stochastic alert-prioritization policy, shown to be effective in fraud-detection and intrusion-detection case studies.

What carries the argument

An adversarial reinforcement learning framework: neural-network best-response oracles for the defender and the attacker, iterated inside a double-oracle procedure until the alert-prioritization game reaches an approximate equilibrium.
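To make the double-oracle loop concrete, here is a minimal sketch on a small zero-sum matrix game. It is an illustration under stated assumptions, not the paper's implementation: the names `solve_restricted`, `double_oracle`, and `equilibrium_gap` are hypothetical, the best-response oracles are exact enumeration standing in for the paper's neural RL oracles, and the restricted game is solved approximately by fictitious play rather than by whatever solver the authors use.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]   # loss[d][a]: defender's loss (attacker's gain)
Mixed = Dict[int, float]             # pure-strategy index -> weight or probability


def value_vs_attacker(loss: Matrix, d_mix: Mixed, a: int) -> float:
    """Weighted defender loss when the attacker plays pure strategy a."""
    return sum(p * loss[d][a] for d, p in d_mix.items())


def value_vs_defender(loss: Matrix, d: int, a_mix: Mixed) -> float:
    """Weighted defender loss when the defender plays pure strategy d."""
    return sum(q * loss[d][a] for a, q in a_mix.items())


def solve_restricted(loss: Matrix, d_set: List[int], a_set: List[int],
                     rounds: int = 20000) -> Tuple[Mixed, Mixed]:
    """Approximate the restricted game's equilibrium by fictitious play."""
    d_counts = {d: 0 for d in d_set}
    a_counts = {a: 0 for a in a_set}
    d_counts[d_set[0]] = 1
    a_counts[a_set[0]] = 1
    for _ in range(rounds):
        # Each side best-responds to the opponent's empirical play so far.
        d_next = min(d_set, key=lambda d: value_vs_defender(loss, d, a_counts))
        a_next = max(a_set, key=lambda a: value_vs_attacker(loss, d_counts, a))
        d_counts[d_next] += 1
        a_counts[a_next] += 1
    total = rounds + 1
    return ({d: c / total for d, c in d_counts.items()},
            {a: c / total for a, c in a_counts.items()})


def double_oracle(loss: Matrix, max_iters: int = 100) -> Tuple[Mixed, Mixed]:
    """Grow both strategy sets with best responses until neither oracle adds one."""
    n_d, n_a = len(loss), len(loss[0])
    d_set, a_set = [0], [0]
    d_mix, a_mix = {0: 1.0}, {0: 1.0}
    for _ in range(max_iters):
        d_mix, a_mix = solve_restricted(loss, d_set, a_set)
        # Oracles over the FULL strategy spaces (exact here, learned in the paper).
        d_br = min(range(n_d), key=lambda d: value_vs_defender(loss, d, a_mix))
        a_br = max(range(n_a), key=lambda a: value_vs_attacker(loss, d_mix, a))
        grew = False
        if d_br not in d_set:
            d_set.append(d_br)
            grew = True
        if a_br not in a_set:
            a_set.append(a_br)
            grew = True
        if not grew:
            break
    return d_mix, a_mix


def equilibrium_gap(loss: Matrix, d_mix: Mixed, a_mix: Mixed) -> float:
    """Attacker's best-response value minus defender's; zero at an exact equilibrium."""
    worst = max(value_vs_attacker(loss, d_mix, a) for a in range(len(loss[0])))
    best = min(value_vs_defender(loss, d, a_mix) for d in range(len(loss)))
    return worst - best
```

Calling `double_oracle` on a payoff matrix returns a mixed strategy for each side, and `equilibrium_gap` reports how far that pair is from an exact equilibrium. In the paper's setting the pure strategies would be learned policies rather than matrix rows, so the enumeration steps are exactly where the RL oracles would plug in.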

If this is right

The defender obtains a stochastic policy that specifies investigation probabilities for each alert type as a function of observed state.
The policy remains effective against attackers who choose attacks dynamically to exploit the prioritization rule.
The same procedure can be instantiated for different detection domains by changing only the state representation and payoff functions.
Heuristic prioritization scores are replaced by an equilibrium strategy that explicitly accounts for the attacker's best response.
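As a rough illustration of what executing such a stochastic policy could look like, the sketch below samples one pure prioritization rule from a mixture and spends an investigation budget in the sampled order. Everything here is a hypothetical stand-in: the state encoding (pending alert counts per type), the two toy rules, the per-alert `cost`, and the `investigate` helper are assumptions for illustration and are not taken from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, int]                     # hypothetical: pending alerts per alert type
PurePolicy = Callable[[State], List[str]]  # state -> priority order over alert types


def most_alerts_first(state: State) -> List[str]:
    """Toy rule: investigate the busiest alert type first."""
    return sorted(state, key=lambda t: -state[t])


def fewest_alerts_first(state: State) -> List[str]:
    """Toy rule: investigate the quietest alert type first."""
    return sorted(state, key=lambda t: state[t])


def investigate(state: State,
                mixture: Sequence[Tuple[float, PurePolicy]],
                cost: Dict[str, float],
                budget: float,
                rng: random.Random) -> Dict[str, int]:
    """Sample a pure policy from the mixture, then spend the budget in its order."""
    weights = [w for w, _ in mixture]
    policies = [p for _, p in mixture]
    chosen = rng.choices(policies, weights=weights, k=1)[0]
    investigated = {t: 0 for t in state}
    remaining = budget
    for alert_type in chosen(state):
        affordable = int(remaining // cost[alert_type])
        n = min(state[alert_type], affordable)
        investigated[alert_type] = n
        remaining -= n * cost[alert_type]
    return investigated
```

Because the rule is drawn at random each time, an attacker who observes the state still cannot predict with certainty which alert types will be examined, which is the property the equilibrium mixture is meant to provide.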

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be applied to any detection setting in which the monitored system state is observable to an adaptive adversary.
If the double-oracle loop converges slowly in larger state spaces, hybrid methods that seed the oracles with domain heuristics may be needed.
Live deployment would require periodic re-solving as the underlying attack distribution or detection features drift.

Load-bearing premise

The defender-attacker interaction can be accurately represented as a game in which the attacker knows the full detection-system state and selects attacks optimally in response to the defender's current policy.

What would settle it

A controlled experiment would falsify the robustness claim if an attacker, whether best-responding within the modeled game or using a strategy outside it, inflicted at least as much loss on the defender under the double-oracle policy as under a heuristic baseline.

Figures

Figures reproduced from arXiv: 1906.08805 by Aron Laszka, Chao Yan, Liang Tong, Ning Zhang, Yevgeniy Vorobeychik.

**Figure 1.** System model. The Attack Oracle computes the attacker’s policy for executing attacks, which is implemented by the Attack Generator and then triggers alerts observed by the Attack Detection Environment. The Defense Oracle computes the defender’s alert prioritization policy, which is implemented by the Alert Analyzer.

**Figure 2.** The game solver based on the double oracle algorithm.

**Figure 3.** The interactions among actor, critic and environment.

**Figure 4.** Intrusion detection: loss of the defender when it knows the attack…

**Figure 5.** Intrusion detection: loss of the defender when it is uncertain of the…

**Figure 6.** Intrusion detection: loss of the defender when it has different estimates…

**Figure 7.** Intrusion detection: loss of the defender when it is certain of the…

**Figure 8.** Fraud detection: loss of the defender when it knows the attack budget.

**Figure 9.** Fraud detection: loss of the defender when it is uncertain of the attack…

**Figure 10.** Fraud detection: loss of the defender when it has different estimates…

**Figure 11.** Fraud detection: loss of the defender when it is certain of the attack…

**Figure 12.** Computational cost. Left: Number of double oracle iterations in…

Original abstract

Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies double-oracle adversarial RL to alert prioritization but offers no guarantees on how close the result is to equilibrium.

The one thing to know is that this paper frames alert prioritization as a game against an adaptive attacker and solves for an approximate equilibrium policy using neural RL best responses in a double-oracle framework. The experiments on fraud and intrusion detection are the evidence that this produces robust policies.

What the paper does well is to make the interaction explicit in the game model and then use existing RL tools to handle the complexity of finding best responses to stochastic policies. That is a reasonable way to get a dynamic policy instead of a fixed score. The experiments are described as extensive, which suggests they put in the work to test it.

The main soft spot is the lack of any analysis around the quality of the double-oracle approximation. Since the oracles are neural RL, they can have variance and incomplete coverage, and the paper does not report how they checked convergence or bounded the error. This means the robustness conclusion is tied to the case study outcomes rather than to a verified property of the solver. If the case studies are representative, it may still be useful, but the claim is weaker than it sounds.

This paper is for researchers in security who want to incorporate attacker adaptation into their alert systems. It would be of interest to a reading group focused on applied RL or game theory in security. It deserves peer review because the approach is novel in this domain and the experiments provide a starting point for evaluation, even with the gaps in the theoretical support for the approximation. I would send it out.

Referee Report

2 major / 1 minor

Summary. The paper models defender-attacker interaction in alert prioritization as a two-player game in which the attacker knows the full detection state and chooses attacks dynamically. It computes best-response policies via neural RL oracles and iterates them inside a double-oracle loop to produce an approximate equilibrium stochastic policy for the defender; effectiveness is asserted via case studies on fraud and intrusion detection.

Significance. If the empirical outcomes survive proper controls and the approximation quality can be characterized, the framework would supply a principled route to robust stochastic prioritization policies that explicitly anticipate adaptive adversaries, moving beyond heuristic scoring methods.

major comments (2)

[Abstract and Experiments section] The claim that the approach 'is effective in creating robust alert prioritization policies' is supported only by case-study outcomes; the manuscript supplies no description of baselines, statistical tests, train/test splits, or the concrete payoff matrices used to instantiate the game in the fraud and intrusion scenarios, rendering the central empirical claim unverifiable from the given text.
[§3 (Adversarial RL and Double-Oracle Framework)] The robustness conclusion rests on the double-oracle procedure yielding an approximate equilibrium, yet the text provides no iteration counts, oracle accuracy diagnostics, convergence criteria, or distance-to-equilibrium bounds for the neural best-response oracles; without such analysis the policy's claimed robustness to the modeled adaptive attacker is unsupported beyond the reported case studies.
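
The distance-to-equilibrium diagnostic requested in the second comment can be stated explicitly. The notation is an assumption made here for illustration and is not taken from the manuscript: $U(\pi_d, \pi_a)$ denotes the defender's expected loss under defender policy $\pi_d$ and attacker policy $\pi_a$, and the game is treated as zero-sum, so the attacker maximizes $U$ while the defender minimizes it.

```latex
\epsilon(\pi_d, \pi_a)
  \;=\; \max_{\pi_a'} U(\pi_d, \pi_a') \;-\; \min_{\pi_d'} U(\pi_d', \pi_a)
  \;\ge\; 0,
\qquad
\epsilon(\pi_d, \pi_a) = 0 \iff (\pi_d, \pi_a) \text{ is a Nash equilibrium.}
```

When the maximum and minimum are computed by learned oracles that only approximate the true best responses, the measured value is a lower bound on the true gap, which is why oracle accuracy diagnostics matter alongside the gap itself.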

minor comments (1)

[§2] The description of state and action spaces in the game model would benefit from an explicit tabular summary of dimensions and feature encodings used in each case study.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. The points raised identify areas where the manuscript would benefit from greater transparency in the experimental setup and algorithmic details. We address each major comment below and will incorporate revisions to strengthen the paper.

Referee: [Abstract and Experiments section] The claim that the approach 'is effective in creating robust alert prioritization policies' is supported only by case-study outcomes; the manuscript supplies no description of baselines, statistical tests, train/test splits, or the concrete payoff matrices used to instantiate the game in the fraud and intrusion scenarios, rendering the central empirical claim unverifiable from the given text.

Authors: We agree that the experimental claims require more explicit supporting details to be verifiable. The full manuscript contains case-study descriptions for fraud and intrusion detection, but we acknowledge that baselines (such as standard heuristic scoring), statistical tests, train/test splits, and concrete payoff matrices are not sufficiently described. In the revision we will expand the Experiments section to include these elements, making the evaluation reproducible and the effectiveness claims directly verifiable from the text.

Revision: yes

Referee: [§3 (Adversarial RL and Double-Oracle Framework)] The robustness conclusion rests on the double-oracle procedure yielding an approximate equilibrium, yet the text provides no iteration counts, oracle accuracy diagnostics, convergence criteria, or distance-to-equilibrium bounds for the neural best-response oracles; without such analysis the policy's claimed robustness to the modeled adaptive attacker is unsupported beyond the reported case studies.

Authors: We accept that §3 would be strengthened by quantitative details on the double-oracle procedure. The current text describes the framework at a high level but does not report iteration counts, neural oracle accuracy, convergence criteria, or equilibrium-distance bounds. We will revise §3 to include these diagnostics (e.g., number of double-oracle iterations performed, validation accuracy of the RL oracles, and any empirical or theoretical convergence measures), thereby providing direct support for the approximate-equilibrium claim.

Revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic approximation procedure with independent empirical validation

The paper models defender-attacker interaction as a game, computes approximate best responses via neural RL oracles, and iterates via double-oracle to produce a stochastic policy. None of these steps reduce the claimed equilibrium policy to a quantity defined in terms of itself, a fitted parameter renamed as prediction, or a self-citation chain. The derivation is an explicit computational procedure whose robustness claim is supported by case-study experiments rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies no numerical parameters, invented entities, or formal axioms beyond the high-level modeling choice; the ledger is therefore minimal.

axioms (1)

Domain assumption: The interaction between defender and attacker can be captured in a game theoretic model. Stated explicitly as the first step of the approach in the abstract.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PACT: Reducing Alert Fatigue in Low-Prevalence SOC Streams with Triggered Active Learning
cs.CR · 2026-05 · unverdicted · novelty 5.0

PACT reduces benign-normalized false-positive burden by 43% and 21% on AIT-ADS and BOTSv1 benchmarks versus a frozen baseline while issuing 3.8x–5.2x fewer analyst queries than random updating.

[1] [1]

TensorFlow: A system for large-scale machine learning,

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V . Vasudevan, P. Warden, M. Wicke, Y . Yu, and X. Zheng, “TensorFlow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating S...

work page 2016

[2] [2]

FuzMet: A fuzzy-logic based alert prioritization engine for intrusion detection systems,

K. Alsubhi, I. Aib, and R. Boutaba, “FuzMet: A fuzzy-logic based alert prioritization engine for intrusion detection systems,” International Journal of Network Management , vol. 22, no. 4, pp. 263–284, 2012

work page 2012

[3] [3]

A deployed quantal response-based patrol planning system for the US Coast Guard,

B. An, F. Ord ´o˜nez, M. Tambe, E. Shieh, R. Yang, C. Baldwin, J. DiRenzo III, K. Moretti, B. Maule, and G. Meyer, “A deployed quantal response-based patrol planning system for the US Coast Guard,” Interfaces, vol. 43, no. 5, pp. 400–420, 2013

work page 2013

[4] [4]

A distributional perspec- tive on reinforcement learning,

M. G. Bellemare, W. Dabney, and R. Munos, “A distributional perspec- tive on reinforcement learning,” in Proceedings of the 34th International Conference on Machine Learning (ICML) – Volume 70 . JMLR, 2017, pp. 449–458

work page 2017

[5] [5]

C. M. Bishop, Pattern Recognition and Machine Learning , ser. Infor- mation Science and Statistics. Springer, 2011

work page 2011

[6] [6]

Audit games,

J. Blocki, N. Christin, A. Datta, A. D. Procaccia, and A. Sinha, “Audit games,” in Proceedings of the 23rd International Joint Conference on Artiﬁcial Intelligence (IJCAI) , ser. IJCAI ’13. AAAI Press, 2013, pp. 41–47. [Online]. Available: http://dl.acm.org/citation.cfm?id=2540128. 2540137

work page 2013

[7] [7]

Audit games with multiple defender resources,

——, “Audit games with multiple defender resources,” in Proceedings of the 29th AAAI Conference on Artiﬁcial Intelligence , 2015. 14

work page 2015

[8] [8]

A survey of data mining and machine learning methods for cyber security intrusion detection,

A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Commu- nications Surveys & Tutorials , vol. 18, no. 2, pp. 1153–1176, 2016

work page 2016

[9] [9]

Noisy networks for exploration,

M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V . Mnih, R. Munos, D. Hassabis, O. Pietquin et al. , “Noisy networks for exploration,” arXiv preprint arXiv:1706.10295 , 2017

work page arXiv 2017

[10] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 249–256.

[11] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.

[12] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, ser. AAAI, 2018.

[13] G. Ho, A. Sharma, M. Javed, V. Paxson, and D. Wagner, “Detecting credential spearphishing in enterprise settings,” in Proceedings of the 26th USENIX Security Symposium (USENIX Security), 2017, pp. 469–485.

[14] J. Hu and M. P. Wellman, “Nash Q-learning for general-sum stochastic games,” Journal of Machine Learning Research, vol. 4, no. Nov, pp. 1039–1069, 2003.

[15] J. Hu, M. P. Wellman et al., “Multiagent reinforcement learning: theoretical framework and an algorithm,” in Proceedings of the 15th International Conference on Machine Learning (ICML), vol. 98, 1998, pp. 242–250.

[16] N. Hubballi and V. Suryanarayanan, “False alarm minimization techniques in signature-based intrusion detection systems: A survey,” Computer Communications, vol. 49, pp. 1–17, 2014.

[17] D. Korzhyk, Z. Yin, C. Kiekintveld, V. Conitzer, and M. Tambe, “Stackelberg vs. Nash in security games: An extended investigation of interchangeability, equivalence, and uniqueness,” Journal of Artificial Intelligence Research, vol. 41, pp. 297–327, 2011.

[18] M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Pérolat, D. Silver, and T. Graepel, “A unified game-theoretic approach to multiagent reinforcement learning,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), 2017, pp. 4193–4206.

[19] A. Laszka, Y. Vorobeychik, D. Fabbri, C. Yan, and B. Malin, “A game-theoretic approach for alert prioritization,” in AAAI Workshop on Artificial Intelligence for Cyber Security (AICS), February 2017.

[20] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.

[21] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proceedings of the 11th International Conference on Machine Learning (ICML). Elsevier, 1994, pp. 157–163.

[22] M. L. Littman, “Friend-or-foe Q-learning in general-sum games,” in Proceedings of the 18th International Conference on Machine Learning (ICML), vol. 1, 2001, pp. 322–328.

[23] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), 2017, pp. 6382–6393.

[24] M. H. Manshaei, Q. Zhu, T. Alpcan, T. Başar, and J.-P. Hubaux, “Game theory meets network security and privacy,” ACM Computing Surveys (CSUR), vol. 45, no. 3, p. 25, 2013.

[25] H. B. McMahan, G. J. Gordon, and A. Blum, “Planning in the presence of cost functions controlled by an adversary,” in Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 536–543.

[26] A. Milenkoski, M. Vieira, S. Kounev, A. Avritzer, and B. D. Payne, “Evaluating computer intrusion detection systems: A survey of common practices,” ACM Computing Surveys (CSUR), vol. 48, no. 1, p. 12, 2015.

[27] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of the 33rd International Conference on Machine Learning (ICML) – Volume 48, 2016, pp. 1928–1937.

[28] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[29] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.

[30] S. Salah, G. Maciá-Fernández, and J. E. Díaz-Verdejo, “A model-based survey of alert correlation techniques,” Computer Networks, vol. 57, no. 5, pp. 1289–1317, 2013.

[31] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, 2015.

[32] A. Schlenker, H. Xu, M. Guirguis, C. Kiekintveld, A. Sinha, M. Tambe, S. Sonya, D. Balderas, and N. Dunstatter, “Don’t bury your head in warnings: A game-theoretic approach for intelligent allocation of cyber-security alerts,” in Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017, pp. 381–387, doi:10.24963/ijcai.2017/54. [Online]. Availab...

[33] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP) – Volume 1, INSTICC. SciTePress, 2018, pp. 108–116.

[34] R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in 2010 IEEE Symposium on Security and Privacy. IEEE, 2010, pp. 305–316.

[35] G. Tesauro, “TD-Gammon, a self-teaching backgammon program, achieves master-level play,” Neural Computation, vol. 6, no. 2, pp. 215–219, 1994.

[36] J. Tsai, T. H. Nguyen, and M. Tambe, “Security games for controlling contagion,” in Proceedings of the 26th AAAI Conference on Artificial Intelligence, ser. AAAI’12. AAAI Press, 2012, pp. 1464–1470. [Online]. Available: http://dl.acm.org/citation.cfm?id=2900929.2900936

[37] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016.

[38] E. Vasilomanolakis, S. Karuppayah, M. Mühlhäuser, and M. Fischer, “Taxonomy and survey of collaborative intrusion detection,” ACM Computing Surveys (CSUR), vol. 47, no. 4, p. 55, 2015.

[39] Y. Vorobeychik and M. Kantarcioglu, Adversarial Machine Learning. Morgan and Claypool, 2018.

[40] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, “Dueling network architectures for deep reinforcement learning,” in Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016, pp. 1995–2003.

[41] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[42] C. J. C. H. Watkins, “Learning from delayed rewards,” Ph.D. dissertation, King’s College, Cambridge, 1989.

[43] C. Yan, B. Li, Y. Vorobeychik, A. Laszka, D. Fabbri, and B. Malin, “Get your workload in order: Game theoretic prioritization of database auditing,” in Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE), April 2018, pp. 1304–1307.

APPENDIX

A. Best Response Oracle Algorithm

The proposed algorithm to compute the best respons...