pith. machine review for the scientific record.

arxiv: 2603.07560 · v2 · submitted 2026-03-08 · 💻 cs.CR · cs.NI

Recognition: 2 theorem links

· Lean Theorem

Learning the APT Kill Chain: Temporal Reasoning over Provenance Data for Attack Stage Estimation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 15:20 UTC · model grok-4.3

classification 💻 cs.CR cs.NI
keywords APT stage estimation · provenance graphs · graph neural networks · LSTM temporal modeling · MITRE ATT&CK framework · DARPA Transparent Computing · fused host-network data · attack progression inference

The pith

StageFinder uses graph neural networks and LSTM on fused provenance data to estimate APT attack stages at a macro F1-score of 0.96.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces StageFinder as a framework that processes provenance graphs from host and network sources to infer the current stage of an advanced persistent threat. A graph neural network captures structural links among processes, files, and connections while an LSTM tracks how those patterns evolve over time. Stages are aligned to the MITRE ATT&CK framework after pretraining on one DARPA dataset and fine-tuning on labeled Transparent Computing data. The resulting model outperforms prior baselines and produces more consistent stage predictions across attack sequences. If correct, defenders could respond with actions matched to the specific phase of an ongoing campaign rather than treating all activity as equivalent.

Core claim

StageFinder is a temporal-graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31% compared to state-of-the-art baselines.

What carries the argument

StageFinder is a temporal-graph framework that encodes provenance graphs with a graph neural network to capture structural dependencies and applies an LSTM to model the temporal evolution of attack stages.
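The GNN-plus-LSTM pipeline carrying the argument can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dimensions, the single mean-aggregation message-passing layer, the graph-level mean readout, and the random weights standing in for trained parameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not specify dimensions.
N_NODES, FEAT, HID, N_STAGES = 6, 8, 16, 5  # 5 illustrative ATT&CK-aligned stages

def gnn_layer(X, A, W):
    """One round of mean-neighbour message passing (simplified GCN-style)."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9
    H = (A @ X) / deg          # aggregate neighbour features
    return np.tanh(H @ W)      # linear transform + nonlinearity

def lstm_step(x, h, c, params):
    """A single standard LSTM cell step."""
    Wx, Wh, b = params
    z = x @ Wx + h @ Wh + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

# Random weights stand in for trained parameters.
W_gnn = rng.normal(size=(FEAT, HID))
params = (rng.normal(size=(HID, 4 * HID)),
          rng.normal(size=(HID, 4 * HID)),
          np.zeros(4 * HID))
W_out = rng.normal(size=(HID, N_STAGES))

h = c = np.zeros(HID)
for t in range(3):                           # three provenance snapshots
    X = rng.normal(size=(N_NODES, FEAT))     # node features (process/file/connection attrs)
    A = (rng.random((N_NODES, N_NODES)) < 0.3).astype(float)  # provenance edges
    g = gnn_layer(X, A, W_gnn).mean(axis=0)  # graph-level readout per window
    h, c = lstm_step(g, h, c, params)        # temporal update across windows

probs = np.exp(h @ W_out)
probs /= probs.sum()                         # softmax over attack stages
print(probs.round(3))
```

With trained weights, the final softmax would be the per-window stage probability vector that the Attack Stage Mapping discretizes.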

If this is right

  • Stage probabilities aligned to MITRE ATT&CK enable defenders to select responses matched to the detected phase of an APT.
  • A 31% reduction in prediction volatility yields more reliable stage estimates across consecutive time windows.
  • Fused host and network provenance data improves accuracy over models that use only one data type.
  • Pretraining on OpTC followed by fine-tuning on Transparent Computing data supports better generalization to new attack instances.
  • The framework demonstrates that temporal reasoning over provenance graphs can support automated attack stage estimation.
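The early-fusion point in the third bullet can be made concrete with a toy example: host events and network alerts become edges in one graph, joined through shared entities such as the process node. The event schemas, field names, and the payload filename are hypothetical, chosen only to echo the Sysmon events shown in Figure 2.

```python
# Hypothetical host events (Sysmon-style) and network alerts.
host_events = [
    {"type": "ProcessCreate", "parent": "powershell.exe", "child": "wget.exe"},
    {"type": "FileCreate", "proc": "wget.exe", "file": "payload.bin"},
]
net_alerts = [
    {"type": "HttpGet", "proc": "wget.exe", "dst": "203.0.113.7"},
]

# Early fusion: both sources contribute (src, dst, relation) edges
# to a single provenance graph before any encoding.
edges = set()
for e in host_events:
    if e["type"] == "ProcessCreate":
        edges.add((e["parent"], e["child"], "spawned"))
    elif e["type"] == "FileCreate":
        edges.add((e["proc"], e["file"], "wrote"))
for a in net_alerts:
    # The shared process node ("wget.exe") is what fuses the two views.
    edges.add((a["proc"], a["dst"], "connected"))

nodes = {n for s, d, _ in edges for n in (s, d)}
print(sorted(nodes))
```

The resulting node set mixes processes, files, and remote endpoints, which is exactly what lets a single GNN reason over both host and network behavior.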

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fused provenance and temporal approach could be applied to detect stages in other sequential threat models such as ransomware campaigns.
  • Integrating additional real-time data streams like user behavior logs might further stabilize stage predictions during live operations.
  • If volatility reduction holds, the method could reduce false alerts in security operations centers that rely on stage-based alerting.
  • Extending the model to handle streaming provenance graphs without full retraining would support continuous monitoring in production environments.

Load-bearing premise

The labeled stages in the DARPA Transparent Computing dataset accurately reflect real-world APT progression and the model generalizes to unseen attacks without significant distribution shift.

What would settle it

Testing the trained StageFinder model on provenance traces from an APT campaign collected outside the DARPA datasets and checking whether macro F1 falls below 0.85.
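To make the settling criterion operational, here is one way the two headline metrics could be computed over per-window stage predictions. The macro F1 computation is standard; the volatility definition (fraction of consecutive windows where the predicted stage flips) is our assumption, since the abstract does not define it.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))

def volatility(y_pred):
    """Fraction of consecutive-window prediction flips (assumed definition)."""
    return float(np.mean(y_pred[1:] != y_pred[:-1]))

# Toy stage sequence over 8 time windows (stages 0..3).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3])
print(macro_f1(y_true, y_pred, 4), volatility(y_pred))
```

An external-campaign test would simply apply these two functions to StageFinder's predictions on the new traces and compare macro F1 against the 0.85 threshold.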

Figures

Figures reproduced from arXiv: 2603.07560 by Thomas Bauschert, Trung V. Phan.

Figure 1. An example of an APT attack towards an enterprise network.
Figure 2. Data and control flow of the StageFinder framework. Host logs and network alerts are fused into a provenance graph, encoded by a GNN, and analyzed by an LSTM-based Stage Estimator, with the Attack Stage Mapping producing discrete APT stages.
Table I (spilled into this caption): example host log events captured by Sysmon — ProcessCreate: powershell.exe launches wget.exe; FileCreate: wget.exe creates file p…
Figure 3. An example of provenance graph construction with early fusion.
Figure 4. Temporal attention comparison between Cyberian and StageFinder.
read the original abstract

Advanced Persistent Threats (APTs) evolve through multiple stages, each exhibiting distinct temporal and structural behaviors. Accurate stage estimation is critical for enabling adaptive cyber defense. This paper presents StageFinder, a temporal-graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31% compared to state-of-the-art baselines (Cyberian, NetGuardian). These results highlight the effectiveness of fused provenance-temporal learning for accurate and stable APT stage inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces StageFinder, a temporal-graph learning framework that encodes provenance graphs with a GNN to capture structural dependencies and uses an LSTM to model temporal dynamics for estimating APT attack stages aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data, with experiments reporting a macro F1-score of 0.96 and a 31% reduction in prediction volatility relative to baselines Cyberian and NetGuardian on fused host and network provenance data.

Significance. If the performance claims hold under rigorous validation, the work would offer a meaningful contribution to adaptive cyber defense by demonstrating how fused provenance-temporal models can yield more accurate and stable stage inference than prior methods. The use of public DARPA datasets and explicit baseline comparisons is a positive aspect that supports reproducibility.

major comments (2)
  1. [Abstract] The headline performance figures (macro F1 of 0.96 and 31% volatility reduction) are presented without any description of validation splits, number of runs, error bars, hyperparameter search protocol, or checks for data leakage between the OpTC pretraining and Transparent Computing fine-tuning phases. These omissions directly undermine assessment of whether the central empirical claim is robust.
  2. [Experimental evaluation] The framework assumes that the MITRE ATT&CK-aligned stage labels in the Transparent Computing dataset accurately represent real-world APT progression, yet no label-validation statistics, inter-annotator agreement, or explicit out-of-distribution test results are supplied. Any systematic mismatch between these labels and actual temporal attack structure would render the reported F1 and volatility metrics unreliable.
minor comments (2)
  1. [Method] Clarify the precise fusion mechanism between host and network provenance streams in the GNN encoder; the current description leaves the input representation ambiguous.
  2. [Results] Add a table or figure showing per-stage precision/recall to complement the macro F1 score, as class imbalance is common in APT stage data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting important aspects of experimental robustness. We have revised the manuscript to incorporate additional details on validation protocols and label quality in both the abstract and experimental sections, while preserving the core contributions.

read point-by-point responses
  1. Referee: [Abstract] The headline performance figures (macro F1 of 0.96 and 31% volatility reduction) are presented without any description of validation splits, number of runs, error bars, hyperparameter search protocol, or checks for data leakage between the OpTC pretraining and Transparent Computing fine-tuning phases. These omissions directly undermine assessment of whether the central empirical claim is robust.

    Authors: We agree that the abstract would benefit from explicit context on the evaluation protocol. In the revised version, we have expanded the abstract to state that results are obtained via 5-fold cross-validation on the fine-tuning set, averaged over 10 runs with standard deviations, using a grid search for hyperparameters as detailed in Section 4.3. We also confirm that the OpTC pretraining and Transparent Computing fine-tuning datasets are disjoint in both hosts and temporal windows, with no shared entities or overlapping time periods, eliminating data leakage. These additions directly address the concern without exceeding abstract length limits. revision: yes

  2. Referee: [Experimental evaluation] The framework assumes that the MITRE ATT&CK-aligned stage labels in the Transparent Computing dataset accurately represent real-world APT progression, yet no label-validation statistics, inter-annotator agreement, or explicit out-of-distribution test results are supplied. Any systematic mismatch between these labels and actual temporal attack structure would render the reported F1 and volatility metrics unreliable.

    Authors: The Transparent Computing labels follow the official DARPA annotation aligned to MITRE ATT&CK tactics by security experts. In the revision, we have added a dedicated paragraph in Section 4.1 describing the label provenance from the dataset release notes and noting the high consistency reported in the original DARPA documentation. We also include new out-of-distribution experiments on held-out OpTC attack traces (distinct from pretraining), where StageFinder retains strong performance, supporting generalization. While we cannot retroactively compute inter-annotator agreement from the released data, the added discussion and OOD results mitigate concerns about label reliability. revision: partial
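The evaluation protocol the simulated rebuttal claims (5-fold cross-validation averaged over 10 runs, reported with standard deviations) can be sketched as below. The data, the majority-class placeholder standing in for the model, and the fold scheme are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))            # stand-in features
y = rng.integers(0, 5, size=100)         # stand-in stage labels (5 classes)

def kfold_indices(n, k):
    """Shuffle indices and split into k disjoint folds."""
    return np.array_split(rng.permutation(n), k)

scores = []
for run in range(10):                    # 10 repetitions, re-shuffling each time
    for fold in kfold_indices(len(X), 5):  # 5 folds per repetition
        test_mask = np.zeros(len(X), bool)
        test_mask[fold] = True
        # Placeholder "model": predict the majority class of the training split.
        pred = np.bincount(y[~test_mask]).argmax()
        scores.append(float(np.mean(y[test_mask] == pred)))

print(f"{np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

Swapping the majority-class placeholder for the trained StageFinder model and the accuracy for macro F1 would reproduce the protocol described in the response, including the mean-and-deviation reporting.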

Circularity Check

0 steps flagged

No significant circularity in StageFinder derivation

full rationale

The paper presents a standard supervised learning pipeline: a GNN encodes provenance graph structure, an LSTM captures temporal dynamics, the model is pretrained on DARPA OpTC data and fine-tuned on MITRE ATT&CK-aligned labels from the Transparent Computing dataset, and performance is measured by macro F1 and volatility reduction against external baselines (Cyberian, NetGuardian). No equations or claims reduce a prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled via prior work. All reported metrics are empirical results on public datasets, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review yields limited visibility into exact parameters and assumptions; standard supervised learning assumptions are implicit.

axioms (2)
  • domain assumption Provenance graphs from host and network logs faithfully capture attack behaviors without significant missing or noisy entries
    Central to the encoding step described in the abstract
  • domain assumption MITRE ATT&CK stage labels in the DARPA data are accurate and consistent across samples
    Required for supervised fine-tuning and F1 evaluation

pith-pipeline@v0.9.0 · 5458 in / 1288 out tokens · 52189 ms · 2026-05-15T15:20:37.523181+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,

    Alshamrani et al., “A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1851–1877, 2019

  2. [2]

    MITRE ATT&CK: A knowledge base of adversary tactics, techniques, and common knowledge,

    The MITRE Corporation, “MITRE ATT&CK: A knowledge base of adversary tactics, techniques, and common knowledge,” 2026. [Online]. Available: https://attack.mitre.org

  3. [3]

    Explainable deep learning approach for advanced persistent threats (apts) detection in cybersecurity: A review,

    Mutalib et al., “Explainable deep learning approach for advanced persistent threats (apts) detection in cybersecurity: A review,” Artificial Intelligence Review, vol. 57, no. 11, p. 297, 2024

  4. [4]

    A survey of intrusion detection systems leveraging host data,

    R. A. Bridges et al., “A survey of intrusion detection systems leveraging host data,” ACM Comput. Surv., vol. 52, Nov. 2019

  5. [5]

    Provenance-based intrusion detection: opportunities and challenges,

    X. Han et al., “Provenance-based intrusion detection: opportunities and challenges,” in 10th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2018), 2018

  6. [6]

    Darpa transparent computing (tc) engagement dataset,

    DARPA, “Darpa transparent computing (tc) engagement dataset,” 2019. [Online]. Available: https://github.com/darpa-i2o/Transparent-Computing

  7. [7]

    Operationally Transparent Cyber (OpTC) Dataset,

    DARPA, “Operationally Transparent Cyber (OpTC) Dataset,” 2019. [Online]. Available: https://github.com/FiveDirections/OpTC-data

  8. [8]

    A novel ai-based methodology for identifying cyber attacks in honey pots,

    M. AbuOdeh et al., “A novel ai-based methodology for identifying cyber attacks in honey pots,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 15224–15231, May 2021

  9. [9]

    A principled approach for detecting apts in massive networks via multi-stage causal analytics,

    J. Gui et al., “A principled approach for detecting apts in massive networks via multi-stage causal analytics,” in IEEE INFOCOM 2025 - IEEE Conference on Computer Communications, pp. 1–10, 2025

  10. [10]

    Anomaly-based multi-stage attack detection method,

    W. Ma et al., “Anomaly-based multi-stage attack detection method,” PLOS ONE, vol. 19, no. 5, p. e0300821, 2024

  11. [11]

    A novel approach for apt attack detection based on an intelligent behavior profile and deep graph network,

    C. D. Do Xuan et al., “A novel approach for apt attack detection based on an intelligent behavior profile and deep graph network,” Scientific Reports, vol. 14, 2024

  12. [12]

    Continuum: Detecting apt attacks through spatial-temporal graph neural networks,

    A. A. M. Bahar et al., “Continuum: Detecting apt attacks through spatial-temporal graph neural networks,” 2025

  13. [13]

    A dynamic provenance graph-based detector for advanced persistent threats,

    L. Wang et al., “A dynamic provenance graph-based detector for advanced persistent threats,” Expert Systems with Applications, vol. 265, p. 125877, 2025

  14. [14]

    PROGRAPHER: An anomaly detection system based on provenance graph embedding,

    F. Yang et al., “PROGRAPHER: An anomaly detection system based on provenance graph embedding,” in 32nd USENIX Security Symposium (USENIX Security 23), (Anaheim, CA), pp. 4355–4372, USENIX Association, Aug. 2023

  15. [15]

    Modeling and detection of the multi-stages of advanced persistent threats attacks based on semi-supervised learning and complex network characteristics,

    A. Zimba et al., “Modeling and detection of the multi-stages of advanced persistent threats attacks based on semi-supervised learning and complex network characteristics,” Future Generation Computer Systems, vol. 108, pp. 636–646, 2020