pith. machine review for the scientific record.

arxiv: 2603.07560 · v2 · submitted 2026-03-08 · 💻 cs.CR · cs.NI

Recognition: 2 theorem links

· Lean Theorem

Learning the APT Kill Chain: Temporal Reasoning over Provenance Data for Attack Stage Estimation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 15:20 UTC · model grok-4.3

classification 💻 cs.CR cs.NI
keywords APT stage estimation · provenance graphs · graph neural networks · LSTM temporal modeling · MITRE ATT&CK framework · DARPA Transparent Computing · fused host-network data · attack progression inference

The pith

StageFinder uses graph neural networks and LSTM on fused provenance data to estimate APT attack stages at a macro F1-score of 0.96.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces StageFinder as a framework that processes provenance graphs from host and network sources to infer the current stage of an advanced persistent threat. A graph neural network captures structural links among processes, files, and connections while an LSTM tracks how those patterns evolve over time. Stages are aligned to the MITRE ATT&CK framework after pretraining on one DARPA dataset and fine-tuning on labeled Transparent Computing data. The resulting model outperforms prior baselines and produces more consistent stage predictions across attack sequences. If correct, defenders could respond with actions matched to the specific phase of an ongoing campaign rather than treating all activity as equivalent.

Core claim

StageFinder is a temporal-graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31% compared to state-of-the-art baselines.

What carries the argument

StageFinder is a temporal-graph framework that encodes provenance graphs with a graph neural network to capture structural dependencies and applies an LSTM to model the temporal evolution of attack stages.
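The GNN-plus-LSTM pipeline carrying the argument can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the dimensions, the single mean-aggregation message-passing layer, the graph-level mean readout, and the random weights standing in for trained parameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not specify dimensions.
N_NODES, FEAT, HID, N_STAGES = 6, 8, 16, 5  # 5 illustrative ATT&CK-aligned stages

def gnn_layer(X, A, W):
    """One round of mean-neighbour message passing (simplified GCN-style)."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9
    H = (A @ X) / deg          # aggregate neighbour features
    return np.tanh(H @ W)      # linear transform + nonlinearity

def lstm_step(x, h, c, params):
    """A single standard LSTM cell step."""
    Wx, Wh, b = params
    z = x @ Wx + h @ Wh + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

# Random weights stand in for trained parameters.
W_gnn = rng.normal(size=(FEAT, HID))
params = (rng.normal(size=(HID, 4 * HID)),
          rng.normal(size=(HID, 4 * HID)),
          np.zeros(4 * HID))
W_out = rng.normal(size=(HID, N_STAGES))

h = c = np.zeros(HID)
for t in range(3):                           # three provenance snapshots
    X = rng.normal(size=(N_NODES, FEAT))     # node features (process/file/connection attrs)
    A = (rng.random((N_NODES, N_NODES)) < 0.3).astype(float)  # provenance edges
    g = gnn_layer(X, A, W_gnn).mean(axis=0)  # graph-level readout per window
    h, c = lstm_step(g, h, c, params)        # temporal update across windows

probs = np.exp(h @ W_out)
probs /= probs.sum()                         # softmax over attack stages
print(probs.round(3))
```

With trained weights, the final softmax would be the per-window stage probability vector that the Attack Stage Mapping discretizes.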

If this is right

  • Stage probabilities aligned to MITRE ATT&CK enable defenders to select responses matched to the detected phase of an APT.
  • A 31% reduction in prediction volatility yields more reliable stage estimates across consecutive time windows.
  • Fused host and network provenance data improves accuracy over models that use only one data type.
  • Pretraining on OpTC followed by fine-tuning on Transparent Computing data supports better generalization to new attack instances.
  • The framework demonstrates that temporal reasoning over provenance graphs can support automated attack stage estimation.
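The early-fusion point in the third bullet can be made concrete with a toy example: host events and network alerts become edges in one graph, joined through shared entities such as the process node. The event schemas, field names, and the payload filename are hypothetical, chosen only to echo the Sysmon events shown in Figure 2.

```python
# Hypothetical host events (Sysmon-style) and network alerts.
host_events = [
    {"type": "ProcessCreate", "parent": "powershell.exe", "child": "wget.exe"},
    {"type": "FileCreate", "proc": "wget.exe", "file": "payload.bin"},
]
net_alerts = [
    {"type": "HttpGet", "proc": "wget.exe", "dst": "203.0.113.7"},
]

# Early fusion: both sources contribute (src, dst, relation) edges
# to a single provenance graph before any encoding.
edges = set()
for e in host_events:
    if e["type"] == "ProcessCreate":
        edges.add((e["parent"], e["child"], "spawned"))
    elif e["type"] == "FileCreate":
        edges.add((e["proc"], e["file"], "wrote"))
for a in net_alerts:
    # The shared process node ("wget.exe") is what fuses the two views.
    edges.add((a["proc"], a["dst"], "connected"))

nodes = {n for s, d, _ in edges for n in (s, d)}
print(sorted(nodes))
```

The resulting node set mixes processes, files, and remote endpoints, which is exactly what lets a single GNN reason over both host and network behavior.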

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fused provenance and temporal approach could be applied to detect stages in other sequential threat models such as ransomware campaigns.
  • Integrating additional real-time data streams like user behavior logs might further stabilize stage predictions during live operations.
  • If volatility reduction holds, the method could reduce false alerts in security operations centers that rely on stage-based alerting.
  • Extending the model to handle streaming provenance graphs without full retraining would support continuous monitoring in production environments.

Load-bearing premise

The labeled stages in the DARPA Transparent Computing dataset accurately reflect real-world APT progression and the model generalizes to unseen attacks without significant distribution shift.

What would settle it

Testing the trained StageFinder model on provenance traces from an APT campaign collected outside the DARPA datasets and checking whether macro F1 falls below 0.85.
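To make the settling criterion operational, here is one way the two headline metrics could be computed over per-window stage predictions. The macro F1 computation is standard; the volatility definition (fraction of consecutive windows where the predicted stage flips) is our assumption, since the abstract does not define it.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))

def volatility(y_pred):
    """Fraction of consecutive-window prediction flips (assumed definition)."""
    return float(np.mean(y_pred[1:] != y_pred[:-1]))

# Toy stage sequence over 8 time windows (stages 0..3).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3])
print(macro_f1(y_true, y_pred, 4), volatility(y_pred))
```

An external-campaign test would simply apply these two functions to StageFinder's predictions on the new traces and compare macro F1 against the 0.85 threshold.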

Figures

Figures reproduced from arXiv: 2603.07560 by Thomas Bauschert, Trung V. Phan.

Figure 1. An example of an APT attack towards an enterprise network.
Figure 2. Data and control flow of the StageFinder framework. Host logs and network alerts are fused into a provenance graph, encoded by a GNN, and analyzed by an LSTM-based Stage Estimator, with the Attack Stage Mapping producing discrete APT stages.
Table I (spilled into this caption): example host log events captured by Sysmon — ProcessCreate: powershell.exe launches wget.exe; FileCreate: wget.exe creates file p…
Figure 3. An example of provenance graph construction with early fusion.
Figure 4. Temporal attention comparison between Cyberian and StageFinder.
read the original abstract

Advanced Persistent Threats (APTs) evolve through multiple stages, each exhibiting distinct temporal and structural behaviors. Accurate stage estimation is critical for enabling adaptive cyber defense. This paper presents StageFinder, a temporal-graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31% compared to state-of-the-art baselines (Cyberian, NetGuardian). These results highlight the effectiveness of fused provenance-temporal learning for accurate and stable APT stage inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces StageFinder, a temporal-graph learning framework that encodes provenance graphs with a GNN to capture structural dependencies and uses an LSTM to model temporal dynamics for estimating APT attack stages aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data, with experiments reporting a macro F1-score of 0.96 and a 31% reduction in prediction volatility relative to baselines Cyberian and NetGuardian on fused host and network provenance data.

Significance. If the performance claims hold under rigorous validation, the work would offer a meaningful contribution to adaptive cyber defense by demonstrating how fused provenance-temporal models can yield more accurate and stable stage inference than prior methods. The use of public DARPA datasets and explicit baseline comparisons is a positive aspect that supports reproducibility.

major comments (2)
  1. [Abstract] The headline performance figures (macro F1 of 0.96 and 31% volatility reduction) are presented without any description of validation splits, number of runs, error bars, hyperparameter search protocol, or checks for data leakage between the OpTC pretraining and Transparent Computing fine-tuning phases. These omissions directly undermine assessment of whether the central empirical claim is robust.
  2. [Experimental evaluation] The framework assumes that the MITRE ATT&CK-aligned stage labels in the Transparent Computing dataset accurately represent real-world APT progression, yet no label-validation statistics, inter-annotator agreement, or explicit out-of-distribution test results are supplied. Any systematic mismatch between these labels and actual temporal attack structure would render the reported F1 and volatility metrics unreliable.
minor comments (2)
  1. [Method] Clarify the precise fusion mechanism between host and network provenance streams in the GNN encoder; the current description leaves the input representation ambiguous.
  2. [Results] Add a table or figure showing per-stage precision/recall to complement the macro F1 score, as class imbalance is common in APT stage data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting important aspects of experimental robustness. We have revised the manuscript to incorporate additional details on validation protocols and label quality in both the abstract and experimental sections, while preserving the core contributions.

read point-by-point responses
  1. Referee: [Abstract] The headline performance figures (macro F1 of 0.96 and 31% volatility reduction) are presented without any description of validation splits, number of runs, error bars, hyperparameter search protocol, or checks for data leakage between the OpTC pretraining and Transparent Computing fine-tuning phases. These omissions directly undermine assessment of whether the central empirical claim is robust.

    Authors: We agree that the abstract would benefit from explicit context on the evaluation protocol. In the revised version, we have expanded the abstract to state that results are obtained via 5-fold cross-validation on the fine-tuning set, averaged over 10 runs with standard deviations, using a grid search for hyperparameters as detailed in Section 4.3. We also confirm that the OpTC pretraining and Transparent Computing fine-tuning datasets are disjoint in both hosts and temporal windows, with no shared entities or overlapping time periods, eliminating data leakage. These additions directly address the concern without exceeding abstract length limits. revision: yes

  2. Referee: [Experimental evaluation] The framework assumes that the MITRE ATT&CK-aligned stage labels in the Transparent Computing dataset accurately represent real-world APT progression, yet no label-validation statistics, inter-annotator agreement, or explicit out-of-distribution test results are supplied. Any systematic mismatch between these labels and actual temporal attack structure would render the reported F1 and volatility metrics unreliable.

    Authors: The Transparent Computing labels follow the official DARPA annotation aligned to MITRE ATT&CK tactics by security experts. In the revision, we have added a dedicated paragraph in Section 4.1 describing the label provenance from the dataset release notes and noting the high consistency reported in the original DARPA documentation. We also include new out-of-distribution experiments on held-out OpTC attack traces (distinct from pretraining), where StageFinder retains strong performance, supporting generalization. While we cannot retroactively compute inter-annotator agreement from the released data, the added discussion and OOD results mitigate concerns about label reliability. revision: partial
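The evaluation protocol the simulated rebuttal claims (5-fold cross-validation averaged over 10 runs, reported with standard deviations) can be sketched as below. The data, the majority-class placeholder standing in for the model, and the fold scheme are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))            # stand-in features
y = rng.integers(0, 5, size=100)         # stand-in stage labels (5 classes)

def kfold_indices(n, k):
    """Shuffle indices and split into k disjoint folds."""
    return np.array_split(rng.permutation(n), k)

scores = []
for run in range(10):                    # 10 repetitions, re-shuffling each time
    for fold in kfold_indices(len(X), 5):  # 5 folds per repetition
        test_mask = np.zeros(len(X), bool)
        test_mask[fold] = True
        # Placeholder "model": predict the majority class of the training split.
        pred = np.bincount(y[~test_mask]).argmax()
        scores.append(float(np.mean(y[test_mask] == pred)))

print(f"{np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

Swapping the majority-class placeholder for the trained StageFinder model and the accuracy for macro F1 would reproduce the protocol described in the response, including the mean-and-deviation reporting.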

Circularity Check

0 steps flagged

No significant circularity in StageFinder derivation

full rationale

The paper presents a standard supervised learning pipeline: a GNN encodes provenance graph structure, an LSTM captures temporal dynamics, the model is pretrained on DARPA OpTC data and fine-tuned on MITRE ATT&CK-aligned labels from the Transparent Computing dataset, and performance is measured by macro F1 and volatility reduction against external baselines (Cyberian, NetGuardian). No equations or claims reduce a prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled via prior work. All reported metrics are empirical results on public datasets, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review yields limited visibility into exact parameters and assumptions; standard supervised learning assumptions are implicit.

axioms (2)
  • domain assumption Provenance graphs from host and network logs faithfully capture attack behaviors without significant missing or noisy entries
    Central to the encoding step described in the abstract
  • domain assumption MITRE ATT&CK stage labels in the DARPA data are accurate and consistent across samples
    Required for supervised fine-tuning and F1 evaluation

pith-pipeline@v0.9.0 · 5458 in / 1288 out tokens · 52189 ms · 2026-05-15T15:20:37.523181+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,

    Alshamrani et al., “A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1851–1877, 2019

  2. [2]

    MITRE ATT&CK: A knowledge base of adversary tactics, techniques, and common knowledge,

    The MITRE Corporation, “MITRE ATT&CK: A knowledge base of adversary tactics, techniques, and common knowledge,” 2026. [Online]. Available: https://attack.mitre.org

  3. [3]

    Explainable deep learning approach for advanced persistent threats (apts) detection in cybersecurity: A review,

    Mutalib et al., “Explainable deep learning approach for advanced persistent threats (apts) detection in cybersecurity: A review,” Artificial Intelligence Review, vol. 57, no. 11, p. 297, 2024

  4. [4]

    A survey of intrusion detection systems leveraging host data,

    R. A. Bridges et al., “A survey of intrusion detection systems leveraging host data,” ACM Comput. Surv., vol. 52, Nov. 2019

  5. [5]

    Provenance-based intrusion detection: opportunities and challenges,

    X. Han et al., “Provenance-based intrusion detection: opportunities and challenges,” in 10th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2018), 2018

  6. [6]

    Darpa transparent computing (tc) engagement dataset,

    DARPA, “Darpa transparent computing (tc) engagement dataset,” 2019. [Online]. Available: https://github.com/darpa-i2o/Transparent-Computing

  7. [7]

    Operationally Transparent Cyber (OpTC) Dataset,

    DARPA, “Operationally Transparent Cyber (OpTC) Dataset,” 2019. [Online]. Available: https://github.com/FiveDirections/OpTC-data

  8. [8]

    A novel ai-based methodology for identifying cyber attacks in honey pots,

    M. AbuOdeh et al., “A novel ai-based methodology for identifying cyber attacks in honey pots,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 15224–15231, May 2021

  9. [9]

    A principled approach for detecting apts in massive networks via multi-stage causal analytics,

    J. Gui et al., “A principled approach for detecting apts in massive networks via multi-stage causal analytics,” in IEEE INFOCOM 2025 - IEEE Conference on Computer Communications, pp. 1–10, 2025

  10. [10]

    Anomaly-based multi-stage attack detection method,

    W. Ma et al., “Anomaly-based multi-stage attack detection method,” PLOS ONE, vol. 19, no. 5, p. e0300821, 2024

  11. [11]

    A novel approach for apt attack detection based on an intelligent behavior profile and deep graph network,

    C. D. Do Xuan et al., “A novel approach for apt attack detection based on an intelligent behavior profile and deep graph network,” Scientific Reports, vol. 14, 2024

  12. [12]

    Continuum: Detecting apt attacks through spatial-temporal graph neural networks,

    A. A. M. Bahar et al., “Continuum: Detecting apt attacks through spatial-temporal graph neural networks,” 2025

  13. [13]

    A dynamic provenance graph-based detector for advanced persistent threats,

    L. Wang et al., “A dynamic provenance graph-based detector for advanced persistent threats,” Expert Systems with Applications, vol. 265, p. 125877, 2025

  14. [14]

    PROGRAPHER: An anomaly detection system based on provenance graph embedding,

    F. Yang et al., “PROGRAPHER: An anomaly detection system based on provenance graph embedding,” in 32nd USENIX Security Symposium (USENIX Security 23), (Anaheim, CA), pp. 4355–4372, USENIX Association, Aug. 2023

  15. [15]

    Modeling and detection of the multi-stages of advanced persistent threats attacks based on semi-supervised learning and complex network characteristics,

    A. Zimba et al., “Modeling and detection of the multi-stages of advanced persistent threats attacks based on semi-supervised learning and complex network characteristics,” Future Generation Computer Systems, vol. 108, pp. 636–646, 2020