arxiv: 2604.18716 · v1 · submitted 2026-04-20 · 💻 cs.CR · cs.LG

TrEEStealer: Stealing Decision Trees via Enclave Side Channels

Pith reviewed 2026-05-10 04:06 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords: model extraction, decision trees, side-channel attacks, trusted execution environments, control-flow leakage, AMD SEV, Intel SGX, machine learning security

The pith

TrEEStealer extracts decision trees from TEE-protected environments by exploiting control-flow side channels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

TrEEStealer is an attack that steals decision trees even when they run inside trusted execution environments such as AMD SEV and Intel SGX. It captures leaked control-flow details during inference and pairs them with a query strategy that extracts maximum structure from each run. This matters because decision trees appear in security-critical and financial applications, and stolen models open paths to privacy breaches on training data as well as evasion attacks. The method works without rich API access or assumptions about tree shape and beats earlier extraction techniques in speed and accuracy. It also locates exploitable leaks inside common libraries including OpenCV, mlpack, and emlearn.

Core claim

TrEEStealer is a high-fidelity extraction attack for stealing TEE-protected DTs that exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. Two primitives acquire the CFI: SEV-Step plus performance counters on AMD SEV, and a new Branch-History-Register technique on Intel SGX. Corresponding vulnerabilities exist in OpenCV, mlpack, and emlearn. The attack achieves superior efficiency and extraction fidelity compared to prior attacks and,

What carries the argument

The TrEEStealer algorithm, which couples side-channel-derived control-flow information with passive information tracking to maximize structure recovered per query.
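
To see why control-flow information can reveal more structure per query than a label alone, consider a minimal sketch of tree inference in which every comparison is a data-dependent branch. The `Node` class, the `infer` function, and the example tree are hypothetical illustrations, not code from the paper or from OpenCV, mlpack, or emlearn. The returned `trace` merely stands in for what a side channel might expose.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: a leaf if `label` is set, otherwise internal."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # followed when x[feature] <= threshold
    right: Optional["Node"] = None   # followed otherwise
    label: Optional[int] = None


def infer(root: Node, x: List[float]) -> Tuple[int, List[bool]]:
    """Return the predicted label and the sequence of branch outcomes."""
    trace: List[bool] = []
    node = root
    while node.label is None:
        went_left = x[node.feature] <= node.threshold  # data-dependent branch
        trace.append(went_left)
        node = node.left if went_left else node.right
    return node.label, trace


# Toy tree in which two different leaves carry the same label (1).
tree = Node(feature=0, threshold=5.0,
            left=Node(label=1),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label=0),
                       right=Node(label=1)))

label_a, trace_a = infer(tree, [3.0, 9.0])
label_b, trace_b = infer(tree, [7.0, 9.0])
```

Both queries return the same label, so a label-only API cannot distinguish them, whereas the two traces differ in length and content and therefore identify distinct paths through the tree.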

If this is right

  • Stolen decision trees enable white-box attacks including privacy attacks on training data and model evasion.
  • Model extraction becomes practical against TEE-protected deployments that were previously considered isolated.
  • Popular machine-learning libraries contain exploitable control-flow leaks during decision-tree inference.
  • Current trusted execution environments do not adequately isolate control flow for machine-learning workloads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other tree-based models or inference routines that expose branch decisions may face comparable extraction risks.
  • Hardware vendors and library maintainers may need stronger control-flow protections to restore isolation guarantees.
  • Model owners might combine TEEs with additional software-level obfuscation or query limiting to reduce leakage.
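
As one illustration of what a software-level control-flow protection could look like, the sketch below evaluates a complete binary tree with a fixed number of steps and no data-dependent `if`. This is an assumption-laden toy and not a mitigation proposed or evaluated in the paper: the heap-style array layout, the padding of shallow leaves, and the name `oblivious_infer` are all hypothetical. Python offers no constant-time guarantees, and the array index still depends on the input, so memory-access channels would remain.

```python
from typing import List, Sequence


def oblivious_infer(features: List[int], thresholds: List[float],
                    leaves: List[int], x: Sequence[float]) -> int:
    """Walk a complete binary tree stored in heap layout.

    Internal node i has children 2*i + 1 and 2*i + 2. The loop always runs
    `depth` iterations and selects the child arithmetically, not with `if`.
    """
    depth = len(leaves).bit_length() - 1      # len(leaves) == 2 ** depth
    idx = 0
    for _ in range(depth):
        go_right = int(x[features[idx]] > thresholds[idx])  # 0 or 1
        idx = 2 * idx + 1 + go_right
    return leaves[idx - len(thresholds)]      # shift into the leaf array


# Depth-2 toy tree; the left subtree is padded with a dummy split so that
# every input traverses exactly two levels.
features = [0, 1, 1]
thresholds = [5.0, 2.0, 2.0]
leaves = [1, 1, 0, 1]

result_left = oblivious_infer(features, thresholds, leaves, [3.0, 9.0])
result_mid = oblivious_infer(features, thresholds, leaves, [7.0, 1.0])
result_right = oblivious_infer(features, thresholds, leaves, [7.0, 9.0])
```

The design trades extra work (padding and a full-depth walk) for a uniform instruction sequence at the source level. Whether compiled library code keeps that property would need to be checked per platform.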

Load-bearing premise

Control-flow side channels remain exploitable in current TEE hardware and popular decision-tree libraries, and inference leaks enough branch information to allow accurate reconstruction.

What would settle it

An experiment showing that no usable branch history or control-flow data leaks from inference runs inside a TEE equipped with current mitigations would demonstrate the attack cannot succeed.

Figures

Figures reproduced from arXiv: 2604.18716 by Anja Rabich, David Oswald, Felix Maurer, Jonah Heller, Jonas Sander, Nick Mahling, Qifan Wang, Thomas Eisenbarth.

Figure 1: Attack scenario of TrEEStealer in which a model owner provides a proprietary model protected through a TEE via an ML as a Service (MLaaS) API. The model thief respects the threat model of the TEE and controls only the operating system or hypervisor, but is still able to steal whole real-world models precisely and efficiently.
Figure 2: Target DT with rectangles representing decision nodes (i.e., internal nodes).
Figure 3: Overview of TrEEStealer’s attack workflow.
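
The workflow in Figure 3 can be read as a loop: send a crafted input to the attack primitive, use the observed control flow to find threshold values, and repeat until every path of the shadow DT has been traversed. The sketch below is a simplified stand-in under stated assumptions, not the paper's algorithm: `TARGET`, `attack_primitive`, and `extract` are hypothetical names, the primitive is assumed to return a noise-free branch trace, and all features are assumed to lie in a known range.

```python
from typing import Callable, Dict, List, Tuple

Trace = List[bool]

# Hypothetical target tree; in a real setting it is hidden from the attacker.
TARGET = {"feature": 0, "threshold": 5.0,
          "left": {"label": 1},
          "right": {"feature": 1, "threshold": 2.0,
                    "left": {"label": 0}, "right": {"label": 1}}}


def attack_primitive(x: List[float]) -> Tuple[int, Trace]:
    """Stand-in for a side-channel primitive: label plus branch outcomes."""
    node, trace = TARGET, []
    while "label" not in node:
        went_left = x[node["feature"]] <= node["threshold"]
        trace.append(went_left)
        node = node["left"] if went_left else node["right"]
    return node["label"], trace


def extract(oracle: Callable[[List[float]], Tuple[int, Trace]],
            n_features: int, lo: float = 0.0, hi: float = 10.0,
            eps: float = 1e-4) -> Dict:
    """Rebuild a shadow tree from oracle outputs only."""
    shadow: Dict = {}
    # Work item: (shadow node to fill, input box that reaches it, its depth).
    todo = [(shadow, [(lo, hi)] * n_features, 0)]
    while todo:                          # runs until the shadow tree is complete
        node, box, depth = todo.pop()
        base = [(a + b) / 2 for a, b in box]
        label, trace = oracle(base)
        if len(trace) == depth:          # no branch at this depth: a leaf
            node["label"] = label
            continue
        for f in range(n_features):
            a, b = box[f]

            def probe(v: float) -> bool:
                """Branch outcome at `depth` when feature f is set to v."""
                return oracle(base[:f] + [v] + base[f + 1:])[1][depth]

            out_a = probe(a)
            if out_a == probe(b):
                continue                 # feature f does not flip this branch
            while b - a > eps:           # binary search for the flip point
                mid = (a + b) / 2
                if probe(mid) == out_a:
                    a = mid
                else:
                    b = mid
            node.update(feature=f, threshold=(a + b) / 2, left={}, right={})
            low_box, high_box = list(box), list(box)
            low_box[f] = (box[f][0], a)
            high_box[f] = (b, box[f][1])
            # Low values go left exactly when the low-end outcome was "left".
            low_child = node["left"] if out_a else node["right"]
            high_child = node["right"] if out_a else node["left"]
            todo.append((low_child, low_box, depth + 1))
            todo.append((high_child, high_box, depth + 1))
            break
        else:
            raise RuntimeError("no feature flips the branch inside this box")
    return shadow


shadow_tree = extract(attack_primitive, n_features=2)
```

Because each probe reveals the outcome of one specific branch, the binary search isolates a single node's threshold even when the returned label does not change, which is the kind of per-query gain the workflow relies on.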
Figure 4: Shadow DT after the first crafted input and with per-node extracted threshold
Figure 5: Shadow DT with the extracted root node’s feature and threshold.
Figure 6: Shadow DT with the extracted inner node’s feature and threshold.
Figure 7: Shadow DT before and after the extraction of the duplicated feature node.
Figure 8: Illustration of the READ_PHR helper while extracting the first doublet of the PHR.
Figure 9: Pareto frontier of the cost vs. extraction accuracy tradeoff. Note that only data…
Original abstract

Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces TrEEStealer, a side-channel attack on TEE-protected decision trees that extracts model structure by coupling control-flow information (via SEV-Step/performance counters on AMD SEV and a new Branch-History-Register primitive on Intel SGX) with passive tracking. It demonstrates the attack on three libraries (OpenCV, mlpack, emlearn), claiming higher efficiency and fidelity than prior black-box DT extraction methods without strong assumptions on API outputs or tree structure, thereby showing that TEEs do not protect against control-flow leakage.

Significance. If the empirical results hold with detailed validation, the work would establish a new state-of-the-art for decision-tree extraction attacks and provide concrete evidence that current TEE implementations remain vulnerable to control-flow side channels during ML inference. This has direct implications for the security of MLaaS deployments relying on hardware enclaves.

major comments (3)
  1. [Experimental Evaluation] Experimental Evaluation section: The abstract asserts 'superior efficiency and extraction fidelity' and 'new state-of-the-art' but provides no quantitative metrics (query counts, fidelity scores such as structural similarity or test-set accuracy, or direct comparisons to baselines) or error analysis; without these, the central SOTA claim cannot be verified and is load-bearing for the contribution.
  2. [Attack Algorithm] Attack Algorithm section: The novel coupling of CFI with passive information tracking is presented as maximizing information per query, yet the manuscript supplies no formal information-theoretic bound or worst-case analysis showing that side-channel observations plus black-box responses suffice to disambiguate all possible trees; this assumption is load-bearing for the claim of high-fidelity extraction on arbitrary DTs and deeper trees.
  3. [Threat Model] Threat Model and Assumptions: The attack relies on exploitable control-flow leakage in current TEEs and libraries without mitigations, but the manuscript does not address or test potential countermeasures (e.g., CFI hardening or noise injection) that could invalidate the practical threat; this is central to the conclusion that 'TEEs fail to protect against control-flow leakage.'
minor comments (3)
  1. [Abstract] The abstract and introduction would benefit from explicit definitions of 'fidelity' and 'efficiency' metrics early on to clarify what is being optimized.
  2. [Evaluation] Figure captions and table headers in the evaluation should include error bars or confidence intervals for any reported success rates to improve clarity.
  3. [Attack Algorithm] A few sentences on the exact reconstruction procedure (how branch history maps to tree nodes) would help readers follow the algorithm without ambiguity.
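
Regarding the first minor comment, one way the two metrics could be made concrete is label agreement between target and extracted model (fidelity) and the number of queries issued (efficiency). The sketch below is an assumption for illustration only, since the paper may define both differently. `fidelity` and `QueryCounter` are hypothetical helpers.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def fidelity(target: Model, extracted: Model,
             inputs: List[Sequence[float]]) -> float:
    """Fraction of inputs on which both models predict the same label."""
    if not inputs:
        raise ValueError("need at least one evaluation input")
    agree = sum(target(x) == extracted(x) for x in inputs)
    return agree / len(inputs)


class QueryCounter:
    """Wrap a model and count how often it is queried."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.queries = 0

    def __call__(self, x: Sequence[float]) -> int:
        self.queries += 1
        return self.model(x)


# Toy one-feature models whose thresholds differ slightly.
target = QueryCounter(lambda x: int(x[0] > 5.0))
extracted = lambda x: int(x[0] > 6.5)
inputs = [[float(v)] for v in range(10)]

score = fidelity(target, extracted, inputs)
```

In this toy case the models disagree only on the input 6.0, so nine of the ten evaluation points match, and the counter records ten queries to the target.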

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental Evaluation section: The abstract asserts 'superior efficiency and extraction fidelity' and 'new state-of-the-art' but provides no quantitative metrics (query counts, fidelity scores such as structural similarity or test-set accuracy, or direct comparisons to baselines) or error analysis; without these, the central SOTA claim cannot be verified and is load-bearing for the contribution.

    Authors: We appreciate this observation. The experimental section reports efficiency and fidelity results across OpenCV, mlpack, and emlearn, but we agree that consolidated quantitative comparisons (query counts, structural similarity, test-set accuracy) and error analysis against prior black-box DT extraction baselines are not presented. In the revised manuscript we will add a dedicated comparison table and analysis to directly substantiate the SOTA claim.
    Revision: yes

  2. Referee: [Attack Algorithm] Attack Algorithm section: The novel coupling of CFI with passive information tracking is presented as maximizing information per query, yet the manuscript supplies no formal information-theoretic bound or worst-case analysis showing that side-channel observations plus black-box responses suffice to disambiguate all possible trees; this assumption is load-bearing for the claim of high-fidelity extraction on arbitrary DTs and deeper trees.

    Authors: The referee correctly identifies the lack of a formal information-theoretic bound. The algorithm is validated empirically on trees of varying depths from three libraries, showing high-fidelity extraction in practice. Deriving a tight worst-case bound for arbitrary trees is non-trivial given structural variability and side-channel noise; we will add a discussion of per-query information gain and limitations for deeper trees, while maintaining that the empirical results support the practical claims.
    Revision: partial

  3. Referee: [Threat Model] Threat Model and Assumptions: The attack relies on exploitable control-flow leakage in current TEEs and libraries without mitigations, but the manuscript does not address or test potential countermeasures (e.g., CFI hardening or noise injection) that could invalidate the practical threat; this is central to the conclusion that 'TEEs fail to protect against control-flow leakage.'

    Authors: We agree that addressing countermeasures would improve completeness. The manuscript demonstrates vulnerabilities in current TEE implementations and libraries. We will expand the threat model and discussion sections to cover potential mitigations such as CFI hardening and noise injection, clarifying that our conclusions apply to unprotected current systems and highlighting the need for such defenses in future designs.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical attack with experimental results.

Full rationale

The manuscript describes an empirical side-channel attack (TrEEStealer) that couples CFI primitives with passive tracking to extract DTs from three libraries. All load-bearing claims rest on direct experimental measurements of extraction fidelity and query efficiency rather than on a derivation chain, equations, fitted parameters, or predictions that reduce to prior inputs by construction. References to SEV-Step and prior SGX findings serve only to reproduce known primitives for the new attack; the novel algorithm and SOTA claims are validated by fresh experiments on OpenCV, mlpack, and emlearn, and no self-citation carries the central results. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that TEE side channels leak usable control-flow information during DT inference and that standard library implementations do not mitigate these leaks.

axioms (1)
  • Domain assumption: Control-flow information from performance counters and branch history registers leaks during TEE-protected DT inference.
    Invoked when describing the two primitives for acquiring CFI on AMD SEV and Intel SGX.
