
arxiv: 2606.26385 · v1 · pith:HA3CRPTTnew · submitted 2026-06-24 · 💻 cs.DB · cs.AR · cs.CR

Query Cost Model Calibration in Confidential Virtual Machines

Pith reviewed 2026-06-26 00:36 UTC · model grok-4.3

classification 💻 cs.DB · cs.AR · cs.CR
keywords: confidential virtual machines · query cost models · database optimization · performance calibration · data movement · RMP translation · KVM/CVM gap

The pith

Calibrating query cost models for confidential VMs to account for data movement and address-translation overheads recovers up to 48 percent performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Databases running analytical queries inside confidential virtual machines (CVMs) incur extra overhead that standard query optimizers do not anticipate, because those optimizers were built around cost assumptions for ordinary (KVM) virtual machines. The paper shows that the mismatch arises mainly from two optimizer-facing sources of cost: the extra work of moving data (for example, through bounce buffers) and the address-translation overhead added by the Reverse Map Table (RMP) checks of AMD SEV-SNP. A lightweight adjustment to the cost model re-estimates these costs from simple physical proxies already available to the optimizer. With the adjusted model, the chosen execution plans close much of the performance gap, recovering up to 48 percent performance and, on some workloads, outperforming the non-confidential KVM baseline. The result matters because it lets existing database systems run with strong isolation guarantees without modifying the engine beyond its cost model.

Core claim

The central claim is that a lightweight CVM-aware cost calibration, which models data movement and RMP-related translation overhead using simple physical proxies already available to the query optimizer, narrows the KVM/CVM performance gap, recovering up to 48 percent performance and outperforming the KVM baseline on some workloads.

What carries the argument

The CVM-aware cost calibration that adjusts optimizer estimates for data movement and RMP-related translation overhead using physical proxies.
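The page does not reproduce the paper's cost formulas, so the following is only a minimal sketch, assuming an additive model: each plan's KVM-oriented cost is augmented by a data-movement term (MB expected to cross shared bounce buffers) and an RMP term (thousands of pages whose translations are checked). The names `PlanEstimate`, `cvm_cost`, the coefficients, and every plan number are hypothetical and chosen only to show how such a calibration can flip a plan choice.

```python
from dataclasses import dataclass

# Hypothetical per-unit coefficients (cost units per MB and per thousand pages).
# A real deployment would calibrate these on the target CVM; these are illustrative only.
ALPHA_PER_MB = 2.0
BETA_PER_KPAGE = 0.5


@dataclass(frozen=True)
class PlanEstimate:
    name: str
    base_cost: float  # KVM-oriented optimizer cost, in abstract cost units
    mb_moved: float   # hypothetical data-movement proxy: MB crossing shared (bounce) buffers
    kpages: float     # hypothetical RMP proxy: thousands of pages whose translations are checked


def kvm_cost(plan: PlanEstimate) -> float:
    """Cost as an unmodified, KVM-oriented optimizer would see it."""
    return plan.base_cost


def cvm_cost(plan: PlanEstimate, alpha: float = ALPHA_PER_MB, beta: float = BETA_PER_KPAGE) -> float:
    """KVM cost plus additive data-movement and RMP-translation terms."""
    return plan.base_cost + alpha * plan.mb_moved + beta * plan.kpages


def choose(plans, cost_fn) -> str:
    """Return the name of the cheapest plan under the given cost function."""
    return min(plans, key=cost_fn).name


plans = [
    PlanEstimate("hash_join", base_cost=1000.0, mb_moved=800.0, kpages=600.0),
    PlanEstimate("nested_loop", base_cost=1200.0, mb_moved=50.0, kpages=80.0),
]
print(choose(plans, kvm_cost), choose(plans, cvm_cost))
```

With these made-up numbers the KVM-oriented model prefers the hash join (1000 vs. 1200), while the calibrated model prefers the nested-loop join (1340 vs. 2900). Keeping the extra terms additive means the calibration only changes cost estimates; the optimizer's search procedure is untouched.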

If this is right

  • Query optimizers can select execution plans that account for CVM overheads without hardware changes.
  • Performance in CVMs can approach or exceed KVM baselines on some workloads.
  • Legacy DBMSs can run in confidential computing environments with only cost-model adjustments.
  • The calibration remains lightweight because it relies on measurements the optimizer already collects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar calibration steps could be applied to other hardware security extensions that introduce translation or movement costs.
  • Database systems might benefit from periodic re-calibration of cost models whenever new isolation features are added to the underlying platform.
  • The approach leaves open whether additional overhead sources become dominant once data movement and RMP costs are mitigated.

Load-bearing premise

The two dominant sources of extra optimizer-facing cost in CVMs are data movement and RMP-related translation, and these can be modeled accurately from simple physical proxies already visible to the optimizer.
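One way to test whether two proxies can model the overhead is to fit their coefficients by least squares against measured overhead (CVM runtime minus KVM runtime) and check how much variance they explain. The sketch below assumes a no-intercept linear model and uses synthetic, noise-free samples; the function names and data are hypothetical, not the paper's procedure.

```python
def fit_proxy_coefficients(samples):
    """Least-squares fit of overhead ~= alpha * mb + beta * kpages (no intercept).

    samples: iterable of (mb_moved, kpages, overhead_ms) tuples, where overhead_ms
    is measured CVM runtime minus KVM runtime for the same plan (hypothetical setup).
    """
    samples = list(samples)
    s_mm = sum(m * m for m, _, _ in samples)
    s_kk = sum(k * k for _, k, _ in samples)
    s_mk = sum(m * k for m, k, _ in samples)
    s_mo = sum(m * o for m, _, o in samples)
    s_ko = sum(k * o for _, k, o in samples)
    det = s_mm * s_kk - s_mk * s_mk
    if det == 0:
        raise ValueError("proxies are collinear; their contributions cannot be separated")
    # Solve the 2x2 normal equations by Cramer's rule.
    alpha = (s_mo * s_kk - s_ko * s_mk) / det
    beta = (s_ko * s_mm - s_mo * s_mk) / det
    return alpha, beta


def r_squared(samples, alpha, beta):
    """Share of overhead variance explained by the two-proxy model."""
    samples = list(samples)
    obs = [o for _, _, o in samples]
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - (alpha * m + beta * k)) ** 2 for m, k, o in samples)
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free samples generated with alpha = 2.0 and beta = 0.5.
samples = [(100.0, 40.0, 220.0), (10.0, 400.0, 220.0), (50.0, 50.0, 125.0)]
alpha, beta = fit_proxy_coefficients(samples)
print(alpha, beta, r_squared(samples, alpha, beta))
```

On real measurements, an R² well below 1 would indicate residual overhead sources that the two proxies do not capture.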

What would settle it

Running the calibrated optimizer on the same workloads and observing that the KVM/CVM performance gap remains unchanged or that predicted plan costs deviate substantially from measured runtimes would falsify the central claim.
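The falsification test above can be made concrete with two simple metrics, sketched here under stated assumptions: the fraction of the KVM/CVM runtime gap that calibration removes, and how often predicted plan costs order plans the same way measured runtimes do. The page does not say which normalization the reported 48 percent uses, so `gap_closed` is one possible definition; both function names and all numbers are hypothetical.

```python
from itertools import combinations


def gap_closed(t_kvm: float, t_cvm_default: float, t_cvm_calibrated: float) -> float:
    """Fraction of the KVM/CVM runtime gap removed by calibration.

    0.0 means no change; 1.0 means calibrated CVM matches KVM; above 1.0 means it beats KVM.
    """
    gap = t_cvm_default - t_kvm
    if gap <= 0:
        raise ValueError("default CVM run is not slower than KVM; no gap to close")
    return (t_cvm_default - t_cvm_calibrated) / gap


def pairwise_rank_agreement(predicted, measured) -> float:
    """Fraction of plan pairs whose predicted-cost order matches measured-runtime order.

    Ties in either list count as disagreements.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least two plans")
    pairs = list(combinations(range(len(predicted)), 2))
    agree = sum(
        1 for i, j in pairs if (predicted[i] - predicted[j]) * (measured[i] - measured[j]) > 0
    )
    return agree / len(pairs)


print(gap_closed(10.0, 20.0, 15.0))
print(pairwise_rank_agreement([2900.0, 1340.0, 2000.0], [3.1, 1.2, 2.5]))
```

A gap fraction near 0, or rank agreement no better than chance (about 0.5), would count against the central claim.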

Figures

Figures reproduced from arXiv: 2606.26385 by Ibrahim Sabek, Mengyuan Li, Qihan Zhang.

Figure 1: Evidence of data movement cost in SEV-SNP. Hash join induces denser bounce-buffer activity, while nested-loop join induces less frequent but burstier activity.
Figure 3: The different query latency distributions.
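The Figure 1 caption contrasts "denser" with "burstier" bounce-buffer activity. A minimal sketch of how those two properties can be quantified from an event trace is shown below, assuming timestamped bounce-buffer events are available: rate as events per unit time, burstiness as the coefficient of variation of inter-event gaps. The traces and the `activity_profile` helper are hypothetical and do not reproduce the paper's measurement method.

```python
from statistics import mean, pstdev


def activity_profile(timestamps):
    """Summarize an event trace as (rate, burstiness).

    rate: events per unit time over the trace span.
    burstiness: coefficient of variation of inter-event gaps (about 0 for a
    perfectly regular trace, above 1 for traces burstier than a Poisson process).
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        raise ValueError("need at least three events")
    span = ts[-1] - ts[0]
    if span == 0:
        raise ValueError("all events share one timestamp")
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    rate = len(gaps) / span
    return rate, pstdev(gaps) / mean(gaps)


# Hypothetical traces (seconds): a dense, regular stream vs. sparse bursts.
dense = [i * 0.01 for i in range(101)]
bursty = [start + k * 0.001 for start in (0.0, 0.5, 1.0) for k in range(5)]
print(activity_profile(dense), activity_profile(bursty))
```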
Original abstract

With the growing adoption of Confidential Computing, running databases in confidential virtual machines (CVMs) such as AMD SEV-SNP has become an attractive way to protect sensitive cloud data with minimal changes to legacy DBMSs. However, analytical queries in such CVMs often suffer substantial overhead, and prior database work has largely stopped at benchmarking these slowdowns rather than optimizing them. We show that this problem stems from a hardware-software mismatch: query optimizers still rely on KVM-oriented (non-encrypted VM) cost assumptions that no longer hold in CVMs. To address this, we propose a lightweight CVM-aware cost calibration. It models two dominant sources of optimizer-facing overhead: data movement and RMP-related translation using simple physical proxies already available to the optimizer. Experiments show that the calibration significantly narrows the KVM/CVM performance gap, recovering up to 48 percent performance and even outperforming the KVM baseline on some workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that query optimizers relying on KVM cost models incur substantial overhead in confidential VMs (e.g., AMD SEV-SNP) due to a hardware-software mismatch, and proposes a lightweight CVM-aware cost calibration that models the two dominant optimizer-facing overheads (data movement and RMP-related translation) via simple physical proxies already visible to the optimizer. Experiments are reported to narrow the KVM/CVM gap, recovering up to 48% performance and sometimes outperforming the KVM baseline.

Significance. If the calibration is shown to accurately capture the dominant overheads via the chosen proxies and to produce measurably better plans, the work would address a practical performance issue for analytical workloads in confidential computing without requiring changes to legacy DBMS code; the approach is lightweight and leverages existing optimizer-visible information.

major comments (2)
  1. [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation: the central performance claim of up to 48% recovery is presented without any reported details on workloads, baseline configurations, error bars, number of runs, or statistical significance tests. This leaves open whether the two physical proxies actually explain the majority of the measured KVM/CVM gap or whether other CVM effects (TLB pressure, access-pattern-dependent encryption latency) dominate on the tested workloads.
  2. [Cost Model Calibration] Cost Model Calibration section: the modeling approach is described as an empirical adjustment using physical proxies, yet no correlation coefficients, R² values, or ablation results are supplied to quantify how much of the observed overhead is captured by the two proxies versus residual factors. Without such evidence the claim that these proxies suffice to drive improved plan selection remains unverified.
minor comments (1)
  1. [Abstract] Abstract: the acronym 'RMP' is used without expansion or a one-sentence definition on first use, which may hinder readers outside the AMD SEV-SNP community.
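Major comment 1 asks for significance testing, and the simulated rebuttal proposes paired t-tests over 5 repetitions at p < 0.05. A minimal stdlib sketch of that test follows, assuming runs are paired by repetition index; the latencies are hypothetical. It uses tabulated two-sided critical values instead of computing a p-value; in practice `scipy.stats.ttest_rel` would be the usual choice.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided critical values of Student's t at alpha = 0.05, indexed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def paired_t_test(before, after):
    """Return (t statistic, significant at alpha = 0.05) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for b, a in zip(before, after)]
    df = len(diffs) - 1
    if df not in T_CRIT_05:
        raise ValueError("critical-value table only covers 2 to 6 pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, abs(t) > T_CRIT_05[df]


# Hypothetical latencies (s) for one query over 5 repetitions.
default_cvm = [12.1, 12.4, 11.9, 12.3, 12.0]
calibrated_cvm = [9.0, 9.3, 8.8, 9.1, 9.2]
print(paired_t_test(default_cvm, calibrated_cvm))
```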

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. The comments correctly identify gaps in experimental reporting and model validation. We will revise the manuscript to address both points by adding the requested details and analyses.

Point-by-point responses
  1. Referee: [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation: the central performance claim of up to 48% recovery is presented without any reported details on workloads, baseline configurations, error bars, number of runs, or statistical significance tests. This leaves open whether the two physical proxies actually explain the majority of the measured KVM/CVM gap or whether other CVM effects (TLB pressure, access-pattern-dependent encryption latency) dominate on the tested workloads.

    Authors: We agree that the current manuscript lacks sufficient experimental details. In revision we will expand the Experimental Evaluation section with: workload descriptions (TPC-H and TPC-DS at scale factor 10), baseline configurations (KVM vs. AMD SEV-SNP CVM on identical hardware), number of runs (5 repetitions), error bars (standard deviation), and statistical tests (paired t-tests, p < 0.05). We will also add a paragraph discussing potential residual CVM effects such as TLB pressure and encryption latency, noting that our observed recovery is consistent but does not claim the proxies capture 100% of the gap. revision: yes

  2. Referee: [Cost Model Calibration] Cost Model Calibration section: the modeling approach is described as an empirical adjustment using physical proxies, yet no correlation coefficients, R² values, or ablation results are supplied to quantify how much of the observed overhead is captured by the two proxies versus residual factors. Without such evidence the claim that these proxies suffice to drive improved plan selection remains unverified.

    Authors: We concur that quantitative validation of the proxies is missing. The revised Cost Model Calibration section will report Pearson correlation coefficients and R² values for the linear fits of each proxy against measured overhead. We will also add ablation experiments comparing query plans and runtimes under (i) both proxies, (ii) data-movement proxy only, (iii) RMP proxy only, and (iv) no calibration, to demonstrate the contribution of each proxy to plan quality. revision: yes
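The four-arm ablation described in the second response can be sketched as a small grid: for each arm, toggle the data-movement and RMP terms and record which plan the optimizer would pick. The plans, coefficients, and additive cost form below are hypothetical, chosen so that each arm selects a different plan; a real ablation would also execute the chosen plans and compare runtimes.

```python
# Hypothetical candidate plans: (KVM base cost, MB moved via shared buffers, thousand pages touched).
PLANS = {
    "A": (100.0, 100.0, 100.0),
    "B": (130.0, 10.0, 100.0),
    "C": (130.0, 100.0, 10.0),
    "D": (140.0, 10.0, 10.0),
}
ALPHA = 0.5  # hypothetical cost units per MB moved
BETA = 0.5   # hypothetical cost units per thousand pages touched

# The four ablation arms: (use data-movement proxy, use RMP proxy).
CONFIGS = {
    "both": (True, True),
    "data_movement_only": (True, False),
    "rmp_only": (False, True),
    "none": (False, False),
}


def cost(plan, use_dm: bool, use_rmp: bool) -> float:
    """Additive cost with each calibration term switched on or off."""
    base, mb, kpages = plan
    return base + (ALPHA * mb if use_dm else 0.0) + (BETA * kpages if use_rmp else 0.0)


def run_ablation():
    """Return the plan an optimizer would pick under each calibration arm."""
    chosen = {}
    for arm, (use_dm, use_rmp) in CONFIGS.items():
        chosen[arm] = min(PLANS, key=lambda name: cost(PLANS[name], use_dm, use_rmp))
    return chosen


print(run_ablation())
```

If single-proxy arms picked the same plans as the full calibration, that would suggest one proxy carries most of the benefit.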

Circularity Check

0 steps flagged

No circularity: empirical calibration with external experimental validation

Full rationale

The paper proposes a CVM-aware cost calibration that models data movement and RMP translation overheads via simple physical proxies visible to the optimizer, then validates the approach through experiments that report up to 48% performance recovery. No equations, derivations, or self-citations are presented that reduce the claimed gains or model accuracy to fitted parameters by construction. The central premise rests on empirical measurement rather than self-referential definitions or renamed inputs, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the assumption that two hardware overheads dominate and can be proxied; no new entities are postulated and no free parameters are explicitly named in the abstract.

axioms (1)
  • Domain assumption: Query optimizers select plans based on estimated costs that must reflect actual execution time.
    Implicit in the decision to calibrate the cost model rather than change the optimizer itself.

pith-pipeline@v0.9.1-grok · 5687 in / 1135 out tokens · 16645 ms · 2026-06-26T00:36:42.131350+00:00 · methodology

