pith.

arxiv: 2605.22992 · v1 · pith:JU4VHN7P · submitted 2026-05-21 · 💻 cs.SE · cs.DB

Finding Performance Issues in Database Systems by Exploiting Dormant Code Paths

Pith reviewed 2026-05-25 05:39 UTC · model grok-4.3

classification 💻 cs.SE cs.DB
keywords performance issues, database management systems, query optimization, branch analysis, DBMS testing, TPC-H, TPC-DS, white-box testing

The pith

Flipping branches that control database optimizations exposes cases where the optimization itself slows queries down.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Branch Flip Analysis (BFA) to find performance bugs in database systems. Instead of comparing runs for consistency, it forces each optimization on or off by changing a single branch in the source code and measures the resulting performance. If the flipped run is significantly faster (for example, because disabling an optimization makes queries faster), the optimization logic contains a defect. The method was implemented in a tool called QueryZen and applied to PostgreSQL, MySQL, CockroachDB, and MariaDB using the standard TPC-H and TPC-DS workloads, where it located 21 previously unknown issues. This approach gives developers a direct way to test whether each piece of optimization logic is sound without requiring an external performance oracle.

Core claim

BFA flips code branches to enforce or disable an optimization, and the flipped execution is expected not to be significantly faster than the original; if it is, a performance issue exists. QueryZen found 21 previously unknown and unique performance issues using the TPC-H and TPC-DS workloads on PostgreSQL, MySQL, CockroachDB, and MariaDB.
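Written as a formula, with assumed notation: let T(q) be the runtime of query q on the unmodified DBMS, T′_b(q) the runtime on DBMS′ with branch b flipped, and θ ≥ 0 a significance threshold. The paper does not fix θ (the simulated rebuttal below proposes 10%), and the timings in the worked line are illustrative, not measurements from the paper.

```latex
\text{flag}(b, q) \iff T'_b(q) < (1 - \theta)\, T(q)
\iff \frac{T(q) - T'_b(q)}{T(q)} > \theta \qquad (T(q) > 0)

\text{e.g. } T(q) = 1450\,\text{ms},\; T'_b(q) = 1050\,\text{ms},\; \theta = 0.10:\quad
\frac{1450 - 1050}{1450} = \frac{400}{1450} \approx 0.276 > 0.10 \;\Rightarrow\; \text{flagged}
```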

What carries the argument

Branch Flip Analysis (BFA), a white-box technique that toggles individual source-code branches controlling optimizations to isolate and test their performance effects.
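To make the mechanism concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not QueryZen: the real tool flips branches in the DBMS source and measures actual query runtimes, whereas here `BranchController`, `plan_and_run`, the two branch sites, and all cost numbers are hypothetical, and a simulated cost stands in for wall-clock time. Following the Figure 1 description, each execution flips exactly one branch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BranchController:
    """Stands in for an instrumented DBMS build: logs branch sites, inverts at most one."""
    flip: str | None = None                      # site to invert; None means the unmodified baseline
    reached: list[str] = field(default_factory=list)

    def take(self, site: str, cond: bool) -> bool:
        if site not in self.reached:
            self.reached.append(site)              # remember every optimization branch we pass
        return (not cond) if site == self.flip else cond


def plan_and_run(selectivity: float, ctl: BranchController) -> float:
    """Toy optimizer + executor returning a simulated runtime in ms (not a real cost model)."""
    cost = 0.0
    # Hypothetical optimization 1: prefer an index scan for "selective" predicates.
    # Planted defect: the index only pays off below 20% selectivity, but the branch uses 30%.
    if ctl.take("use_index_scan", selectivity < 0.3):
        cost += 1000 * selectivity * 5             # random I/O for each matching row
    else:
        cost += 1000                               # sequential scan over all rows
    # Hypothetical optimization 2: push a filter below a join (always beneficial here).
    if ctl.take("push_down_filter", True):
        cost += 50
    else:
        cost += 200
    return cost


def branch_flip_analysis(selectivity: float) -> dict[str, tuple[float, float]]:
    """Run the baseline once, then re-run with exactly one reached branch flipped per execution."""
    base = BranchController()
    baseline = plan_and_run(selectivity, base)
    return {
        site: (baseline, plan_and_run(selectivity, BranchController(flip=site)))
        for site in base.reached
    }


def suspicious(results: dict[str, tuple[float, float]], min_gain: float = 0.10) -> list[str]:
    """Flag sites whose flip makes the run more than `min_gain` faster than the baseline."""
    return [site for site, (base, flipped) in results.items() if (base - flipped) / base > min_gain]


print(suspicious(branch_flip_analysis(0.28)))  # ['use_index_scan']
```

The planted defect is caught because forcing the sequential scan at 28% selectivity is cheaper than the index path the toy optimizer chose; the push-down branch is not flagged because flipping it only makes the run slower.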

If this is right

  • Performance defects can be attributed directly to specific optimization logic rather than to interactions among many components.
  • The same branch-flip procedure works across multiple mature database engines, requiring only single-branch source changes rather than redesigns of their query planners.
  • Standard benchmark workloads are sufficient to surface real defects when the analysis is performed at branch granularity.
  • Developers obtain a repeatable test that flags an optimization as suspicious whenever its removal improves measured runtime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The technique could be applied to other performance-critical systems such as compilers or operating-system schedulers by targeting analogous control branches.
  • Automated tooling could maintain a living set of branch-flip tests that run on every code change to catch regressions in optimization quality.
  • When an optimization is found defective, the same flip provides a minimal patch that can be used to quantify the exact performance loss.

Load-bearing premise

Flipping one branch changes only the targeted optimization and introduces no unrelated side effects on performance or correctness.

What would settle it

Apply the same branch flips to a set of hand-verified correct optimizations and check whether any produce large unexplained speedups or correctness failures.

Figures

Figures reproduced from arXiv: 2605.22992 by Jinsheng Ba, Zhendong Su.

Figure 1. Overview of BFA. Accompanying text (truncated): "…of an IF statement, DBMS′ disables this optimization by executing the other branch, and vice versa. We only manipulate an IF statement per execution. We believe this strategy is simple yet efficient for exploring optimization space in code."
Figure 2. The last commit associated with the issues found by …
Figure 3. Branch coverage for TPC-H benchmark before and …
Original abstract

Performance is a critical characteristic of fundamental systems, such as Database Management Systems (DBMSs). Both academia and industry have invested decades in exploring efficient optimization algorithms. Despite these efforts, DBMSs are prone to performance issues, which incur suboptimal performance. Finding such issues is a longstanding challenge as no ground-truth performance is available. Existing work adopts black-box methods to examine performance consistency across executions, but cannot systematically test optimizations. In this work, we propose a novel, general white-box methodology, Branch Flip Analysis (BFA), to systematically and effectively uncover performance issues. BFA flips code branches to enforce or disable an optimization, and the resulting performance is expected not to be significantly better. Otherwise, a performance issue exists. BFA provides a new perspective to finding performance issues and testing optimization logic in a fine-grained manner. We realized BFA in a prototype system QueryZen, and evaluated it on four widely-used and mature DBMSs: PostgreSQL, MySQL, CockroachDB, and MariaDB. QueryZen found 21 previously unknown and unique performance issues with the workload of the extensively used benchmarks TPC-H and TPC-DS. The core concept of BFA is simple and broadly applicable, and can be adapted to analyze the performance of other software systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Branch Flip Analysis (BFA), a white-box method that identifies performance issues in DBMS query optimizers by flipping code branches to enforce or disable optimizations; performance is expected to show no significant improvement, and deviations indicate defects. Implemented as QueryZen, the approach is evaluated on PostgreSQL, MySQL, CockroachDB, and MariaDB with TPC-H and TPC-DS workloads, reporting discovery of 21 previously unknown and unique performance issues.

Significance. If the isolation property of branch flips holds and the reported issues prove reproducible, BFA supplies a systematic, fine-grained complement to black-box consistency testing for optimizer logic. The evaluation across four production DBMSs and two standard benchmarks, together with the conceptual simplicity of the core idea, indicates potential for adaptation beyond databases.

major comments (2)
  1. [§3] §3 (BFA definition): The claim that a branch flip isolates the effect of one optimization rests on an unverified assumption that no shared planner state, statistics caches, or downstream rewrites are affected; without differential plan or trace analysis before/after the flip, performance deviations cannot be attributed solely to optimization defects. This assumption is load-bearing for the 21-issue claim.
  2. [§5] §5 (evaluation results): No quantitative threshold, statistical test, or error-bar protocol is supplied for 'significantly better' performance, nor are baseline comparisons or reproducibility checks across multiple runs reported; this directly affects whether the 21 issues can be treated as genuine rather than measurement artifacts.
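Major comment 1 asks for differential plan analysis before and after a flip. One possible shape for that check, sketched in Python under the assumption that plans have already been parsed (for example from EXPLAIN output) into nested tuples; `diff_plans`, `flip_is_isolated`, and the `allowed_ops` convention are hypothetical and not part of QueryZen:

```python
from __future__ import annotations

# Hypothetical plan representation: (operator, child, child, ...); leaves are 1-tuples like ("SeqScan",).
Plan = tuple


def diff_plans(a: Plan, b: Plan, path: tuple[int, ...] = ()) -> list[tuple[tuple[int, ...], str, str]]:
    """List (path, op_before, op_after) for each highest node where the two plans diverge."""
    if a[0] != b[0] or len(a) != len(b):
        # Operator or arity changed: report this subtree once instead of descending into it.
        return [(path, a[0], b[0])]
    diffs: list[tuple[tuple[int, ...], str, str]] = []
    for i, (child_a, child_b) in enumerate(zip(a[1:], b[1:])):
        diffs.extend(diff_plans(child_a, child_b, path + (i,)))
    return diffs


def flip_is_isolated(before: Plan, after: Plan, allowed_ops: set[str]) -> bool:
    """Heuristic: every divergence must involve an operator the flipped optimization controls.

    Identical plans also count as isolated; any runtime change then has to come from
    execution-time code rather than from a different plan.
    """
    return all(
        op_before in allowed_ops or op_after in allowed_ops
        for _, op_before, op_after in diff_plans(before, after)
    )


before = ("HashJoin", ("IndexScan",), ("SeqScan",))
after = ("HashJoin", ("SeqScan",), ("SeqScan",))
print(diff_plans(before, after))                       # [((0,), 'IndexScan', 'SeqScan')]
print(flip_is_isolated(before, after, {"IndexScan"}))  # True
```

A flip whose diff touches operators outside `allowed_ops` (say, the join node changing from `HashJoin` to `NestedLoop` when only scan selection was targeted) would be reported as not isolated, so its speedup could not be attributed to the targeted optimization alone.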
minor comments (2)
  1. [§1] The abstract and §1 repeat the same high-level description of BFA without adding concrete mechanics; a short illustrative code snippet or decision diagram would improve clarity.
  2. [§5] Table or figure captions in the evaluation section should explicitly state the number of runs and workload scale factors used for each reported issue.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

Point-by-point responses
  1. Referee: [§3] §3 (BFA definition): The claim that a branch flip isolates the effect of one optimization rests on an unverified assumption that no shared planner state, statistics caches, or downstream rewrites are affected; without differential plan or trace analysis before/after the flip, performance deviations cannot be attributed solely to optimization defects. This assumption is load-bearing for the 21-issue claim.

    Authors: We agree that the isolation property is central and that the original manuscript does not provide differential plan or trace analysis to empirically confirm it. The branch flips were chosen via manual code inspection to target individual optimizations, but this does not fully rule out side effects on shared state. We will add differential plan comparisons before and after selected flips in a revised §3 to strengthen the attribution of the 21 issues. revision: yes

  2. Referee: [§5] §5 (evaluation results): No quantitative threshold, statistical test, or error-bar protocol is supplied for 'significantly better' performance, nor are baseline comparisons or reproducibility checks across multiple runs reported; this directly affects whether the 21 issues can be treated as genuine rather than measurement artifacts.

    Authors: We acknowledge that the manuscript lacks an explicit quantitative threshold or statistical protocol for declaring significance. The 21 issues were flagged by observed performance gains that were then manually inspected and confirmed as defects. We will revise §5 to define a concrete threshold (performance improvement exceeding 10%), report the number of runs performed per query, and include basic reproducibility information. revision: yes
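The rebuttal's proposed rule (flag a flip when performance improves by more than 10%) can be made concrete. A minimal Python sketch, assuming "improvement" means the relative reduction of the median runtime over repeated runs; the function name, the three-run minimum, and the non-overlap guard are illustrative assumptions, not taken from the paper or the rebuttal:

```python
from __future__ import annotations

import statistics


def significant_improvement(
    baseline_ms: list[float], flipped_ms: list[float], threshold: float = 0.10
) -> tuple[bool, float]:
    """Return (flag, relative_gain) for one flipped branch on one query.

    relative_gain compares medians over repeated runs; the flag additionally requires
    the two sets of timings not to overlap, a deliberately crude guard against noise.
    """
    if len(baseline_ms) < 3 or len(flipped_ms) < 3:
        raise ValueError("need at least 3 runs per configuration")
    base = statistics.median(baseline_ms)
    flipped = statistics.median(flipped_ms)
    gain = (base - flipped) / base
    separated = max(flipped_ms) < min(baseline_ms)
    return gain > threshold and separated, gain


print(significant_improvement([100, 102, 98, 101, 99], [80, 82, 79, 81, 80]))  # (True, 0.2)
```

Medians resist single outlier runs better than means, and requiring separated ranges means a noisy baseline (for example one spanning 60 to 140 ms) cannot produce a flag even when its median gap exceeds the threshold. A proper statistical test would be the stronger choice; this only illustrates where such a test would plug in.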

Circularity Check

0 steps flagged

No significant circularity; BFA is a direct empirical rule

full rationale

The paper defines BFA as flipping branches to enforce/disable an optimization and checking whether performance improves significantly; any such improvement is labeled a performance issue. This detection criterion is stated directly in terms of observed runtime behavior after the flip and does not reduce to any fitted parameter, self-citation chain, or prior result whose validity depends on the present work. No equations, uniqueness theorems, or ansatzes appear. The method is therefore self-contained and can be evaluated against external benchmarks (TPC-H/TPC-DS on four DBMSs) without circular dependence on its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about the isolating effect of branch flips and on the availability of source code for the target DBMSs; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Flipping a code branch isolates the effect of one optimization without introducing unrelated performance or correctness side effects.
    This premise is required for any performance deviation after the flip to be interpreted as evidence of an optimization defect.

pith-pipeline@v0.9.0 · 5750 in / 1089 out tokens · 26251 ms · 2026-05-25T05:39:25.944665+00:00 · methodology

