Finding Performance Issues in Database Systems by Exploiting Dormant Code Paths
Pith reviewed 2026-05-25 05:39 UTC · model grok-4.3
The pith
Flipping branches that control database optimizations exposes cases where the optimization itself slows queries down.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BFA flips code branches to enforce or disable an optimization. The flipped execution is expected to be not significantly faster than the original; if it is, a performance issue exists. QueryZen found 21 previously unknown and unique performance issues using the TPC-H and TPC-DS workloads on PostgreSQL, MySQL, CockroachDB, and MariaDB.
What carries the argument
Branch Flip Analysis (BFA), a white-box technique that toggles individual source-code branches controlling optimizations to isolate and test their performance effects.
If this is right
- Performance defects can be attributed directly to specific optimization logic rather than to interactions among many components.
- The same branch-flip procedure works across multiple mature database engines without modification to their query planners.
- Standard benchmark workloads are sufficient to surface real defects when the analysis is performed at branch granularity.
- Developers obtain a repeatable test that flags an optimization as suspicious whenever its removal improves measured runtime.
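The repeatable test in the last point can be sketched as a small decision function. This is an illustration under stated assumptions: the function name `is_suspicious` is hypothetical, the runtimes are made-up numbers, the use of medians is a choice made here, and the 10% default mirrors the threshold proposed in the simulated rebuttal further down this page rather than anything the paper specifies.

```python
from statistics import median

def is_suspicious(baseline_ms, flipped_ms, min_gain=0.10):
    """Flag a flip whose median runtime beats the baseline by more than min_gain."""
    base = median(baseline_ms)
    flipped = median(flipped_ms)
    if base <= 0:
        raise ValueError("baseline runtimes must be positive")
    # Relative speedup obtained by flipping the branch.
    gain = (base - flipped) / base
    return gain > min_gain, gain

# Made-up runtimes in milliseconds, five runs each.
baseline = [120.0, 118.5, 121.2, 119.8, 120.4]
flipped = [95.1, 96.3, 94.8, 97.0, 95.5]
flagged, gain = is_suspicious(baseline, flipped)
print(f"flagged={flagged} gain={gain:.1%}")
```

Medians are used instead of means so that a single slow run, for example one hit by a cold cache, does not decide the verdict.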
Where Pith is reading between the lines
- The technique could be applied to other performance-critical systems such as compilers or operating-system schedulers by targeting analogous control branches.
- Automated tooling could maintain a living set of branch-flip tests that run on every code change to catch regressions in optimization quality.
- When an optimization is found defective, the same flip provides a minimal patch that can be used to quantify the exact performance loss.
Load-bearing premise
Flipping one branch changes only the targeted optimization and introduces no unrelated side effects on performance or correctness.
What would settle it
Apply the same branch flips to a set of hand-verified correct optimizations and check whether any produce large unexplained speedups or correctness failures.
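A control experiment of this kind could be organized roughly as follows. The sketch assumes each flip has already been executed and its runtime and result rows recorded; `FlipOutcome`, `audit`, the flip names, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FlipOutcome:
    name: str
    baseline_ms: float
    flipped_ms: float
    baseline_rows: list
    flipped_rows: list

def audit(outcomes, max_gain=0.10):
    """Split control flips into correctness failures and unexplained speedups."""
    wrong, faster = [], []
    for o in outcomes:
        # Compare as sorted lists: row order is not guaranteed without ORDER BY.
        if sorted(o.baseline_rows) != sorted(o.flipped_rows):
            wrong.append(o.name)
        elif (o.baseline_ms - o.flipped_ms) / o.baseline_ms > max_gain:
            faster.append(o.name)
    return wrong, faster

# Made-up outcomes for three hand-verified optimizations.
controls = [
    FlipOutcome("predicate_pushdown", 80.0, 140.0, [(1,), (2,)], [(2,), (1,)]),
    FlipOutcome("index_scan", 50.0, 30.0, [(1,)], [(1,)]),
    FlipOutcome("constant_folding", 40.0, 41.0, [(1,)], [(3,)]),
]
wrong, faster = audit(controls)
print("correctness failures:", wrong)
print("unexplained speedups:", faster)
```

Any entry in either list for an optimization believed to be correct would count as evidence against the isolation premise.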
Original abstract
Performance is a critical characteristic of fundamental systems, such as Database Management Systems (DBMSs). Both academia and industry have invested decades in exploring efficient optimization algorithms. Despite these efforts, DBMSs are prone to performance issues, which incur suboptimal performance. Finding such issues is a longstanding challenge as no ground-truth performance is available. Existing work adopts black-box methods to examine performance consistency across executions, but cannot systematically test optimizations. In this work, we propose a novel, general white-box methodology, Branch Flip Analysis (BFA), to systematically and effectively uncover performance issues. BFA flips code branches to enforce or disable an optimization, and the performance is expected to be not significantly better. Otherwise, a performance issue exists. BFA provides a new perspective to finding performance issues and testing optimization logics in a fine-grained manner. We realized BFA in a prototype system QueryZen, and evaluated it on four widely-used and mature DBMSs: PostgreSQL, MySQL, CockroachDB, and MariaDB. QueryZen found 21 previously unknown and unique performance issues with the workload of the extensively used benchmarks TPC-H and TPC-DS. The core concept of BFA is simple and broadly applicable, and can be adapted to analyze the performance of other software systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Branch Flip Analysis (BFA), a white-box method that identifies performance issues in DBMS query optimizers by flipping code branches to enforce or disable optimizations; performance is expected to show no significant improvement, and deviations indicate defects. Implemented as QueryZen, the approach is evaluated on PostgreSQL, MySQL, CockroachDB, and MariaDB with TPC-H and TPC-DS workloads, reporting discovery of 21 previously unknown and unique performance issues.
Significance. If the isolation property of branch flips holds and the reported issues prove reproducible, BFA supplies a systematic, fine-grained complement to black-box consistency testing for optimizer logic. The evaluation across four production DBMSs and two standard benchmarks, together with the conceptual simplicity of the core idea, indicates potential for adaptation beyond databases.
major comments (2)
- [§3] §3 (BFA definition): The claim that a branch flip isolates the effect of one optimization rests on an unverified assumption that no shared planner state, statistics caches, or downstream rewrites are affected; without differential plan or trace analysis before/after the flip, performance deviations cannot be attributed solely to optimization defects. This assumption is load-bearing for the 21-issue claim.
- [§5] §5 (evaluation results): No quantitative threshold, statistical test, or error-bar protocol is supplied for 'significantly better' performance, nor are baseline comparisons or reproducibility checks across multiple runs reported; this directly affects whether the 21 issues can be treated as genuine rather than measurement artifacts.
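One way to address the second major comment is a one-sided permutation test over repeated runs. The sketch below is an assumption about what such a protocol could look like, not something the paper reports; the function name, the number of rounds, and the sample runtimes are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(baseline, flipped, rounds=10_000, seed=0):
    """One-sided p-value for the hypothesis 'flipped is faster than baseline'."""
    rng = random.Random(seed)
    observed = mean(baseline) - mean(flipped)
    pooled = list(baseline) + list(flipped)
    n = len(baseline)
    hits = 0
    for _ in range(rounds):
        # Reassign runs to the two groups at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (rounds + 1)

# Made-up runtimes in milliseconds.
baseline = [120.0, 118.5, 121.2, 119.8, 120.4]
flipped = [95.1, 96.3, 94.8, 97.0, 95.5]
print(permutation_p_value(baseline, flipped, rounds=2_000))
```

A permutation test makes no normality assumption about runtimes, which tend to be skewed. With only five runs per side the smallest achievable p-value is about 1/252, so more runs are needed for stricter significance levels.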
minor comments (2)
- [§1] The abstract and §1 repeat the same high-level description of BFA without adding concrete mechanics; a short illustrative code snippet or decision diagram would improve clarity.
- [§5] Table or figure captions in the evaluation section should explicitly state the number of runs and workload scale factors used for each reported issue.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
Point-by-point responses
- Referee: [§3] §3 (BFA definition): The claim that a branch flip isolates the effect of one optimization rests on an unverified assumption that no shared planner state, statistics caches, or downstream rewrites are affected; without differential plan or trace analysis before/after the flip, performance deviations cannot be attributed solely to optimization defects. This assumption is load-bearing for the 21-issue claim.
  Authors: We agree that the isolation property is central and that the original manuscript does not provide differential plan or trace analysis to empirically confirm it. The branch flips were chosen via manual code inspection to target individual optimizations, but this does not fully rule out side effects on shared state. We will add differential plan comparisons before and after selected flips in a revised §3 to strengthen the attribution of the 21 issues.
  Revision: yes
- Referee: [§5] §5 (evaluation results): No quantitative threshold, statistical test, or error-bar protocol is supplied for 'significantly better' performance, nor are baseline comparisons or reproducibility checks across multiple runs reported; this directly affects whether the 21 issues can be treated as genuine rather than measurement artifacts.
  Authors: We acknowledge that the manuscript lacks an explicit quantitative threshold or statistical protocol for declaring significance. The 21 issues were flagged by observed performance gains that were then manually inspected and confirmed as defects. We will revise §5 to define a concrete threshold (performance improvement exceeding 10%), report the number of runs performed per query, and include basic reproducibility information.
  Revision: yes
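The differential plan comparison promised in the first response could start from a plain textual diff of the plans before and after a flip. The sketch below uses Python's standard `difflib`; the plan strings are made-up text loosely in the style of `EXPLAIN` output, and `plan_diff` is a hypothetical helper, not part of QueryZen.

```python
import difflib

def plan_diff(before, after):
    """Return the lines that differ between two textual query plans."""
    delta = difflib.ndiff(before.splitlines(), after.splitlines())
    # Keep removed ("- ") and added ("+ ") lines; drop context and hint lines.
    return [line for line in delta if line.startswith(("- ", "+ "))]

# Made-up plans: the flip is assumed to disable a hash join.
original = "\n".join([
    "Hash Join",
    "  -> Seq Scan on orders",
    "  -> Hash",
    "    -> Seq Scan on customer",
])
flipped = "\n".join([
    "Nested Loop",
    "  -> Seq Scan on orders",
    "  -> Index Scan on customer",
])
for line in plan_diff(original, flipped):
    print(line)
```

If the diff touches operators unrelated to the flipped optimization, the speedup cannot be attributed to that optimization alone, and the issue would need closer manual inspection.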
Circularity Check
No significant circularity; BFA is a direct empirical rule
The paper defines BFA as flipping branches to enforce/disable an optimization and checking whether performance improves significantly; any such improvement is labeled a performance issue. This detection criterion is stated directly in terms of observed runtime behavior after the flip and does not reduce to any fitted parameter, self-citation chain, or prior result whose validity depends on the present work. No equations, uniqueness theorems, or ansatzes appear. The method is therefore self-contained and can be evaluated against external benchmarks (TPC-H/TPC-DS on four DBMSs) without circular dependence on its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Flipping a code branch isolates the effect of one optimization without introducing unrelated performance or correctness side effects.