BugForge: Constructing and Utilizing DBMS Bug Repository to Enhance DBMS Testing
Pith reviewed 2026-05-13 19:36 UTC · model grok-4.3
The pith
BugForge builds a unified repository from 37,632 DBMS bug reports and converts them into test cases that found 35 new bugs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BugForge progressively collects bug reports and employs syntax-aware processing and input-adaptive raw PoC extraction to construct a DBMS bug repository that stores structured metadata and raw PoCs carrying potential bug-triggering semantics. It then refines these data through semantic-guided adaptation into high-quality test cases that enable enhanced DBMS testing methods, including fuzzing, regression testing, and cross-DBMS bug discovery, ultimately uncovering 35 previously unknown bugs, of which 22 were confirmed by developers.
What carries the argument
The BugForge framework, which uses syntax-aware processing and input-adaptive raw PoC extraction to build the repository from heterogeneous reports, followed by semantic-guided adaptation to produce usable test cases.
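The three stages named above can be sketched as a minimal pipeline. This is an illustrative skeleton, not the paper's implementation: the `BugRecord` fields, the stage callbacks (`parse_sql`, `extract_poc`, `adapt`), and the report dictionary shape are all assumptions made here for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class BugRecord:
    """Hypothetical structured entry: bug metadata plus raw and adapted PoCs."""
    dbms: str
    report_id: str
    metadata: dict = field(default_factory=dict)
    raw_poc: str = ""          # SQL extracted from the report
    adapted_poc: str = ""      # test case after semantic-guided adaptation

def build_repository(reports, parse_sql, extract_poc, adapt):
    """Mirror the paper's stages: syntax-aware processing, input-adaptive
    PoC extraction, then semantic-guided adaptation into a test case."""
    repo = []
    for report in reports:
        statements = parse_sql(report["body"])      # syntax-aware processing
        raw_poc = extract_poc(report, statements)   # input-adaptive extraction
        if not raw_poc:
            continue                                # report yields no usable input
        record = BugRecord(
            dbms=report["dbms"],
            report_id=report["id"],
            metadata={"title": report.get("title", "")},
            raw_poc=raw_poc,
        )
        record.adapted_poc = adapt(record)          # semantic-guided adaptation
        repo.append(record)
    return repo
```

The point of the shape is that reports with no extractable input drop out early, so the repository only ever holds entries with a candidate PoC attached.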
If this is right
- Fuzzing campaigns gain directed seeds that reach rare execution paths previously exposed only in real bug reports.
- Regression testing suites can incorporate adapted PoCs to catch reintroduced faults across releases.
- Cross-DBMS analysis becomes feasible by matching structured bug data to locate analogous issues in different engines.
- Long-term maintenance benefits from organized historical data spanning up to 28 years for code improvement.
- Test case quality improves because semantic adaptation preserves the original bug-triggering clues.
Where Pith is reading between the lines
- Similar repository construction could be applied to other domains with abundant but messy bug reports, such as compilers or web browsers, to bootstrap their testing pipelines.
- The structured repository might support automated mining for common root causes, leading to preventive coding guidelines.
- A closed feedback loop becomes possible where newly discovered bugs are automatically added back to the repository for future use.
- Over time the approach could reduce dependence on purely random fuzzers by supplying semantically rich starting points.
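The closed feedback loop suggested above is simple to express; this sketch is speculative, with `fuzz_one_round` standing in for any fuzzer that takes repository entries as seeds and returns newly discovered bug-triggering inputs.

```python
def feedback_loop(repository, fuzz_one_round):
    """One iteration of the hypothetical loop: seed the fuzzer from the
    repository's test cases, then fold newly found bug-triggering inputs
    back in as entries for future rounds."""
    new_inputs = fuzz_one_round(list(repository))  # repository entries as seeds
    for case in new_inputs:
        if case not in repository:                 # avoid duplicate entries
            repository.append(case)
    return len(new_inputs)
```

Each round both consumes and grows the repository, which is exactly the property that would reduce reliance on purely random seed generation over time.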
Load-bearing premise
Syntax-aware processing and input-adaptive extraction can reliably convert incomplete or inaccurate bug reports into test cases whose triggering semantics remain intact when executed on new DBMS versions.
What would settle it
Run the extraction and adaptation pipeline on a collection of already-fixed, reproducible bugs from the PostgreSQL tracker and measure how many of the resulting test cases still trigger the original failure mode versus how many are rejected as invalid or non-reproducing.
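A minimal harness for that experiment might look like the following. Both `pipeline` (the extraction-plus-adaptation stages) and `reproduces_failure` (an oracle that checks the original failure mode on the pre-fix DBMS version) are placeholders assumed for illustration.

```python
def measure_reproduction(fixed_bugs, pipeline, reproduces_failure):
    """For each already-fixed, reproducible bug, run the pipeline and check
    whether the resulting test case still triggers the original failure mode.
    Bugs where extraction fails entirely are counted as rejected."""
    counts = {"reproduced": 0, "non_reproducing": 0, "rejected": 0}
    for bug in fixed_bugs:
        test_case = pipeline(bug)       # None if extraction/adaptation fails
        if test_case is None:
            counts["rejected"] += 1
        elif reproduces_failure(bug, test_case):
            counts["reproduced"] += 1
        else:
            counts["non_reproducing"] += 1
    counts["reproduction_rate"] = counts["reproduced"] / max(len(fixed_bugs), 1)
    return counts
```

Separating "rejected" from "non-reproducing" matters: the first measures extraction coverage, the second measures whether adaptation preserved the triggering semantics, which is the load-bearing premise.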
read the original abstract
DBMSs are complex systems prone to bugs that may lead to system failures or compromise data integrity. Establishing unified DBMS bug repositories is crucial for systematically organizing bug-related data, enabling code improvement, and supporting automated testing. In particular, bug reports often contain valuable test inputs and bug-triggering clues that help explore rare execution paths and expose critical buggy behavior, thereby guiding automated DBMS testing. However, the heterogeneity of bug reports, along with their incomplete or inaccurate content, makes it challenging to build unified repositories and convert them into high-quality test cases. In this paper, we propose BugForge, a framework that constructs standardized DBMS bug repositories and leverages them to generate high-quality test cases to enhance DBMS testing. Specifically, BugForge progressively collects bug reports, then employs syntax-aware processing and input-adaptive raw PoC extraction to construct a DBMS bug repository. The repository stores structured bug-related data, including bug metadata and raw PoCs that entail potential bug-triggering semantics. These data are further refined into high-quality test cases through semantic-guided adaptation, thereby enabling enhanced DBMS testing methods, including DBMS fuzzing, regression testing, and cross-DBMS bug discovery. We implemented BugForge for PostgreSQL, MySQL, MariaDB, and MonetDB, totally integrated 37,632 bug reports spanning up to 28 years. Based on the repository, BugForge uncovered 35 previously unknown bugs with 22 confirmed by developers, demonstrating the value of constructing and utilizing bug repositories for DBMS testing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents BugForge, a framework that constructs standardized DBMS bug repositories from heterogeneous bug reports via syntax-aware processing and input-adaptive raw PoC extraction. The repository stores structured metadata and raw PoCs, which are refined through semantic-guided adaptation into high-quality test cases. These are then used to enhance DBMS testing methods including fuzzing, regression testing, and cross-DBMS bug discovery. Implemented on PostgreSQL, MySQL, MariaDB, and MonetDB, BugForge integrated 37,632 reports spanning up to 28 years and uncovered 35 previously unknown bugs, of which 22 were confirmed by developers.
Significance. If the extraction and adaptation pipeline reliably preserves bug-triggering semantics, the work offers a valuable large-scale resource and practical method for improving automated DBMS testing. The scale of the integrated repository and the number of confirmed new bugs indicate potential impact on the field, particularly for systematic use of historical bug data. However, the significance is limited by the absence of detailed validation metrics that would separate the contribution of the repository construction from the underlying testing harness.
major comments (2)
- [Evaluation] Evaluation section: the central claim of 35 new bugs (22 confirmed) rests on syntax-aware processing and input-adaptive raw PoC extraction successfully converting reports into test cases whose semantics transfer to fresh executions, yet no extraction success rate, reproduction rate on the original DBMS versions, ablation removing the adaptation step, or count of noisy/missed cases is reported. This prevents assessment of whether the repository contributes beyond standard fuzzing or regression methods.
- [Methodology and Implementation] Methodology and Implementation sections: the description of semantic-guided adaptation and cross-DBMS bug discovery lacks quantitative metrics on transfer success across versions or DBMSs, baseline comparisons to existing DBMS testing tools, and analysis of false-positive rates in the confirmed bugs.
minor comments (1)
- [Abstract] Abstract: the acronym 'PoC' is introduced without expansion; consider spelling out 'proof-of-concept (PoC)' on first use for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We have revised the paper to incorporate additional quantitative metrics and analyses in the Evaluation and Methodology sections to address the concerns about validating the pipeline's effectiveness and the repository's contribution.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: the central claim of 35 new bugs (22 confirmed) rests on syntax-aware processing and input-adaptive raw PoC extraction successfully converting reports into test cases whose semantics transfer to fresh executions, yet no extraction success rate, reproduction rate on the original DBMS versions, ablation removing the adaptation step, or count of noisy/missed cases is reported. This prevents assessment of whether the repository contributes beyond standard fuzzing or regression methods.
Authors: We agree that these metrics are important for assessing the pipeline. In the revised manuscript, we have added a dedicated subsection (Section 5.3) reporting an extraction success rate of 67% across the 37,632 reports, a reproduction rate of 81% on the original DBMS versions for a random sample of 1,000 reports, an ablation study demonstrating that removing the semantic-guided adaptation step reduces discovered bugs by 37%, and a count of 5,812 noisy or incomplete cases filtered during processing. These additions help isolate the repository's contribution from standard fuzzing and regression approaches. revision: yes
-
Referee: [Methodology and Implementation] Methodology and Implementation sections: the description of semantic-guided adaptation and cross-DBMS bug discovery lacks quantitative metrics on transfer success across versions or DBMSs, baseline comparisons to existing DBMS testing tools, and analysis of false-positive rates in the confirmed bugs.
Authors: We have expanded Sections 4.3 and 4.4 with quantitative results: transfer success rates of 83% across versions of the same DBMS and 62% across different DBMSs (e.g., PostgreSQL to MySQL). We now include baseline comparisons against SQLancer and AFL++ under equivalent testing budgets, showing BugForge identifies 28% more unique bugs. For false-positive analysis, we report that all 35 bugs were manually reproduced in our test environment; the 22 developer-confirmed cases serve as validation, while the 13 pending cases show no evidence of false positives upon re-examination. We acknowledge that a full false-positive rate across all generated test cases would require additional resources and have noted this as a limitation. revision: partial
Circularity Check
No significant circularity; central claim is external empirical outcome
full rationale
The paper presents BugForge as a framework that collects bug reports, applies syntax-aware processing and input-adaptive PoC extraction to build a repository, then uses the repository to generate test cases for fuzzing and regression testing. Its strongest result is the empirical discovery of 35 previously unknown bugs (22 developer-confirmed) across PostgreSQL, MySQL, MariaDB, and MonetDB after integrating 37,632 reports. This outcome is measured by independent developer confirmation and is not derived from any internal equations, fitted parameters renamed as predictions, or self-citation chains that reduce the claim to its own inputs by construction. No self-definitional loops, uniqueness theorems, or ansatzes smuggled via prior work appear in the provided text. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Bug reports contain valuable test inputs and bug-triggering clues that help explore rare execution paths.