AgenticDB: Agentic Performance Reconfiguration for Database Workloads
Pith reviewed 2026-06-29 04:54 UTC · model grok-4.3
The pith
AgenticDB turns database tuning into a self-refining process by letting an agent propose safe DBMS and OS changes guided by runtime feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions, turning database tuning into a self-refining reconfiguration process.
What carries the argument
The context-grounded harness, which proposes changes, enforces safety constraints during application, records runtime states, and incorporates feedback into planning.
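The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Harness` class, the `limits` table, the `toy_throughput` function, and the knob name `buffer_pool_mb` are all hypothetical, and the planner that would generate proposals from the recorded history is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


@dataclass
class Harness:
    """Toy propose/apply/observe loop; every name here is hypothetical."""
    config: Config
    limits: Dict[str, Tuple[int, int]]   # safety constraint: allowed range per knob
    measure: Callable[[Config], float]   # stand-in for running the workload
    history: List[Tuple[str, int, float]] = field(default_factory=list)

    def is_safe(self, knob: str, value: int) -> bool:
        # Knobs without a declared range get an empty range, so they are rejected.
        lo, hi = self.limits.get(knob, (0, -1))
        return lo <= value <= hi

    def step(self, knob: str, value: int) -> bool:
        """Apply one proposed change; keep it only if the score improves."""
        if not self.is_safe(knob, value):
            return False                  # blocked before touching the system
        baseline = self.measure(self.config)
        previous = self.config[knob]
        self.config[knob] = value
        score = self.measure(self.config)
        self.history.append((knob, value, score))  # feedback for later planning
        if score <= baseline:
            self.config[knob] = previous  # roll back a change that did not help
            return False
        return True


def toy_throughput(cfg: Config) -> float:
    # Made-up response curve that peaks at a 4096 MB buffer pool.
    return 1000.0 - abs(cfg["buffer_pool_mb"] - 4096) / 10.0


harness = Harness(
    config={"buffer_pool_mb": 1024},
    limits={"buffer_pool_mb": (128, 8192)},
    measure=toy_throughput,
)
accepted = [harness.step("buffer_pool_mb", v) for v in (2048, 16384, 4096, 512)]
print(accepted, harness.config)
```

In this toy run the out-of-range proposal is rejected before it is applied, and the final proposal is applied, measured, and rolled back because it lowers the score. A real harness would also have to handle restarts, failed applications, and noisy measurements.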
If this is right
- AgenticDB reaches the best final performance on every evaluated workload.
- It improves over the strongest baseline by 118.1 percent on average.
- It reduces aggregate time-to-best by 22.6 percent.
- Its OS-level action space, robust execution lifecycle, and memory-enhanced planning each contribute to the gains.
- Experience accumulated within and across tasks improves later reconfiguration decisions.
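To give a sense of scale for the two headline percentages, the sketch below computes a relative improvement and a time reduction. The formulas are the conventional ones and are an assumption here, since the text above does not state how the paper defines or averages either metric. The input numbers are invented for illustration and are not taken from the paper.

```python
def relative_improvement(baseline: float, tuned: float) -> float:
    """Percent gain of `tuned` over `baseline` for a higher-is-better metric."""
    return (tuned - baseline) / baseline * 100.0


def time_reduction(baseline_seconds: float, tuned_seconds: float) -> float:
    """Percent of time saved relative to `baseline_seconds`."""
    return (baseline_seconds - tuned_seconds) / baseline_seconds * 100.0


# Illustrative inputs only: 1000 -> 2181 transactions/s, 1000 s -> 774 s.
gain = relative_improvement(1000.0, 2181.0)
saved = time_reduction(1000.0, 774.0)
print(f"improvement: {gain:.1f}%  time saved: {saved:.1f}%")
```

Under this definition, an improvement above 100 percent means the tuned metric is more than double the baseline. For lower-is-better metrics such as query latency, the paper may use a different formula.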
Where Pith is reading between the lines
- The same harness structure could be reused on other relational or NoSQL systems that expose runtime metrics.
- Over repeated deployments the accumulated experience might reduce the need for per-workload expert intervention.
- Adding explicit cost models for change application time could further shorten the observed time-to-best.
- The approach may extend naturally to multi-tenant or cloud-managed databases where OS-level knobs are partially exposed.
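The cost-model idea in the third point can be made concrete with a small ranking sketch. Everything in it is hypothetical: the `Candidate` type, the gain estimates, and the apply times are invented, and ranking by gain per second of application time is one possible heuristic rather than anything the paper proposes. The knob names are real PostgreSQL and Linux settings, chosen only because `shared_buffers` needs a server restart while the other two do not.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_gain_pct: float  # hypothetical estimate supplied by a planner
    apply_seconds: float      # hypothetical cost: reload, restart, or warm-up time


def rank_by_gain_rate(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by expected gain per second spent applying them."""
    return sorted(
        candidates,
        key=lambda c: c.expected_gain_pct / c.apply_seconds,
        reverse=True,
    )


candidates = [
    Candidate("shared_buffers", expected_gain_pct=20.0, apply_seconds=60.0),
    Candidate("vm.swappiness", expected_gain_pct=3.0, apply_seconds=1.0),
    Candidate("work_mem", expected_gain_pct=8.0, apply_seconds=2.0),
]
ranked = rank_by_gain_rate(candidates)
print([c.name for c in ranked])
```

With these made-up numbers the cheap changes are tried first and the restart-bound change last, which is the behaviour that could shorten time-to-best if the estimates were reasonably accurate.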
Load-bearing premise
The harness can reliably diagnose bottlenecks from runtime states and enforce safety constraints to prevent unsafe actions without creating new failure modes.
What would settle it
On the same MySQL and PostgreSQL instances with YCSB, Sysbench, and TPC-H workloads, AgenticDB produces final performance no better than the strongest baseline or triggers unsafe configuration changes that the harness was supposed to block.
Original abstract
Database configuration tuning is critical for workload performance, but practical tuning on real deployments remains difficult. Existing automatic tuners mostly formulate tuning as iterative search over DBMS knob values. This formulation often incurs high execution cost, depends on predefined DBMS-only search spaces, and provides limited support for using runtime feedback to diagnose bottlenecks and safely apply configuration changes on real servers. To address these limitations, we propose AgenticDB, an agentic framework for database workload reconfiguration. AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions. This runtime interaction enables AgenticDB to diagnose bottlenecks, explore a broader DBMS- and OS-level reconfiguration space, avoid unsafe or unsupported actions, and accumulate experience within and across reconfiguration tasks. As a result, AgenticDB turns database tuning into a self-refining reconfiguration process in which runtime feedback iteratively improves later decisions. We conduct extensive experiments on MySQL and PostgreSQL using YCSB, Sysbench, and TPC-H workloads. The results show that AgenticDB achieves the best final performance on all evaluated workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%. The results also demonstrate that its OS-level action space, robust execution lifecycle, and memory-enhanced planning contribute to more effective and practical database reconfiguration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AgenticDB, an agentic framework for database workload reconfiguration. It implements a context-grounded harness that proposes DBMS- and OS-level configuration changes, applies them under safety constraints, observes runtime performance and states, and uses execution feedback to guide subsequent decisions. Experiments on MySQL and PostgreSQL with YCSB, Sysbench, and TPC-H workloads are reported to show that AgenticDB achieves the best final performance on all workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%, with contributions from the OS action space, robust lifecycle, and memory-enhanced planning.
Significance. If the empirical claims hold with verifiable experimental support, the work could advance practical database tuning by expanding the reconfiguration space to OS-level actions and incorporating runtime feedback in a self-refining agentic loop, addressing limitations of traditional knob-search approaches.
major comments (2)
- [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.
- [Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.
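The first major comment asks for trial counts and error bars. One common way to attach an interval to a relative improvement is a percentile bootstrap over repeated trials, sketched below. The function name `bootstrap_ci` and the trial values are hypothetical, and nothing here reflects how the authors actually ran or analysed their experiments.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(
    baseline: List[float],
    tuned: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the percent gain of mean(tuned) over mean(baseline)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    gains = []
    for _ in range(n_resamples):
        # Resample each group with replacement and recompute the gain.
        b = statistics.fmean(rng.choices(baseline, k=len(baseline)))
        t = statistics.fmean(rng.choices(tuned, k=len(tuned)))
        gains.append((t - b) / b * 100.0)
    gains.sort()
    lower = gains[int(alpha / 2 * n_resamples)]
    upper = gains[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented throughput numbers from five trials per system.
baseline_trials = [980.0, 1010.0, 1005.0, 995.0, 1020.0]
tuned_trials = [2100.0, 2250.0, 2180.0, 2150.0, 2220.0]

point = (statistics.fmean(tuned_trials) - statistics.fmean(baseline_trials)) \
    / statistics.fmean(baseline_trials) * 100.0
ci_low, ci_high = bootstrap_ci(baseline_trials, tuned_trials)
print(f"gain {point:.1f}% (95% CI {ci_low:.1f}% to {ci_high:.1f}%)")
```

Reporting the interval alongside the point estimate would let readers judge whether an average gain is robust to run-to-run variance.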
minor comments (1)
- The abstract refers to 'extensive experiments' but supplies no table or figure references for the reported aggregate metrics.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and verifiability of the claims.
Point-by-point responses
- Referee: [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.
  Authors: We agree that the abstract presents the aggregate performance numbers at a high level. While the Experimental Evaluation section provides descriptions of the baselines, workloads (YCSB, Sysbench, TPC-H), and MySQL/PostgreSQL setups, we acknowledge that explicit details on the number of trials, statistical significance tests, and error bars are not sufficiently highlighted. To address this, we will revise the abstract to briefly note the baselines and experimental conditions, and expand the Experimental Evaluation section to include the number of independent trials, statistical tests, and error bars on the reported improvements.
  Revision: yes
- Referee: [Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.
  Authors: The Framework section describes the context-grounded harness, the proposal-application-observation loop, safety constraints, and use of runtime feedback for bottleneck diagnosis. We agree that the description remains high-level and would benefit from additional support. In revision, we will add pseudocode for the harness decision process, concrete examples of bottleneck diagnosis from observed states (e.g., CPU, I/O metrics), and a dedicated discussion of potential new failure modes with mitigation via the safety constraints and rollback mechanisms.
  Revision: yes
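The rebuttal promises examples of bottleneck diagnosis from CPU and I/O metrics. As a rough picture of what such an example might look like, here is a hand-written rule sketch. It is an assumption for illustration only: the metric names, the thresholds, and the labels are invented, and a fixed rule table is a much simpler technique than the feedback-driven diagnosis the paper describes.

```python
from typing import Dict


def diagnose(metrics: Dict[str, float]) -> str:
    """Return a coarse bottleneck label from utilisation fractions in [0, 1].

    Thresholds are arbitrary placeholders, not values from the paper.
    """
    if metrics["iowait"] >= 0.30:
        return "io-bound"             # CPUs mostly waiting on storage
    if metrics["cpu_user"] + metrics["cpu_sys"] >= 0.85:
        return "cpu-bound"            # little idle CPU left
    if metrics["swap_used"] > 0.0:
        return "memory-pressure"      # the host has started swapping
    return "no-clear-bottleneck"


samples = {
    "scan-heavy": {"iowait": 0.45, "cpu_user": 0.20, "cpu_sys": 0.05, "swap_used": 0.0},
    "point-lookups": {"iowait": 0.02, "cpu_user": 0.80, "cpu_sys": 0.10, "swap_used": 0.0},
    "oversized-cache": {"iowait": 0.05, "cpu_user": 0.30, "cpu_sys": 0.05, "swap_used": 0.2},
}
labels = {name: diagnose(m) for name, m in samples.items()}
print(labels)
```

A label like this would only be a starting point. The failure-mode discussion the referee asks for would need to cover cases where the rules misfire, for example high I/O wait caused by a neighbouring process rather than the database.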
Circularity Check
No significant circularity; empirical results only
The paper describes an agentic framework for database reconfiguration and reports performance improvements from direct experiments on MySQL/PostgreSQL with YCSB/Sysbench/TPC-H workloads. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text. The central claims rest on experimental outcomes rather than any derivation that reduces to its own definitions or prior fitted values by construction. This is the expected non-finding for a purely empirical systems paper.
Axiom & Free-Parameter Ledger