
arxiv: 2606.20318 · v2 · pith:DGS4IME6new · submitted 2026-06-18 · 💻 cs.DB

AgenticDB: Agentic Performance Reconfiguration for Database Workloads

Pith reviewed 2026-06-29 04:54 UTC · model grok-4.3

classification 💻 cs.DB
keywords database configuration tuning · agentic framework · runtime feedback · workload reconfiguration · MySQL · PostgreSQL · OS-level actions · performance optimization

The pith

AgenticDB turns database tuning into a self-refining process by letting an agent propose safe DBMS and OS changes guided by runtime feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that existing automatic database tuners are limited by high execution costs, narrow DBMS-only search spaces, and weak use of runtime states for diagnosis or safety. AgenticDB addresses this with a context-grounded harness that proposes changes at both DBMS and OS levels, applies them under constraints, observes performance and states, and feeds results back into planning. This interaction lets the system diagnose bottlenecks, avoid unsafe actions, and build experience across tasks. If the claim holds, tuning on real servers becomes more practical because feedback iteratively improves decisions instead of relying on blind search. The experiments on MySQL and PostgreSQL with standard workloads support this by showing consistent gains over prior methods.

Core claim

AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions, turning database tuning into a self-refining reconfiguration process.

What carries the argument

The context-grounded harness, which proposes changes, enforces safety constraints during application, records runtime states, and incorporates feedback into planning.
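
The propose, check, apply, observe, feed-back cycle described above can be sketched as a toy harness. This is a minimal illustration under stated assumptions, not the paper's implementation: the function `reconfigure`, the toy knobs `buffer_mb` and `swappiness`, the 4096 MB safety budget, the synthetic `toy_measure` score, and the scripted `toy_propose` policy (standing in for the LLM planner) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]
History = List[Tuple[Config, Optional[float]]]  # score is None when a proposal was blocked


def reconfigure(
    config: Config,
    propose: Callable[[Config, History], Config],
    is_safe: Callable[[Config], bool],
    measure: Callable[[Config], float],
    rounds: int,
) -> Tuple[Config, float, History]:
    """Run a fixed number of propose -> check -> apply -> observe rounds."""
    best_score = measure(config)
    history: History = []
    for _ in range(rounds):
        candidate = {**config, **propose(config, history)}
        if not is_safe(candidate):
            history.append((candidate, None))  # blocked: recorded as feedback, never applied
            continue
        score = measure(candidate)
        history.append((candidate, score))
        if score > best_score:  # keep the change only if it helped
            config, best_score = candidate, score
    return config, best_score, history


# --- Hypothetical toy environment (not from the paper) ---
def toy_measure(c: Config) -> float:
    # Synthetic score: a larger buffer helps up to 4096 MB, swappiness hurts.
    return float(min(c["buffer_mb"], 4096) - c["swappiness"])


def toy_is_safe(c: Config) -> bool:
    # Assumed memory budget of 4096 MB and the valid swappiness range.
    return c["buffer_mb"] <= 4096 and 0 <= c["swappiness"] <= 100


def toy_propose(config: Config, history: History) -> Config:
    # If the last attempt was blocked or not adopted, switch to the OS-level knob.
    if history and (history[-1][1] is None or history[-1][0] != config):
        return {"swappiness": max(config["swappiness"] - 30, 0)}
    return {"buffer_mb": config["buffer_mb"] * 2}


final_config, final_score, trace = reconfigure(
    {"buffer_mb": 1024, "swappiness": 60}, toy_propose, toy_is_safe, toy_measure, rounds=6
)
print(final_config, final_score)
```

In this sketch the feedback channel is the `history` list: the proposer sees blocked and unhelpful attempts and changes direction, which is the role the paper assigns to runtime feedback at a much richer level.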

If this is right

  • AgenticDB reaches the best final performance on every evaluated workload.
  • It improves over the strongest baseline by 118.1 percent on average.
  • It reduces aggregate time-to-best by 22.6 percent.
  • Its OS-level action space, robust execution lifecycle, and memory-enhanced planning each contribute to the gains.
  • Experience accumulated within and across tasks improves later reconfiguration decisions.
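
The page does not state how the two aggregate figures are computed. One common reading is sketched below as an assumption: the average improvement is the mean of per-workload relative gains over the strongest baseline, and the aggregate time-to-best reduction compares summed times across workloads. The function names and the numbers in the example are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def mean_relative_improvement(ours: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean per-workload gain in percent, assuming a higher-is-better metric such as TPS."""
    gains = [(o - b) / b * 100.0 for o, b in zip(ours, baseline)]
    return sum(gains) / len(gains)


def aggregate_time_reduction(ours_s: Sequence[float], baseline_s: Sequence[float]) -> float:
    """Percent reduction in total time-to-best, summed over all workloads."""
    return (1.0 - sum(ours_s) / sum(baseline_s)) * 100.0


# Illustrative inputs only (two hypothetical workloads).
print(mean_relative_improvement([300.0, 150.0], [100.0, 100.0]))
print(aggregate_time_reduction([40.0, 60.0], [50.0, 75.0]))
```

For lower-is-better metrics such as P95 latency or TPC-H execution time, the sign of the per-workload gain would need to be flipped. A mean of relative gains can also be dominated by a single workload, which is one reason per-workload numbers matter.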

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same harness structure could be reused on other relational or NoSQL systems that expose runtime metrics.
  • Over repeated deployments the accumulated experience might reduce the need for per-workload expert intervention.
  • Adding explicit cost models for change application time could further shorten the observed time-to-best.
  • The approach may extend naturally to multi-tenant or cloud-managed databases where OS-level knobs are partially exposed.

Load-bearing premise

The harness can reliably diagnose bottlenecks from runtime states and enforce safety constraints to prevent unsafe actions without creating new failure modes.

What would settle it

The claim would be refuted if, on the same MySQL and PostgreSQL instances with YCSB, Sysbench, and TPC-H workloads, AgenticDB produced final performance no better than the strongest baseline, or if it triggered unsafe configuration changes that the harness was supposed to block.

Figures

Figures reproduced from arXiv: 2606.20318 by Chaozheng Wang, Chen Zheng, Heng Zhang, Xinyue Yang, Yanjun Wu.

Figure 1: Overview of the AgenticDB framework.
Figure 2: Overall reconfiguration progression of AgenticDB and baseline methods across MySQL and PostgreSQL workloads. For YCSB and Sysbench, each workload reports the objective, TPS, and P95 latency; TPC-H reports execution time.
Figure 3: Reconfiguration progression of AgenticDB with different LLM backends on representative MySQL workloads.
Figure 4: Effect of experience memory on representative …
Figure 5: Representative reconfiguration traces showing how …
Original abstract

Database configuration tuning is critical for workload performance, but practical tuning on real deployments remains difficult. Existing automatic tuners mostly formulate tuning as iterative search over DBMS knob values. This formulation often incurs high execution cost, depends on predefined DBMS-only search spaces, and provides limited support for using runtime feedback to diagnose bottlenecks and safely apply configuration changes on real servers. To address these limitations, we propose AgenticDB, an agentic framework for database workload reconfiguration. AgenticDB implements a context-grounded harness that interacts with the target database environment by proposing DBMS- and OS-level changes, applying them under safety constraints, observing workload performance and runtime states, and using execution feedback to guide subsequent decisions. This runtime interaction enables AgenticDB to diagnose bottlenecks, explore a broader DBMS- and OS-level reconfiguration space, avoid unsafe or unsupported actions, and accumulate experience within and across reconfiguration tasks. As a result, AgenticDB turns database tuning into a self-refining reconfiguration process in which runtime feedback iteratively improves later decisions. We conduct extensive experiments on MySQL and PostgreSQL using YCSB, Sysbench, and TPC-H workloads. The results show that AgenticDB achieves the best final performance on all evaluated workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%. The results also demonstrate that its OS-level action space, robust execution lifecycle, and memory-enhanced planning contribute to more effective and practical database reconfiguration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes AgenticDB, an agentic framework for database workload reconfiguration. It implements a context-grounded harness that proposes DBMS- and OS-level configuration changes, applies them under safety constraints, observes runtime performance and states, and uses execution feedback to guide subsequent decisions. Experiments on MySQL and PostgreSQL with YCSB, Sysbench, and TPC-H workloads are reported to show that AgenticDB achieves the best final performance on all workloads, improving over the strongest baseline by 118.1% on average and reducing aggregate time-to-best by 22.6%, with contributions from the OS action space, robust lifecycle, and memory-enhanced planning.

Significance. If the empirical claims hold with verifiable experimental support, the work could advance practical database tuning by expanding the reconfiguration space to OS-level actions and incorporating runtime feedback in a self-refining agentic loop, addressing limitations of traditional knob-search approaches.

major comments (2)
  1. [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.
  2. [Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.
minor comments (1)
  1. The abstract refers to 'extensive experiments' but supplies no table or figure references for the reported aggregate metrics.
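
The reporting the referee asks for (number of trials, spread, error bars) can be illustrated with a short sketch. It assumes repeated independent runs of one workload and uses a normal-approximation confidence interval; with only a handful of trials a t-based interval would be wider and more appropriate. The function name `summarize` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def summarize(trials: Sequence[float], confidence: float = 0.95) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for a normal-approximation confidence interval."""
    m = mean(trials)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    half_width = z * stdev(trials) / sqrt(len(trials))
    return m, m - half_width, m + half_width


# Hypothetical throughput from five repeated runs of the same configuration.
m, lo, hi = summarize([100.0, 110.0, 90.0, 105.0, 95.0])
print(f"mean={m:.1f}, 95% CI=[{lo:.2f}, {hi:.2f}], n=5")
```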

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and verifiability of the claims.

Point-by-point responses
  1. Referee: [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation section: the central claim of 118.1% average improvement and 22.6% time-to-best reduction is presented without any description of baselines, workload parameters, number of trials, statistical significance tests, or error bars, rendering the performance numbers unverifiable from the manuscript text.

    Authors: We agree that the abstract presents the aggregate performance numbers at a high level. While the Experimental Evaluation section provides descriptions of the baselines, workloads (YCSB, Sysbench, TPC-H), and MySQL/PostgreSQL setups, we acknowledge that explicit details on the number of trials, statistical significance tests, and error bars are not sufficiently highlighted. To address this, we will revise the abstract to briefly note the baselines and experimental conditions, and expand the Experimental Evaluation section to include the number of independent trials, statistical tests, and error bars on the reported improvements. revision: yes

  2. Referee: [Framework Description] Framework section (interaction loop description): the assertion that the context-grounded harness can reliably diagnose bottlenecks and enforce safety constraints without new failure modes is stated at a high level with no concrete examples, pseudocode, or failure-mode analysis, which is load-bearing for the practical applicability claim.

    Authors: The Framework section describes the context-grounded harness, the proposal-application-observation loop, safety constraints, and use of runtime feedback for bottleneck diagnosis. We agree that the description remains high-level and would benefit from additional support. In revision, we will add pseudocode for the harness decision process, concrete examples of bottleneck diagnosis from observed states (e.g., CPU, I/O metrics), and a dedicated discussion of potential new failure modes with mitigation via the safety constraints and rollback mechanisms. revision: yes
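
The kind of material promised in this response (diagnosis from CPU and I/O metrics, and rollback as a mitigation) might look like the sketch below. It is an assumption-laden illustration, not the authors' pseudocode: the thresholds, the metric names `cpu_pct` and `iowait_pct`, the `threads` setting, and the toy workload are all hypothetical.

```python
from typing import Callable, Dict

Settings = Dict[str, int]


def diagnose(state: Dict[str, float]) -> str:
    """Map a runtime snapshot to a coarse bottleneck label (illustrative thresholds)."""
    if state["iowait_pct"] > 30.0:
        return "io"
    if state["cpu_pct"] > 85.0:
        return "cpu"
    return "none"


def apply_with_rollback(
    settings: Settings, change: Settings, run_workload: Callable[[Settings], float]
) -> bool:
    """Apply a change in place; restore the snapshot if the run fails or regresses."""
    snapshot = dict(settings)
    before = run_workload(settings)
    settings.update(change)
    try:
        after = run_workload(settings)
    except RuntimeError:
        after = None  # treat a crashed run as a failed change
    if after is None or after < before:
        settings.clear()
        settings.update(snapshot)
        return False
    return True


def toy_workload(settings: Settings) -> float:
    # Hypothetical: throughput grows with threads, but the server fails above 64.
    if settings["threads"] > 64:
        raise RuntimeError("server failed to start")
    return float(settings["threads"])


live = {"threads": 16}
kept_bad = apply_with_rollback(live, {"threads": 128}, toy_workload)   # rolled back
kept_good = apply_with_rollback(live, {"threads": 32}, toy_workload)   # kept
print(diagnose({"cpu_pct": 40.0, "iowait_pct": 45.0}), kept_bad, kept_good, live)
```

Taking the snapshot before the change, rather than reconstructing the old values afterwards, keeps the rollback path independent of whatever the failed change did.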

Circularity Check

0 steps flagged

No significant circularity; empirical results only

Full rationale

The paper describes an agentic framework for database reconfiguration and reports performance improvements from direct experiments on MySQL/PostgreSQL with YCSB/Sysbench/TPC-H workloads. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text. The central claims rest on experimental outcomes rather than any derivation that reduces to its own definitions or prior fitted values by construction. This is the expected non-finding for a purely empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework description implies unstated assumptions about agent planning reliability and safety enforcement but supplies no details for enumeration.


