
arxiv: 2605.17617 · v1 · pith:2JN7WILBnew · submitted 2026-05-17 · 💻 cs.AI

GraphMind: From Operational Traces to Self-Evolving Workflow Automation

Pith reviewed 2026-05-20 12:02 UTC · model grok-4.3

classification 💻 cs.AI
keywords workflow graphs · self-evolving automation · operational traces · multi-agent systems · incident investigation · adaptive reinforcement · cloud operations

The pith

GraphMind builds action-centric workflow graphs from past resolution traces and lets them evolve through execution feedback to automate incident investigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GraphMind as an end-to-end system that turns large volumes of human problem-solving records into structured graphs showing problems, actions, and causal links. An online engine then uses these graphs to guide multi-agent reasoning when handling new incidents, combining retrieval along graph paths with step-by-step decision making. A reinforcement component strengthens paths that succeed in real executions and weakens those that do not, creating a closed loop where the graph updates itself. This matters for operations that still depend on repeated human coordination because the approach removes the need for ongoing manual redesign of workflows as conditions change. The system has been applied to incident investigation in production cloud database services.
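To make the trace-to-graph idea concrete, here is a minimal sketch. It assumes each trace has already been reduced to a problem label plus an ordered list of action labels. The `WorkflowGraph` class, its fields, and the example labels are hypothetical illustrations, not the paper's extraction pipeline, which works from raw human records.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkflowGraph:
    """Toy action-centric graph: problems link to actions, actions to follow-up actions."""
    # (source, target) -> number of traces supporting that edge
    edges: dict = field(default_factory=lambda: defaultdict(int))

    def add_trace(self, problem: str, actions: list[str]) -> None:
        """Fold one resolution trace into the graph as a chain of edges."""
        prev = problem
        for action in actions:
            self.edges[(prev, action)] += 1
            prev = action

    def successors(self, node: str) -> list[tuple[str, int]]:
        """Return (target, support) pairs for a node, strongest first, ties by name."""
        out = [(t, n) for (s, t), n in self.edges.items() if s == node]
        return sorted(out, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical traces, invented for illustration only.
graph = WorkflowGraph()
graph.add_trace("high_cpu", ["check_top_queries", "kill_runaway_query"])
graph.add_trace("high_cpu", ["check_top_queries", "scale_up"])
graph.add_trace("login_failures", ["check_firewall_rules"])
```

In this toy version, repeated traces only increase an edge's support count. Clustering similar problems and inferring causal links, which the paper describes, are out of scope for the sketch.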

Core claim

GraphMind constructs, executes, and evolves action-centric workflow graphs without human effort through an offline extraction pipeline that builds graphs from resolution traces, an online multi-agent traversal engine that navigates and executes them, and an Adaptive Traversal Reinforcement layer that reinforces successful paths while decaying stale elements.

What carries the argument

Action-centric workflow graphs, built offline from traces and traversed online by a multi-agent engine, with Adaptive Traversal Reinforcement updating the graph from execution results.
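The reinforce-and-decay loop can be pictured with a small sketch. The summary does not state the actual ATR update rule, so everything below is an assumption: a multiplicative decay on every edge, an additive reward on edges of a successful path, and an extra multiplicative penalty on edges of a failed path. The names `atr_update`, `decay`, `reward`, `penalty`, and `floor` are hypothetical.

```python
def atr_update(weights, path, success, decay=0.9, reward=1.0, penalty=0.5, floor=0.01):
    """Return new edge weights after one execution (illustrative rule, not the paper's).

    weights: dict mapping (source, target) -> float
    path:    list of nodes visited during the execution
    success: whether the execution resolved the incident
    """
    # Every edge decays a little, so edges that are never used fade over time.
    updated = {edge: max(floor, w * decay) for edge, w in weights.items()}
    for edge in set(zip(path, path[1:])):
        if success:
            # Edges on a successful path receive an additive reward.
            updated[edge] = updated.get(edge, floor) + reward
        elif edge in updated:
            # Edges on a failed path are weakened beyond the baseline decay.
            updated[edge] = max(floor, updated[edge] * penalty)
    return updated


# Hypothetical weights and path, invented for illustration only.
weights = {
    ("high_cpu", "check_top_queries"): 1.0,
    ("check_top_queries", "scale_up"): 1.0,
    ("check_top_queries", "restart_node"): 1.0,
}
weights = atr_update(weights, ["high_cpu", "check_top_queries", "scale_up"], success=True)
```

The `floor` keeps rarely used edges from vanishing entirely. Whether the real system prunes, floors, or does something else with stale elements is not stated in the summary.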

If this is right

  • Operational workflows can run with far less ongoing human input once the initial graph is built from traces.
  • Diagnostic performance improves over plain retrieval of similar past traces (the Trace-RAG baseline) in mitigation reach, groundedness, and diagnostic throughput.
  • The graph adapts to shifting conditions through direct feedback from its own executions rather than external updates.
  • Production deployment across multiple services is feasible at the scale of real cloud operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar trace-to-graph pipelines could be tried in other high-volume operational domains such as IT support tickets or supply-chain adjustments.
  • Over time the reinforced graph might capture patterns that human-written procedures miss because it draws directly from observed outcomes.
  • Testing the system on entirely novel problems outside the initial trace distribution would reveal how far the evolution mechanism extends coverage.

Load-bearing premise

The offline pipeline accurately extracts causal relationships and structured workflow graphs from human resolution traces without introducing significant noise or missing key context.

What would settle it

A clear drop in mitigation success or a rise in required human corrections when the system encounters new incident types absent from the original traces would show the extraction and evolution steps are not sufficient.

Figures

Figures reproduced from arXiv: 2605.17617 by Anna Pavlenko, Divya Vermareddy, Hannah Lerner, Hemkesh Vijaya Kumar, Joyce Cahoon, Katherine Lin, Mathieu Demarne, Meina Wang, Miso Cilimdzic, Nima Shahbazi, Qiushi Bai, Steve Toscano, Subru Krishnan, Swati Bararia, Wenjing Wang, Yiwen Zhu.

Figure 1: Three-phase architecture of GraphMind: offline …
Figure 2: Incremental graph construction. As new opera…
Figure 3: Graph size under varying clustering thresholds for …
Figure 5: Impact of retrieval parameters (k_p, k_a) on online troubleshooting metrics. Best cell marked with ∗. (Plot residue on the source page: panels (a) cumulative edge synthesis and (b) Gini coefficient for edges and nodes, both plotted over epochs 0 to 6.)
Figure 6: Reinforcement evolution over six epochs. (a) 289 …
Figure 7: Reinforced subgraph evolution across six epochs. Node size and edge thickness are proportional to reinforcement …
Figure 8: Engagement and responsiveness across 62 produc…
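
One of the figure panels tracks a Gini coefficient for edges and nodes across epochs. For reference, the standard Gini coefficient of a list of non-negative values can be computed as in the sketch below. Which quantity the paper feeds in is not visible in the truncated captions; treating it as per-edge or per-node reinforcement weight is an assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even,
    values approaching 1 mean the total is concentrated in a few items."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the ascending-sorted values.
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n
```

A rising value over epochs would indicate that reinforcement is concentrating on a smaller share of the graph.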
Original abstract

Complex operational workflows coordinating personnel, tools, and information are central to enterprise operations, yet end-to-end automation remains challenging due to extensive requirements for human inputs and the inability to adapt over time. We present GraphMind, an end-to-end system that constructs, executes, and evolves action-centric workflow graphs without human effort. The system operates in three phases. First, a scalable offline pipeline extracts structured workflow graphs from large volumes of human resolution traces, capturing problems, actions, and their causal relationships. Second, an online multi-agent traversal engine navigates the graph to dynamically construct and execute workflows, combining graph-guided retrieval with LLM-driven reasoning at each step. Third, Adaptive Traversal Reinforcement (ATR) reinforces successful traversal paths and decays stale elements. This closed-loop mechanism enables the graph to self-optimize and adapt to shifting operational conditions. GraphMind has been deployed across four production cloud database services for incident investigation. Evaluated on production data, the system substantially outperforms a Trace-RAG baseline in mitigation reach, groundedness, and diagnostic throughput, scoring 4.95/5 in blind expert review. The ATR layer provides further gains across all metrics, demonstrating that workflow graphs can learn and improve from execution-derived feedback.
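
The online phase, graph-guided retrieval followed by a decision at each step, can be sketched as follows. This is a simplified illustration under stated assumptions: token overlap stands in for whatever retriever the system uses, the `choose` callback stands in for the LLM, and `k_p` and `k_a` are read as "number of problem nodes retrieved" and "number of candidate actions offered per step", which the page does not confirm. All function names and data are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for an embedding-based retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def traverse(incident, problems, edges, choose, k_p=2, k_a=2, max_steps=5):
    """Walk the graph from the best-matching problems, one decision per step.

    problems: dict problem_id -> description
    edges:    dict (source, target) -> weight
    choose:   callable(incident, candidates) -> chosen action or None (the LLM's role)
    """
    # Graph-guided retrieval: keep the k_p problems most similar to the incident.
    ranked = sorted(problems, key=lambda p: (-jaccard(incident, problems[p]), p))
    frontier, path = ranked[:k_p], []
    for _ in range(max_steps):
        # Candidate actions: the k_a heaviest outgoing edges from the frontier.
        out = [(w, t) for (s, t), w in edges.items() if s in frontier and t not in path]
        candidates = [t for w, t in sorted(out, key=lambda x: (-x[0], x[1]))[:k_a]]
        action = choose(incident, candidates) if candidates else None
        if action is None:
            break
        path.append(action)
        frontier = [action]
    return path


# Hypothetical graph and incident, invented for illustration only.
problems = {"p_cpu": "high cpu usage on database", "p_login": "login failures for users"}
edges = {
    ("p_cpu", "check_top_queries"): 2.0,
    ("p_login", "check_firewall_rules"): 1.0,
    ("check_top_queries", "scale_up"): 1.5,
    ("check_top_queries", "restart_node"): 0.5,
}
pick_first = lambda incident, candidates: candidates[0]  # trivial stand-in for the LLM
path = traverse("database cpu usage is high", problems, edges, pick_first, k_p=1)
```

The returned `path` is the kind of execution record a reinforcement layer could consume as feedback.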

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents GraphMind, an end-to-end system that constructs, executes, and evolves action-centric workflow graphs from human resolution traces for automating complex operational workflows. It operates via three phases: an offline pipeline extracting structured graphs capturing problems, actions, and causal relationships; an online multi-agent traversal engine combining graph-guided retrieval with LLM reasoning; and Adaptive Traversal Reinforcement (ATR) for reinforcing successful paths and enabling self-optimization. The system is reported to be deployed across four production cloud database services for incident investigation, substantially outperforming a Trace-RAG baseline in mitigation reach, groundedness, and diagnostic throughput, with a 4.95/5 blind expert review score and further gains from ATR.

Significance. If the extraction accuracy and evaluation results hold, the work could meaningfully advance self-evolving automation in enterprise IT operations by minimizing human inputs and enabling adaptation to changing conditions. The production deployment across multiple services and the closed-loop ATR mechanism represent practical strengths with potential for broader impact in AI-driven workflow systems. However, the absence of methodological details currently limits the assessed significance.

major comments (2)
  1. [Abstract] Abstract: The abstract reports strong production results, outperformance over Trace-RAG, and a 4.95/5 expert score but provides no details on evaluation methodology, baselines, statistical significance, trace processing, or dataset scale. This is load-bearing for the central claims, as it prevents verification of whether the gains are robust or influenced by post-hoc choices.
  2. [Offline pipeline] Offline pipeline description: The system's foundation is the offline extraction of causal relationships and structured workflow graphs from human resolution traces. No quantitative metrics on extraction accuracy, error analysis, ablation studies on graph quality, or handling of noise/missing context are provided, which directly affects confidence in the online traversal, mitigation reach, and ATR-based evolution claims.
minor comments (1)
  1. [Abstract] Abstract: A high-level figure illustrating the three-phase flow (offline extraction to online traversal to ATR) would improve clarity of the system architecture.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical value of GraphMind's production deployment and closed-loop ATR mechanism. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract reports strong production results, outperformance over Trace-RAG, and a 4.95/5 expert score but provides no details on evaluation methodology, baselines, statistical significance, trace processing, or dataset scale. This is load-bearing for the central claims, as it prevents verification of whether the gains are robust or influenced by post-hoc choices.

    Authors: We agree that the abstract, due to length constraints, presents results at a high level. The full manuscript contains the requested details on evaluation methodology, dataset scale, Trace-RAG baseline construction, and expert review protocol in the Experiments section. To improve accessibility, we will revise the abstract to incorporate a concise statement on evaluation setup, dataset size, and statistical reporting while preserving the high-level focus. revision: yes

  2. Referee: [Offline pipeline] Offline pipeline description: The system's foundation is the offline extraction of causal relationships and structured workflow graphs from human resolution traces. No quantitative metrics on extraction accuracy, error analysis, ablation studies on graph quality, or handling of noise/missing context are provided, which directly affects confidence in the online traversal, mitigation reach, and ATR-based evolution claims.

    Authors: The referee is correct that the current description of the offline pipeline lacks quantitative validation. While the pipeline architecture and causal extraction process are detailed in the manuscript, we did not report accuracy metrics or ablations. In the revised manuscript we will add a dedicated evaluation subsection reporting precision/recall on a held-out annotated trace set, an error analysis for noise and missing context, and an ablation on graph quality's impact on downstream traversal performance. revision: yes
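
The precision/recall evaluation promised in the second response could take roughly the following shape. This is a sketch that assumes extraction quality is scored at the level of individual graph edges against a human-annotated edge set for the same held-out traces. The function name and the exact-match criterion are hypothetical, since the rebuttal does not specify the protocol.

```python
def edge_precision_recall(extracted, annotated):
    """Compare extracted edges with human-annotated edges for the same traces.

    Both arguments are iterables of (source, target) pairs; matching is exact.
    """
    extracted, annotated = set(extracted), set(annotated)
    true_pos = len(extracted & annotated)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical edge sets, invented for illustration only.
extracted = [("high_cpu", "check_top_queries"), ("check_top_queries", "scale_up"),
             ("high_cpu", "restart_node")]
annotated = [("high_cpu", "check_top_queries"), ("check_top_queries", "scale_up"),
             ("check_top_queries", "kill_runaway_query"), ("high_cpu", "check_locks")]
precision, recall = edge_precision_recall(extracted, annotated)
```

Exact matching is strict. A real protocol would likely need a rule for near-duplicate action labels, which this sketch leaves out.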

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

Full rationale

The paper describes a three-phase pipeline: (1) offline extraction of structured workflow graphs from human resolution traces, (2) online multi-agent traversal combining graph-guided retrieval with LLM reasoning, and (3) ATR that reinforces successful paths and decays stale elements using execution-derived feedback. Evaluation metrics (mitigation reach, groundedness, diagnostic throughput, 4.95/5 expert review) are reported against a Trace-RAG baseline on production data from four deployed services. No equations, fitted parameters, or self-citations are presented that reduce any claimed prediction or result to the input traces by construction. The reinforcement loop explicitly draws from new execution outcomes rather than re-using the original trace data for both construction and scoring. The central claims therefore rest on externally observable deployment performance and comparative evaluation rather than tautological re-labeling of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract only, the central claim rests on two assumptions: that human resolution traces contain sufficient structured causal information for reliable graph extraction, and that LLM-driven reasoning at traversal steps produces grounded actions. No explicit free parameters or invented entities are named.

axioms (1)
  • Domain assumption: Human resolution traces contain extractable problems, actions, and causal relationships that form accurate workflow graphs.
    Invoked in the first phase description of the offline pipeline.


