Pomona: Continuous Code Quality Improvement via Small, Automated Changes at Bloomberg

Angelos Evripiotis; David Williams; Federica Sarro; Harry Morgan; Peter Wainwright; Sergey Magidovich; Serkan Kirbas

arxiv: 2606.06752 · v1 · pith:6ABDH3S5new · submitted 2026-06-04 · 💻 cs.SE

Pomona: Continuous Code Quality Improvement via Small, Automated Changes at Bloomberg

David Williams , Angelos Evripiotis , Serkan Kirbas , Harry Morgan , Sergey Magidovich , Peter Wainwright , Federica Sarro This is my paper

Pith reviewed 2026-06-27 23:53 UTC · model grok-4.3

classification 💻 cs.SE

keywords code quality improvementautomated pull requestsagentic toolscontinuous improvementtechnical debtsmall changeshuman-in-the-loopsoftware maintenance

0 comments

The pith

Pomona uses scanning and repair skills to generate tiny pull requests that fix code issues and merge quickly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Pomona as a lightweight tool that runs a cycle of finding code quality tasks and fixing them in small automated changes. A scanning component spots issues such as lint violations or test gaps and builds a backlog, while a repair component creates pull requests limited to roughly ten lines of diff. In a one-month test inside one team, 15 of 17 such requests were merged with a median close time below two hours. A survey of ten engineers found eight wanted to keep using the system because the changes stayed small and targeted quality improvements directly.

Core claim

Pomona automates a Kaizen-style loop in which a Scanning skill identifies and prioritizes improvement tasks and a Repair skill produces tiny pull requests; the one-month deployment showed 15 of 17 generated PRs merged at a median time-to-close under two hours, and eight of ten surveyed engineers expressed a desire to adopt the tool for its small diff sizes and code-quality focus.

What carries the argument

The human-in-the-loop cycle of a Scanning skill that builds a structured backlog of tasks and a Repair skill that emits ~10-line diffs for incremental fixes.

If this is right

Small, low-risk changes allow continuous technical-debt reduction without large disruptions.
Quick merge times and focused diffs preserve engineer trust and productivity.
A structured backlog helps teams address quality issues in priority order.
The design keeps humans in control while automating routine maintenance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scanning-plus-repair pattern could be tested in teams with different languages or codebases.
Longer deployments might reveal whether the rate of new issues stays ahead of the fixes produced.
Adding more skill types could expand the range of improvements handled automatically.

Load-bearing premise

Results from a single team over one month and surveys of ten engineers will generalize to other teams and longer periods.

What would settle it

A second deployment in a different team in which fewer than half the generated PRs merge or the median close time exceeds one day.

Figures

Figures reproduced from arXiv: 2606.06752 by Angelos Evripiotis, David Williams, Federica Sarro, Harry Morgan, Peter Wainwright, Sergey Magidovich, Serkan Kirbas.

**Figure 1.** Figure 1: Overview of Pomona. Each shaded region represents the responsibilities of an agent skill (left: Scanning; right: Repair). This short paper provides an in-depth description of Pomona’s implementation and offers insights into practical strategies for deploying agentic AI4SE in a large-scale industrial environment. 2 Pomona Overview [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Excerpt of Pomona’s backlog for the adopting Bloomberg engineering team featured in §3.1. configuration files) and looks for long functions (i.e., functions longer than 50 lines) that can be decomposed into smaller parts. Once all agents complete their searches, the Scanning skill concludes by aggregating their findings. This aggregation is done in two steps: First, the findings are listed and concatenate… view at source ↗

read the original abstract

In this short experience paper, we present Pomona, a lightweight agentic tool that utilises agent skills for continuous automated code quality improvement. Inspired by the philosophy of Kaizen(TM), Pomona automates a cycle of discovery and incremental repair: a Scanning skill identifies improvement tasks (e.g., linting violations, technical debt markers, and test gaps) and prioritises them in a structured backlog, while a Repair skill generates tiny pull requests (PRs) targeting ~10 lines of diff. This human-in-the-loop design enables frequent, low-risk improvements while maintaining engineer trust and productivity and reducing technical debt. We evaluated Pomona through a one-month deployment in a team and a questionnaire distributed to 10 senior engineers. Our preliminary results are promising: 15 of 17 generated PRs were successfully merged with a median time-to-close of under 2 hours. Furthermore, 8/10 of surveyed engineers expressed a desire to adopt Pomona, praising small diff sizes and Pomona's focus on improving code quality. We conclude by discussing actionable insights for researchers and practitioners on strategies for effective agentic deployment in industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Pomona reports a one-month single-team trial of tiny agent-generated PRs with 15/17 merges and 8/10 positive survey responses, but the numbers are too small and uncontrolled to support broader claims.

read the letter

Pomona is a lightweight agent that scans for lint violations and similar issues then produces very small PRs of about 10 lines. In a one-month run inside one Bloomberg team it produced 17 PRs, 15 of which merged with a median close time under two hours, and 8 out of 10 surveyed engineers said they would like to keep using it.

The concrete deployment and the explicit design choice to keep diffs tiny are the useful parts. The Kaizen-style human-in-the-loop backlog plus separate scan and repair skills gives a practical pattern for keeping agent changes low-risk. Reporting actual merge counts and engineer reactions from an industrial setting is better than another lab prototype.

The evaluation is the clear weak point. One team, 17 PRs, and 10 questionnaire answers over one month is a very narrow base. There are no baseline merge rates for normal PRs, no count of how many issues were missed or wrongly flagged, no breakdown of what the PRs actually changed, and no longer-term measures of code quality. The paper itself calls the results preliminary, which matches the data.

This is an experience report aimed at practitioners who build or evaluate automated maintenance tools and at researchers who want real deployment stories. It is worth sending to peer review so the authors can add more context on the PR contents and any false-positive handling, but it does not yet give strong evidence that the approach will transfer or scale.

Referee Report

2 major / 2 minor

Summary. The paper presents Pomona, a lightweight agentic tool inspired by Kaizen that uses a Scanning skill to identify code quality tasks (linting, technical debt, test gaps) and a Repair skill to generate small (~10-line) pull requests. It reports results from a one-month deployment in a single team (15 of 17 PRs merged, median time-to-close under 2 hours) and a questionnaire to 10 senior engineers (8/10 expressed desire to adopt), framing the outcomes as preliminary evidence that the human-in-the-loop design supports frequent, low-risk improvements.

Significance. If the small-diff, human-in-the-loop approach can be shown to scale, it would provide a practical template for deploying agentic tools in industry to incrementally reduce technical debt without disrupting engineer productivity. The emphasis on tiny, reviewable changes and explicit trust-preserving design is a concrete contribution to the literature on automated code improvement.

major comments (2)

[Evaluation / Deployment section] Evaluation / Deployment section: The central effectiveness claim (15/17 merged PRs, 8/10 adoption desire) rests on a single-team, one-month deployment with N=17 PRs and N=10 survey responses. No baseline merge rates for the team, no denominator for total PR volume during the period, no false-positive rate for the Scanning skill, and no longitudinal code-quality metrics are supplied, so the reported percentages cannot be shown to be stable or transferable beyond this setting.
[Abstract and questionnaire description] Abstract and § on questionnaire: The survey reports only positive desire-to-adopt figures without describing question wording, sampling method, response rate, or any negative or neutral feedback; this limits the ability to interpret the 80 % figure as representative engineer acceptance.

minor comments (2)

[Deployment description] The manuscript should specify the exact size and composition of the deployment team and the total number of PRs opened by the team during the one-month window to contextualize the 17 generated PRs.
[Results] Provide a brief breakdown (table or list) of the categories of improvements attempted by the Repair skill and any cases where generated PRs were rejected or required substantial revision.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments on our experience paper. We agree the evaluation is preliminary and will revise to better contextualize limitations while providing additional details on the questionnaire where possible.

read point-by-point responses

Referee: [Evaluation / Deployment section] Evaluation / Deployment section: The central effectiveness claim (15/17 merged PRs, 8/10 adoption desire) rests on a single-team, one-month deployment with N=17 PRs and N=10 survey responses. No baseline merge rates for the team, no denominator for total PR volume during the period, no false-positive rate for the Scanning skill, and no longitudinal code-quality metrics are supplied, so the reported percentages cannot be shown to be stable or transferable beyond this setting.

Authors: We agree the deployment was limited to one team and one month, yielding a small sample. As an experience report, the aim was to document initial industrial observations rather than a controlled study with baselines. We will revise the Evaluation section to explicitly note the lack of baselines, total PR volume, false-positive rates, and longitudinal metrics, and to stress that results are preliminary and not claimed to be generalizable. We cannot supply the missing quantitative baselines because they were not collected during the focused deployment of Pomona-generated PRs. revision: partial
Referee: [Abstract and questionnaire description] Abstract and § on questionnaire: The survey reports only positive desire-to-adopt figures without describing question wording, sampling method, response rate, or any negative or neutral feedback; this limits the ability to interpret the 80 % figure as representative engineer acceptance.

Authors: We will revise the abstract and questionnaire section to describe the sampling method (distributed to the 10 senior engineers who had reviewed Pomona PRs), the response rate (100%), and the question wording. The questionnaire focused on desire to adopt and perceived benefits of small diffs; we will note that all responses on adoption desire were positive with no negative or neutral feedback recorded. This will improve interpretability of the 8/10 result. revision: yes

standing simulated objections not resolved

We cannot provide baseline merge rates, total PR volume during the period, false-positive rates for the Scanning skill, or longitudinal code-quality metrics, as these were not measured in the deployment.

Circularity Check

0 steps flagged

No circularity: paper reports direct empirical observations from deployment with no derivations or fitted models.

full rationale

The paper is an experience report describing a one-month deployment and survey results. It contains no equations, no parameter fitting, no predictions derived from models, and no self-citation chains supporting uniqueness theorems or ansatzes. All claims (e.g., 15/17 PRs merged) are presented as raw observed counts from the deployment, not as outputs of any derivation that reduces to inputs by construction. This matches the default case of a self-contained empirical description with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an experience report on a software tool deployment. It introduces no free parameters, mathematical axioms, or new invented entities. All claims rest on the described deployment observations and questionnaire responses.

pith-pipeline@v0.9.1-grok · 5749 in / 1139 out tokens · 29787 ms · 2026-06-27T23:53:46.191222+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 11 canonical work pages · 6 internal anchors

[1]

Salaheddin Alzubi, Noah Provenzano, Jaydon Bingham, Weiyuan Chen, and Tu Vu. 2026. EvoSkill: Automated Skill Discovery for Multi-Agent Systems. arXiv:2603.02766 [cs.AI] https://arxiv.org/abs/2603.02766

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. 2025. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. doi:10.48550/arXiv.2507.09089 arXiv:2507.09089 [cs]

work page doi:10.48550/arxiv.2507.09089 2025
[3]

Inc. Github. 2026. About agent skills. https://docs.github.com/en/copilot/ concepts/agents/about-agent-skills. Accessed: April 30, 2026. arXiv Preprint, 2026, Online Williams et al

2026
[4]

Jingzhi Gong, Ruizhen Gu, Zhiwei Fei, Yazhuo Cao, Lukas Twist, Alina Geiger, Shuo Han, Dominik Sobania, Federica Sarro, and Jie M. Zhang. 2026. Skill- MOO: Multi-Objective Optimization of Agent Skills for Software Engineering. arXiv:2604.09297 [cs.SE] https://arxiv.org/abs/2604.09297

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

Tingxu Han, Yi Zhang, Wei Song, Chunrong Fang, Zhenyu Chen, Youcheng Sun, and Lijie Hu. 2026. SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering? arXiv:2603.15401 [cs.SE] https://arxiv.org/ abs/2603.15401

work page arXiv 2026
[6]

Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. 2026. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. arXiv:2511.04427 [cs.SE] doi:10.1145/3793302.3793349

work page doi:10.1145/3793302.3793349 2026
[7]

Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Binxu Li, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, X...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[8]

Pan, Guilin Qi, Haofen Wang, and Huajun Chen

Yuan Liang, Ruobin Zhong, Haoming Xu, Chen Jiang, Yi Zhong, Runnan Fang, Jia-Chen Gu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Xin Xu, Tongtong Wu, Kun Wang, Yang Liu, Zhen Bi, Jungang Lou, Yuchen Eleanor Jiang, Hangcheng Zhu, Gang Yu, Haiwen Hong, Longtao Huang, Hui Xue, Chenxi Wang, Yijun Wang, Zifei Shan, Xi Chen, Zhaopeng Tu, Feiyu Xiong, X...
[9]

arXiv:2603.04448 [cs.AI] https://arxiv.org/abs/2603.04448

SkillNet: Create, Evaluate, and Connect AI Skills. arXiv:2603.04448 [cs.AI] https://arxiv.org/abs/2603.04448

work page arXiv
[10]

Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Ying Zhang, and Leo Yu Zhang. 2026. Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study. arXiv:2602.06547 [cs.CR] https://arxiv.org/abs/2602. 06547

work page internal anchor Pith review Pith/arXiv arXiv 2026
[11]

Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, and Leo Zhang. 2026. Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale. arXiv:2601.10338 [cs.CR] https://arxiv.org/ abs/2601.10338

work page internal anchor Pith review Pith/arXiv arXiv 2026
[12]

Anthropic PBC. 2026. Skills | Agent Skills. https://platform.claude.com/docs/en/ agents-and-tools/agent-skills/overview. Accessed: April 30, 2026

2026
[13]

Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. https: //arxiv.org/abs/2302.06590v1

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

Haifeng Ruan, View Profile, Yuntong Zhang, View Profile, Abhik Roychoudhury, and View Profile. 2025. SpecRover: Code Intent Extraction via LLMs. InPro- ceedings of the IEEE/ACM 47th International Conference on Software Engineering. 963–974. doi:10.1109/ICSE55347.2025.00080

work page doi:10.1109/icse55347.2025.00080 2025
[15]

Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H

Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. 2025. OpenHands: An Open Platform for A...

2025
[16]

David Williams et al. 2026. Pomona Supplementary Materials Repository. https: //figshare.com/s/09a23c22550f3e3967b2. Accessed: April 30, 2026

2026
[17]

Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: agent-computer interfaces enable automated software engineering. InProceedings of the 38th International Conference on Neural Information Processing Systems (NIPS ’24, Vol. 37). Curran Associates Inc., Red Hook, NY, USA, 50528–50652

2024

[1] [1]

Salaheddin Alzubi, Noah Provenzano, Jaydon Bingham, Weiyuan Chen, and Tu Vu. 2026. EvoSkill: Automated Skill Discovery for Multi-Agent Systems. arXiv:2603.02766 [cs.AI] https://arxiv.org/abs/2603.02766

work page internal anchor Pith review Pith/arXiv arXiv 2026

[2] [2]

Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. 2025. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. doi:10.48550/arXiv.2507.09089 arXiv:2507.09089 [cs]

work page doi:10.48550/arxiv.2507.09089 2025

[3] [3]

Inc. Github. 2026. About agent skills. https://docs.github.com/en/copilot/ concepts/agents/about-agent-skills. Accessed: April 30, 2026. arXiv Preprint, 2026, Online Williams et al

2026

[4] [4]

Jingzhi Gong, Ruizhen Gu, Zhiwei Fei, Yazhuo Cao, Lukas Twist, Alina Geiger, Shuo Han, Dominik Sobania, Federica Sarro, and Jie M. Zhang. 2026. Skill- MOO: Multi-Objective Optimization of Agent Skills for Software Engineering. arXiv:2604.09297 [cs.SE] https://arxiv.org/abs/2604.09297

work page internal anchor Pith review Pith/arXiv arXiv 2026

[5] [5]

Tingxu Han, Yi Zhang, Wei Song, Chunrong Fang, Zhenyu Chen, Youcheng Sun, and Lijie Hu. 2026. SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering? arXiv:2603.15401 [cs.SE] https://arxiv.org/ abs/2603.15401

work page arXiv 2026

[6] [6]

Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. 2026. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. arXiv:2511.04427 [cs.SE] doi:10.1145/3793302.3793349

work page doi:10.1145/3793302.3793349 2026

[7] [7]

Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Binxu Li, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, X...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[8] [8]

Pan, Guilin Qi, Haofen Wang, and Huajun Chen

Yuan Liang, Ruobin Zhong, Haoming Xu, Chen Jiang, Yi Zhong, Runnan Fang, Jia-Chen Gu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Xin Xu, Tongtong Wu, Kun Wang, Yang Liu, Zhen Bi, Jungang Lou, Yuchen Eleanor Jiang, Hangcheng Zhu, Gang Yu, Haiwen Hong, Longtao Huang, Hui Xue, Chenxi Wang, Yijun Wang, Zifei Shan, Xi Chen, Zhaopeng Tu, Feiyu Xiong, X...

[9] [9]

arXiv:2603.04448 [cs.AI] https://arxiv.org/abs/2603.04448

SkillNet: Create, Evaluate, and Connect AI Skills. arXiv:2603.04448 [cs.AI] https://arxiv.org/abs/2603.04448

work page arXiv

[10] [10]

Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Ying Zhang, and Leo Yu Zhang. 2026. Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study. arXiv:2602.06547 [cs.CR] https://arxiv.org/abs/2602. 06547

work page internal anchor Pith review Pith/arXiv arXiv 2026

[11] [11]

Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, and Leo Zhang. 2026. Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale. arXiv:2601.10338 [cs.CR] https://arxiv.org/ abs/2601.10338

work page internal anchor Pith review Pith/arXiv arXiv 2026

[12] [12]

Anthropic PBC. 2026. Skills | Agent Skills. https://platform.claude.com/docs/en/ agents-and-tools/agent-skills/overview. Accessed: April 30, 2026

2026

[13] [13]

Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. https: //arxiv.org/abs/2302.06590v1

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

Haifeng Ruan, View Profile, Yuntong Zhang, View Profile, Abhik Roychoudhury, and View Profile. 2025. SpecRover: Code Intent Extraction via LLMs. InPro- ceedings of the IEEE/ACM 47th International Conference on Software Engineering. 963–974. doi:10.1109/ICSE55347.2025.00080

work page doi:10.1109/icse55347.2025.00080 2025

[15] [15]

Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H

Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. 2025. OpenHands: An Open Platform for A...

2025

[16] [16]

David Williams et al. 2026. Pomona Supplementary Materials Repository. https: //figshare.com/s/09a23c22550f3e3967b2. Accessed: April 30, 2026

2026

[17] [17]

Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: agent-computer interfaces enable automated software engineering. InProceedings of the 38th International Conference on Neural Information Processing Systems (NIPS ’24, Vol. 37). Curran Associates Inc., Red Hook, NY, USA, 50528–50652

2024