GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science

Aleksandr Tsymbalov; Anastasiya Palienko; Artem Epifanov; Danis Zaripov

arxiv: 2606.16000 · v2 · pith:4ZM6J2T7new · submitted 2026-06-14 · 💻 cs.CL · cs.LG

Pith reviewed 2026-06-27 03:39 UTC · model grok-4.3

keywords: GRACE-DS · LLM AutoML agents · evaluation environment · data science workflows · iterative interaction · hidden validators · tabular ML · agent correction

The pith

Flexible iterative interaction yields higher quality AutoML workflows than single-shot or unstructured approaches in a guarded evaluation setup.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

GRACE-DS supplies an isolated environment and metrics to test LLM-powered AutoML agents on tabular tasks specific to an organization. Agents move through planning, data inspection, feature engineering, model building, validation, code repair, and submission while hidden validators track leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment in addition to predictive performance. The paper reports that a flexible iterative interaction regime produces higher end-to-end normalized hidden-test quality and better protocol-valid completion rates than single-shot generation, unstructured interaction, or restart baselines. These outcomes were measured across more than 7,000 episodes. The setup therefore provides a way to check whether agents can follow production-like rules before deployment.

Core claim

GRACE-DS is a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. It exposes agents to realistic workflow stages from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction, achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion.

What carries the argument

GRACE-DS environment with hidden executable validators that score agents on leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment during full data-science workflows.
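To make the idea of a hidden executable validator concrete, here is a minimal sketch. It is not the paper's implementation, which this page does not describe. The `Candidate` record, the `HIDDEN_COLUMNS` names, and the `gate` function are all hypothetical. The sketch covers only two of the five criteria (leakage avoidance and reproducibility) and returns pass/fail flags rather than labels or row-level detail, in the spirit of the abstract feedback mentioned in the Figure 1 caption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence

# Assumed names for evaluator-owned columns the agent must never read.
HIDDEN_COLUMNS: FrozenSet[str] = frozenset({"valid_label", "test_label"})


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one submitted pipeline."""
    columns_read: FrozenSet[str]                 # columns the pipeline touched
    fit_predict: Callable[[int], Sequence[float]]  # seed -> predictions


def check_leakage(candidate: Candidate) -> bool:
    """Pass only if no evaluator-owned column was read."""
    return not (candidate.columns_read & HIDDEN_COLUMNS)


def check_reproducible(candidate: Candidate, seed: int = 0) -> bool:
    """Pass only if two runs with the same seed give identical predictions."""
    return list(candidate.fit_predict(seed)) == list(candidate.fit_predict(seed))


def gate(candidate: Candidate) -> Dict[str, bool]:
    """Return abstract feedback: one boolean per check, nothing else."""
    return {
        "leakage_free": check_leakage(candidate),
        "reproducible": check_reproducible(candidate),
    }


def seeded_fit(seed: int) -> Sequence[float]:
    """Stand-in for a real training run; deterministic given the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


clean = Candidate(frozenset({"age", "income"}), seeded_fit)
leaky = Candidate(frozenset({"age", "test_label"}), seeded_fit)

print(gate(clean))
print(gate(leaky))
```

A real validator would have to observe column access and execution inside the sandbox rather than trust a self-reported record, which is exactly where the gaming concern raised under "Load-bearing premise" applies.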

If this is right

LLM AutoML agents achieve higher end-to-end normalized hidden-test quality under flexible iterative interaction than under single-shot generation or restart baselines.
Protocol-valid completion rates rise when agents use the structured iterative regime.
The environment can assess whether agents meet organization-specific requirements across planning, feature engineering, validation, and code repair stages.
Performance differences remain consistent when measured over thousands of episodes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Organizations could adopt similar guarded environments to screen agents before live use and reduce risks such as data leakage.
Extending the validator set to non-tabular domains could test whether the advantage of iterative correction generalizes.
The focus on reward alignment opens a path for incorporating organization-specific human oversight signals into agent evaluation.

Load-bearing premise

The hidden executable validators accurately and comprehensively measure leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment in a manner that corresponds to real organizational production requirements and cannot be gamed by the agents.

What would settle it

An agent that achieves top scores on all GRACE-DS metrics yet produces code that leaks data or fails reproducibility when run outside the guarded environment on the organization's actual data.

Figures

Figures reproduced from arXiv: 2606.16000 by Aleksandr Tsymbalov, Anastasiya Palienko, Artem Epifanov, Danis Zaripov.

**Figure 1.** GRACE-DS evaluates LLM AutoML agents in a staged tabular environment. The evaluator owns the split and the private valid/hidden-test labels (top row). The agent (bottom row) sees only training labels and validation features and iterates LLM agent → restricted execution → reproducible candidate. Each candidate is gated by validation, and the agent receives only abstract stage and metric feedback. Scoring is…

**Figure 2.** Model × regime structure of end-to-end quality. Each cell is colored by the mean end-to-end normalized hidden-test quality (E2E Q) of the corresponding model and regime (the test score multiplied by the coverage); the cell annotation reports the mean observed quality on scored episodes together with the scored-episode coverage out of the 60 episodes per cell (10 tasks × 6 repeats). Reading across a row isola…

**Figure 3.** Model × regime rate of any risky behavior. A run is counted as risky if it exhibits at least one execution error, critical methodological error, protocol violation, forbidden-action attempt, or payload error; cells are colored by the per-cell rate and annotated with the number of risky episodes out of 60. The single-shot and restart columns concentrate the leakage-class critical errors, whereas the structu…

**Figure 4.** End-to-end quality across the autonomy spectrum for three representative models. Mean E2E Q for…

**Figure 5.** Quality-cost frontier by regime. Mean output-token consumption per episode (horizontal axis) against…

**Figure 6.** Regime-level relationship between the mean final process reward and the mean observed normalized…
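The per-cell quantities named in the Figure 2 and Figure 3 captions can be sketched as follows. This is an illustrative reading of the captions, not the authors' code. The `Episode` record, the flag strings, and `summarize_cell` are hypothetical, and the 60 episodes are synthetic. The sketch assumes an unscored episode has no quality value and that E2E Q is the mean observed quality on scored episodes multiplied by the scored-episode coverage.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence

# Risk categories listed in the Figure 3 caption; the string keys are illustrative.
RISK_FLAGS: FrozenSet[str] = frozenset({
    "execution_error",
    "critical_methodological_error",
    "protocol_violation",
    "forbidden_action_attempt",
    "payload_error",
})


@dataclass(frozen=True)
class Episode:
    """Hypothetical per-episode record for one (model, regime) cell."""
    quality: Optional[float]  # normalized hidden-test quality, None if unscored
    flags: FrozenSet[str] = field(default_factory=frozenset)


def summarize_cell(episodes: Sequence[Episode]) -> Dict[str, float]:
    """Compute observed quality, coverage, E2E Q, and risky-episode rate."""
    n = len(episodes)
    if n == 0:
        raise ValueError("a cell needs at least one episode")
    scored = [e.quality for e in episodes if e.quality is not None]
    coverage = len(scored) / n
    observed_q = sum(scored) / len(scored) if scored else 0.0
    risky = sum(1 for e in episodes if e.flags & RISK_FLAGS)
    return {
        "observed_q": observed_q,         # mean quality on scored episodes only
        "coverage": coverage,             # fraction of episodes that were scored
        "e2e_q": observed_q * coverage,   # end-to-end quality for the cell
        "risky_episodes": risky,
        "risky_rate": risky / n,
    }


# Synthetic cell of 60 episodes (10 tasks x 6 repeats in the paper's layout).
cell = (
    [Episode(0.8)] * 45
    + [Episode(None, frozenset({"execution_error"}))] * 12
    + [Episode(None)] * 3
)
summary = summarize_cell(cell)
print(summary)
```

Under these assumptions E2E Q equals the mean quality over all episodes with unscored runs counted as zero, so a regime cannot look strong by completing only the easy episodes. In the synthetic cell, an observed quality of about 0.8 at 75% coverage gives an E2E Q of about 0.6.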

Original abstract

We introduce GRACE-DS, a Guarded Reward-guided Agent Correction Environment in Data Science for pre-deployment evaluation of LLM-powered AutoML agents. GRACE-DS is a set of evaluation metrics in an isolated environment that can be applied to tabular ML tasks specific to a particular organization. It exposes agents to realistic workflow stages, from planning and data inspection through feature engineering, model development, validation, and code repair to final submission, while hidden executable validators measure not only final predictive performance but also leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The strongest structured regime, flexible iterative interaction (our approach), achieves higher end-to-end normalized hidden-test quality than single-shot generation, unstructured interaction, and restart-based baselines, while also improving protocol-valid completion. Validated across more than 7,000 episodes, these results establish GRACE-DS as a robust platform for assessing the capacity of LLM-based AutoML agents to execute machine learning workflows under production-like conditions and in accordance with organization-specific requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GRACE-DS names a new evaluation environment for LLM AutoML agents on tabular tasks with hidden validators, but the abstract leaves the core claims on performance and validator robustness uncheckable.

GRACE-DS is a new named environment for testing LLM-powered AutoML agents on tabular tasks under production-like constraints. It defines workflow stages from planning through feature engineering, modeling, validation, and repair, plus hidden executable validators that score leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The abstract reports that the flexible iterative interaction regime beats single-shot generation, unstructured interaction, and restart baselines on normalized hidden-test quality and protocol-valid completion, across more than 7,000 episodes.

The concrete addition is the packaged combination of those stages with hidden validators tied to organization-specific rules. That setup gives a structured way to run end-to-end agent evaluations beyond simple accuracy metrics, and the episode count is large enough to be worth noticing.

The soft spots are straightforward. The abstract supplies no information on how tasks were selected, how the validators are coded, what statistical tests were used, or any controls for confounding. Without those pieces the reported superiority cannot be assessed. The stress-test point about validator robustness is on target: nothing shows that the hidden checks cannot be gamed or that they map to actual production requirements rather than artifacts of the metric design.

This is for researchers working on LLM agent evaluation in the AutoML subfield, particularly those already focused on tabular data and constrained workflows. A reader in that area could extract useful workflow and metric ideas even if the results stay provisional.

The paper deserves a serious referee once the full text is available, because the evaluation framework itself is a concrete artifact that can be examined. I would send it to peer review rather than desk reject.
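One form the missing statistical check could take is a percentile bootstrap confidence interval for the difference in mean end-to-end quality between two regimes. The sketch below illustrates that general technique only. Nothing on this page says the authors used it. The function name `bootstrap_diff_ci` and the per-episode scores are made up for illustration, with unscored episodes entered as 0.0.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_diff_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(a) - mean(b), resampling episodes."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each regime's episodes independently, with replacement.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic per-episode end-to-end scores for two regimes (60 episodes each).
iterative = [0.8] * 45 + [0.0] * 15    # mean about 0.60
single_shot = [0.5] * 30 + [0.0] * 30  # mean 0.25

lower, upper = bootstrap_diff_ci(iterative, single_shot)
print(f"95% CI for the difference in mean E2E quality: [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero would support a real difference between the regimes in this toy setting. With repeated runs on the same tasks, resampling whole tasks instead of individual episodes would be the more cautious choice, because episodes that share a task are not independent.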

Referee Report

2 major / 0 minor

Summary. The paper introduces GRACE-DS, a guarded reward-guided agent correction environment for pre-deployment evaluation of LLM-powered AutoML agents on tabular ML tasks. It exposes agents to workflow stages including planning, data inspection, feature engineering, model development, validation, code repair, and submission, with hidden executable validators assessing leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. The central empirical claim is that the flexible iterative interaction regime outperforms single-shot generation, unstructured interaction, and restart-based baselines on end-to-end normalized hidden-test quality and protocol-valid completion, with results validated across more than 7,000 episodes.

Significance. If the hidden validators are shown to be robust, ungameable, and aligned with real production constraints, GRACE-DS could provide a valuable standardized platform for assessing LLM-based AutoML agents under organization-specific conditions. The scale of evaluation (over 7,000 episodes) is a positive aspect. However, the absence of implementation details on validators, task selection, and statistical controls substantially limits the ability to determine whether the reported superiority reflects genuine capability gains or metric artifacts.

major comments (2)

[Abstract] The abstract states comparative results and episode count but supplies no information on task selection, validator implementation, statistical testing, or controls for confounding factors; the central claim cannot be assessed from the provided text.

[Abstract, and by extension the evaluation framework] The claim that flexible iterative interaction outperforms baselines rests on hidden executable validators correctly scoring leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. No implementation details, adversarial robustness checks, or mapping to real-world organizational constraints are supplied, so reported superiority could be an artifact of metric design.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the abstract and evaluation framework. We address each major comment below and note planned revisions to strengthen the manuscript.

Referee: [Abstract] The abstract states comparative results and episode count but supplies no information on task selection, validator implementation, statistical testing, or controls for confounding factors; the central claim cannot be assessed from the provided text.

Authors: We agree the abstract is highly condensed to meet length limits and omits these specifics. The full manuscript details task selection from organization-specific tabular datasets (Section 3.1), validator implementation via executable sandbox checks (Section 3.2), and statistical testing with controls for episode-level variance and baseline comparisons (Section 4.3). We will revise the abstract to briefly reference organization-specific tasks, hidden validators for leakage/reproducibility/protocol criteria, and statistical validation across episodes.

Revision: yes

Referee: [Abstract, and by extension the evaluation framework] The claim that flexible iterative interaction outperforms baselines rests on hidden executable validators correctly scoring leakage avoidance, reproducibility, protocol validity, correction behavior, and reward alignment. No implementation details, adversarial robustness checks, or mapping to real-world organizational constraints are supplied, so reported superiority could be an artifact of metric design.

Authors: Implementation details for the validators (executable checks for each criterion in the guarded environment) appear in Section 3.2 of the manuscript. We did not conduct explicit adversarial robustness tests or provide fine-grained mappings to individual organizational constraints beyond the described production-like workflow. This is a genuine limitation that could affect interpretation of the results. We will add a limitations subsection discussing metric design risks and outline future robustness evaluations while retaining the 7,000+ episode empirical comparison.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical comparison with no derivations or self-referential reductions

The paper describes an empirical evaluation framework (GRACE-DS) that compares interaction regimes on hidden-test metrics across >7000 episodes. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text or abstract. The central claim rests on experimental outcomes rather than any reduction to inputs by construction. The noted weakness in validator robustness is an assumption-validity issue, not a circularity issue per the analysis rules.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The environment itself functions as the central contribution but its internal validators and task definitions are not detailed.

