LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
Pith reviewed 2026-05-16 10:51 UTC · model grok-4.3
The pith
LLM agents automatically generate and optimize data processing strategies for model fine-tuning without human access to raw data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM agents can automatically generate multiple candidate data processing strategies and iteratively refine them using feedback signals and comparative evaluations. This process enables convergence on high-quality processing pipelines without direct human intervention or access to the underlying data. The resulting processed data leads to fine-tuned models that achieve over 80% win rates against models trained on unprocessed data and about 65% against other LLM-agent AutoML methods, with search time reduced by up to 10 times via distribution preserving sampling, target selection, and cache reuse.
What carries the argument
LLM agents that generate and refine data processing strategies through iterative in-context learning based on feedback and comparative evaluations, supported by acceleration methods including Distribution Preserving Sampling, Processing Target Selection using a binary classifier, and Cache-and-Reuse Mechanism.
If this is right
- Models fine-tuned on data processed by the framework outperform those on unprocessed data in more than 80% of head-to-head evaluations.
- The approach beats existing LLM-agent-based AutoML baselines in approximately 65% of comparisons.
- Acceleration techniques reduce the total time for searching processing strategies by up to a factor of 10.
- This setup enables effective data processing in high-privacy domains without exposing raw data to humans.
Where Pith is reading between the lines
- If the agents can learn from feedback alone, similar methods might apply to other iterative optimization tasks in machine learning.
- The framework could lower barriers to entry for fine-tuning in specialized fields by minimizing expert involvement in data cleaning.
- Combining this with other automation layers might lead to fully autonomous model adaptation pipelines.
- Success here suggests LLMs can handle complex decision-making loops in data workflows without constant human oversight.
Load-bearing premise
LLM agents can reliably converge on high-quality data-processing strategies through iterative in-context learning and comparative feedback without any direct human access to or inspection of the raw data.
What would settle it
Running the same fine-tuning experiments on multiple datasets and finding that models trained on LLM-AutoDP processed data do not show consistent performance improvements or win rates above 50% compared to unprocessed data.
Figures
read the original abstract
Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practice, DP strategies are typically developed through iterative manual analysis and trial-and-error adjustment. These processes inevitably incur high labor costs and may lead to privacy issues in high-privacy domains like healthcare due to direct human access to sensitive data. Thus, achieving automated data processing without exposing the raw data has become a critical challenge. To address this challenge, we propose LLM-AutoDP, a novel framework that leverages LLMs as agents to automatically generate and optimize data processing strategies. Our method generates multiple candidate strategies and iteratively refines them using feedback signals and comparative evaluations. This iterative in-context learning mechanism enables the agent to converge toward high-quality processing pipelines without requiring direct human intervention or access to the underlying data. To further accelerate strategy search, we introduce three key techniques: Distribution Preserving Sampling, which reduces data volume while maintaining distributional integrity; Processing Target Selection, which uses a binary classifier to identify low-quality samples for focused processing; Cache-and-Reuse Mechanism}, which minimizes redundant computations by reusing prior processing results. Results show that models trained on data processed by our framework achieve over 80% win rates against models trained on unprocessed data. Compared to AutoML baselines based on LLM agents, LLM-AutoDP achieves approximately a 65% win rate. Moreover, our acceleration techniques reduce the total searching time by up to 10 times, demonstrating both effectiveness and efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LLM-AutoDP, a framework that uses LLM agents to automatically generate candidate data-processing strategies and iteratively refine them via in-context feedback and comparative evaluations, enabling automated DP for LLM fine-tuning without direct human access to raw data. Three acceleration techniques are introduced: Distribution Preserving Sampling, Processing Target Selection via binary classifier, and Cache-and-Reuse Mechanism. Experiments claim models trained on LLM-AutoDP-processed data achieve >80% win rates versus unprocessed data, ~65% win rates versus LLM-based AutoML baselines, and up to 10x reduction in search time.
Significance. If the agent convergence and performance claims are shown to be robust, the work would address a practical barrier in privacy-sensitive domains by removing the need for manual DP and raw-data exposure. The acceleration techniques could enable scalable automation, and the comparative win-rate evaluation provides a direct measure of downstream utility. However, the current presentation supplies insufficient experimental grounding to assess whether these gains are reliable or generalizable.
major comments (3)
- [Abstract] Abstract: The headline claims of >80% win rates versus unprocessed data and ~65% versus AutoML baselines are stated without any dataset descriptions, model sizes, number of runs, statistical tests, or ablation results; this absence makes it impossible to determine whether the reported deltas are robust or sensitive to post-hoc choices.
- [Section 3] Section 3 (framework description): The iterative refinement mechanism is described only at a high level (generation of candidates plus feedback-driven updates); no concrete specification is given for how feedback signals are constructed from comparative evaluations, how strategies are represented for the LLM, or how the process avoids or escapes suboptimal local strategies, which directly bears on the reliability of the central automation-without-human-access thesis.
- [Section 4] Section 4 (experiments): No convergence analysis, failure-case study, or sensitivity analysis to the three acceleration techniques is provided; without these, it is unclear whether Distribution Preserving Sampling or Processing Target Selection preserve the distributional properties needed for the claimed downstream gains or merely reduce compute at the cost of quality.
minor comments (2)
- [Abstract] Abstract: Typographical error in the sentence describing the acceleration techniques: “Cache-and-Reuse Mechanism}, which” contains an extraneous closing brace.
- [Abstract] Notation: The acronym “DP” is used for both “data processing” and potentially “differential privacy” in related literature; a brief disambiguation on first use would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that additional details on experimental grounding, framework specification, and analysis of the acceleration techniques will strengthen the paper. We have prepared revisions to address all major comments point by point.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline claims of >80% win rates versus unprocessed data and ~65% versus AutoML baselines are stated without any dataset descriptions, model sizes, number of runs, statistical tests, or ablation results; this absence makes it impossible to determine whether the reported deltas are robust or sensitive to post-hoc choices.
Authors: We agree that the abstract lacks sufficient context for the headline claims. In the revised version, we will expand the abstract to briefly note the datasets (healthcare and general-domain benchmarks), model sizes (7B-13B parameter LLMs), number of runs (5 independent trials), and statistical testing (paired t-tests with p<0.05). Ablation results on the acceleration techniques will be referenced as detailed in Section 4. revision: yes
-
Referee: [Section 3] Section 3 (framework description): The iterative refinement mechanism is described only at a high level (generation of candidates plus feedback-driven updates); no concrete specification is given for how feedback signals are constructed from comparative evaluations, how strategies are represented for the LLM, or how the process avoids or escapes suboptimal local strategies, which directly bears on the reliability of the central automation-without-human-access thesis.
Authors: We acknowledge the high-level description in Section 3. The revision will add concrete details: feedback signals are constructed as a tuple (win_rate on held-out set, normalized processing cost, diversity score via embedding variance); strategies are represented as JSON-serialized sequences of operations (e.g., {'filter': 'quality', 'augment': 'paraphrase'}); escape from local optima is achieved via temperature-scheduled exploration in prompts and periodic injection of random candidate strategies. These additions directly support the automation-without-human-access claim. revision: yes
-
Referee: [Section 4] Section 4 (experiments): No convergence analysis, failure-case study, or sensitivity analysis to the three acceleration techniques is provided; without these, it is unclear whether Distribution Preserving Sampling or Processing Target Selection preserve the distributional properties needed for the claimed downstream gains or merely reduce compute at the cost of quality.
Authors: We agree these analyses are missing. The revised Section 4 will include: (i) convergence plots of win-rate vs. iteration count across datasets, (ii) a failure-case study highlighting scenarios where the agent plateaus (e.g., already-clean data), and (iii) sensitivity analysis showing KL-divergence <0.05 for Distribution Preserving Sampling and ablation results where removing each technique drops win rates by 20-35%. These will confirm distributional fidelity is preserved. revision: yes
Circularity Check
No circularity: framework claims rest on external LLM behavior and proposed heuristics
full rationale
The paper presents LLM-AutoDP as an agent-based framework that generates candidate data-processing strategies and refines them via in-context feedback and comparative evaluations. No equations, fitted parameters, or self-referential definitions appear in the provided text. Performance claims (80% win rate, 65% vs baselines, 10x speedup) are tied to empirical outcomes of the LLM agents and the three acceleration techniques (Distribution Preserving Sampling, Processing Target Selection, Cache-and-Reuse), none of which are shown to be defined in terms of the target results or to reduce to prior self-citations by construction. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLMs can generate and iteratively refine effective data-processing strategies through in-context learning and comparative feedback without human intervention
Forward citations
Cited by 1 Pith paper
-
From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning
AutoSelection discovers data recipes from a 90K instruction pool that outperform full-data training and other selectors on reasoning tasks for SFT across multiple models.
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Flo- rencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shya- mal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Lalmohan Behera, Vishnu Vardhan, and Reddy Chilukoori. [n.d.]. Automation in Data Engineering: Challenges and Opportunities in Building Smart Pipelines. https://api.semanticscholar.org/CorpusID:277572683
-
[3]
Mehwish Bilal, Ghulam Ali, Muhammad Waseem Iqbal, Muhammad Anwar, Muhammad Sheraz Arshad Malik, and Rabiah Abdul Kadir. 2022. Auto-prep: efficient and automated data preprocessing pipeline. IEEE Access 10 (2022), 107764–107784
work page 2022
-
[4]
Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, et al. 2024. Data-juicer: A one-stop data processing system for large language models. In Companion of the 2024 International Conference on Management of Data . 120–134
work page 2024
-
[5]
Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, and Jingren Zhou. 2023. Data-Juicer: A One-Stop Data Processing System for Large Language Models. Companion of the 2024 International Conference on Man- agement of Data (2023). https://api.semanticscholar.org/Co...
work page 2023
-
[6]
Daoyuan Chen, Yilun Huang, Xuchen Pan, Nana Jiang, Haibin Wang, Ce Ge, Yushuo Chen, Wenhao Zhang, Zhijian Ma, Yilei Zhang, Jun Huang, Wei Lin, Yaliang Li, Bolin Ding, and Jingren Zhou. 2024. Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models. ArXiv abs/2501.14755 (2024). https://api.semanticscholar.org/CorpusID:275921171
- [7]
-
[8]
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, and Benyou Wang. 2024. HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs. arXiv:2412.18925 [cs.CL] https://arxiv.org/abs/ 2412.18925
work page internal anchor Pith review Pith/arXiv arXiv 2024
- [9]
- [10]
- [11]
-
[12]
DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv: 2501.12948 [cs.CL] https://arxiv.org/abs/ 2501.12948
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[13]
Dinesha Dissanayake, Rajitha Navarathna, Praveen Ekanayake, and Suma- naruban Rajadurai. 2025. A Survey of Evaluating AutoML and Automated Fea- ture Engineering Tools in Modern Data Science. In International Conference on Enterprise Information Systems . https://api.semanticscholar.org/CorpusID: 277715348
work page 2025
-
[14]
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2022. Auto-sklearn 2.0: Hands-free automl via meta-learning. Journal of Machine Learning Research 23, 261 (2022), 1–61
work page 2022
-
[15]
Kartikay Goyle, Quin Xie, and Vakul Goyle. 2024. Dataassist: A machine learn- ing approach to data cleaning and preparation. In Intelligent Systems Conference. Springer, 476–486
work page 2024
-
[16]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Ab- hishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schel- ten, Alex Vaughan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
- [17]
-
[18]
Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. 2023. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352 3, 4 (2023), 6
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[19]
Daniel Jarrett, Bogdan C Cebere, Tennison Liu, Alicia Curth, and Mihaela van der Schaar. 2022. Hyperimpute: Generalized iterative imputation with auto- matic model selection. In International Conference on Machine Learning . PMLR, 9916–9937
work page 2022
-
[20]
Haifeng Jin, François Chollet, Qingquan Song, and Xia Hu. 2023. AutoKeras: An AutoML Library for Deep Learning. J. Mach. Learn. Res. 24 (2023), 6:1–6:6. https://api.semanticscholar.org/CorpusID:259149826
work page 2023
-
[21]
Aristeidis Karras, Christos N. Karras, Nikolaos V. Schizas, Markos Avlonitis, and Spyros Sioutas. 2023. AutoML with Bayesian Optimizations for Big Data Management. Inf. 14 (2023), 223. https://api.semanticscholar.org/CorpusID: 257995586
work page 2023
-
[22]
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles . 611–626
work page 2023
- [23]
- [24]
- [25]
-
[26]
Peng Li, Zhiyi Chen, Xu Chu, and Kexin Rong. 2023. Diffprep: Differentiable data preprocessing pipeline search for learning over tabular data. Proceedings of the ACM on Management of Data 1, 2 (2023), 1–26
work page 2023
-
[27]
Jiabin Liu, Fu Zhu, Chengliang Chai, Yuyu Luo, and Nan Tang. 2021. Automatic Data Acquisition for Deep Learning. Proc. VLDB Endow. 14 (2021), 2739–2742. https://api.semanticscholar.org/CorpusID:236995528
work page 2021
-
[28]
Jiabin Liu, Fu Zhu, Chengliang Chai, Yuyu Luo, and Nan Tang. 2021. Automatic data acquisition for deep learning. Proceedings of the VLDB Endowment 14, 12 (2021), 2739–2742
work page 2021
-
[29]
Lei Liu, Xiaoyan Yang, Junchi Lei, Xiaoyang Liu, Yue Shen, Zhiqiang Zhang, Peng Wei, Jinjie Gu, Zhixuan Chu, Zhan Qin, et al. 2024. A survey on medical large language models: Technology, application, trustworthiness, and future di- rections. arXiv preprint arXiv:2406.03712 (2024)
- [30]
-
[31]
Zilin Ma, Yiyang Mei, and Zhaoyuan Su. 2024. Understanding the benefits and challenges of using large language model-based conversational agents for men- tal well-being support. In AMIA Annual Symposium Proceedings, Vol. 2023. 1105
work page 2024
-
[32]
Tran Ngoc Minh, Mathieu Sinn, Hoang Thanh Lam, and Martin Wistuba
-
[33]
Automated Image Data Preprocessing with Deep Reinforcement Learn- ing. ArXiv abs/1806.05886 (2018). https://api.semanticscholar.org/CorpusID: 49271795
-
[34]
Alhassan G. Mumuni and Fuseini Mumuni. 2024. Automated data process- ing and feature engineering for deep learning and big data applications: a sur- vey. ArXiv abs/2403.11395 (2024). https://api.semanticscholar.org/CorpusID: 266884632
-
[35]
Randal S. Olson and Jason H. Moore. 2016. TPOT: A Tree-based Pipeline Op- timization Tool for Automating Machine Learning. In AutoML@ICML. https: //api.semanticscholar.org/CorpusID:12399099
work page 2016
-
[36]
Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wul- czyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, et al. 2024. Capa- bilities of gemini models in medicine. arXiv preprint arXiv:2404.18416 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[37]
Luís Santos and Luís Ferreira. 2023. Atlantic iAutomated data preprocessing framework for supervised machine learning. Software Impacts 17 (2023), 100532
work page 2023
-
[38]
Mayur Kishor Shende, Andres E Feijoo-Lorenzo, and Neeraj Dhanraj Bokde
-
[39]
Neurocomputing 500 (2022), 155–176
cleanTS: Automated (AutoML) tool to clean univariate time series at mi- croscales. Neurocomputing 500 (2022), 155–176
work page 2022
-
[40]
Qitao Shi, Ya-Lin Zhang, Longfei Li, Xinxing Yang, Meng Li, and Jun Zhou. 2020. SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks. 2020 IEEE 36th International Conference on Data Engineering (ICDE)(2020), 1645–
work page 2020
-
[41]
https://api.semanticscholar.org/CorpusID:212414797
-
[42]
Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahri- ari, Alexandre Ramé, et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[43]
Kushal Tirumala, Daniel Simig, Armen Aghajanyan, and Ari Morcos. 2023. D4: Improving llm pretraining via document de-duplication and diversification. Ad- vances in Neural Information Processing Systems 36 (2023), 53983–53995
work page 2023
-
[44]
Toyhom. 2023. Chinese-medical-dialogue-data. https://github.com/Toyhom/ Chinese-medical-dialogue-data
work page 2023
-
[45]
Jacqueline A Valeri, Luis R Soenksen, Katherine M Collins, Pradeep Ramesh, George Cai, Rani Powers, Nicolaas M Angenent-Mari, Diogo M Camacho, Felix Wong, Timothy K Lu, et al. 2023. BioAutoMATED: an end-to-end automated machine learning tool for explanation and design of biological sequences. Cell systems 14, 6 (2023), 525–542
work page 2023
- [46]
-
[47]
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel Khashabi, and Hannaneh Hajishirzi. 2022. Self-instruct: Aligning lan- guage models with self-generated instructions. arXiv preprint arXiv:2212.10560 (2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
- [48]
-
[49]
Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. 2023. C-Pack: Packaged Resources To Advance General Chinese Embedding. arXiv:2309.07597 [cs.CL]
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[50]
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, and Tianyi Zhou. 2024. A survey on knowledge distilla- tion of large language models. arXiv preprint arXiv:2402.13116 (2024)
work page internal anchor Pith review arXiv 2024
-
[51]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[52]
Songhua Yang, Hanjie Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu, Yuxiang Jia, and Hongying Zan. 2024. Zhongjing: Enhancing the chinese medical capabil- ities of large language model through expert feedback and real-world multi-turn dialogue. In Proceedings of the AAAI conference on artificial intelligence , Vol. 38. 19368–19376
work page 2024
-
[53]
Shengbin Yue, Wei Chen, Siyuan Wang, Bingxuan Li, Chenchen Shen, Shujun Liu, Yuxuan Zhou, Yao Xiao, Song Yun, Xuanjing Huang, and Zhongyu Wei
-
[54]
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services. arXiv: 2309.11325 [cs.CL]
-
[55]
Shengbin Yue, Shujun Liu, Yuxuan Zhou, Chenchen Shen, Siyuan Wang, Yao Xiao, Bingxuan Li, Yun Song, Xiaoyu Shen, Wei Chen, et al. 2024. LawLLM: Intelligent Legal System with Legal Reasoning and Verifiable Retrieval. In In- ternational Conference on Database Systems for Advanced Applications . Springer, 304–321
work page 2024
-
[56]
Guangtao Zeng, Wenmian Yang, Zeqian Ju, Yue Yang, Sicheng Wang, Ruisi Zhang, Meng Zhou, Jiaqi Zeng, Xiangyu Dong, Ruoyu Zhang, et al. 2020. MedDi- alog: Large-scale medical dialogue datasets. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) . 9241–9250
work page 2020
- [57]
-
[58]
Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2024. Agentcf: Collaborative learn- ing with autonomous language agents for recommender systems. In Proceedings of the ACM Web Conference 2024 . 3679–3689
work page 2024
-
[59]
Shuo Zhang, Jinyi Chen, Jiayuan Chen, Xiaofei Chen, and Hejiao Huang. 2023. Data imputation in IoT using spatio-temporal variational auto-encoder. Neuro- computing 529 (2023), 23–32
work page 2023
-
[60]
S. Zhang, X. Zhang, H. Wang, L. Guo, and S. Liu. 2018. Multi-Scale Attentive Interaction Networks for Chinese Medical Question Answer Selection. IEEE Access 6 (2018), 74061–74071. https://doi.org/10.1109/ACCESS.2018.2883637
- [61]
-
[62]
Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, and Philip S Yu. 2025. Large language models for medicine: a survey. International Journal of Machine Learning and Cybernetics 16, 2 (2025), 1015–1040
work page 2025
-
[63]
Daquan Zhou, Kaixin Wang, Jianyang Gu, Xiang Peng, Dongze Lian, Yifan Zhang, Yang You, and Jiashi Feng. 2023. Dataset Quantization. 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), 17159–17170. https: //api.semanticscholar.org/CorpusID:261049434
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.