AIBuildAI: An AI Agent for Automatically Building AI Models
Pith reviewed 2026-05-10 12:45 UTC · model grok-4.3
The pith
A hierarchical AI agent system automatically builds complete models from task descriptions and data, achieving first place on a benchmark of realistic development tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AIBuildAI uses a manager agent to coordinate a designer sub-agent for choosing modeling strategies, a coder sub-agent for writing and debugging code, and a tuner sub-agent for training and performance refinement. Each sub-agent is an LLM-based system that performs multi-step reasoning and tool use. On the MLE-Bench benchmark of diverse real-world tasks, this architecture delivers a 63.1 percent medal rate, the highest among tested methods and comparable to the output of skilled human practitioners.
What carries the argument
Hierarchical agent architecture in which a manager coordinates three specialized LLM agents (designer, coder, tuner) that together execute architecture selection, code implementation, debugging, and optimization.
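The coordination pattern described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Workspace` structure, the stub `designer`, `coder`, and `tuner` functions, the fixed designer-coder-tuner ordering, and the `target` and `max_rounds` parameters are all assumptions made for the example, and the stubs return canned values where a real system would make LLM calls and run training jobs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workspace:
    """Shared state handed from agent to agent (hypothetical structure)."""
    task: str
    plan: str = ""
    code: str = ""
    score: float = 0.0
    log: List[str] = field(default_factory=list)


def designer(ws: Workspace) -> Workspace:
    # Stand-in for an LLM call that chooses a modeling strategy.
    ws.plan = f"gradient-boosted trees for: {ws.task}"
    ws.log.append("designer")
    return ws


def coder(ws: Workspace) -> Workspace:
    # Stand-in for an LLM call that writes and debugs the pipeline.
    ws.code = f"# pipeline implementing: {ws.plan}"
    ws.log.append("coder")
    return ws


def tuner(ws: Workspace) -> Workspace:
    # Stand-in for training and tuning; each pass nudges the score upward.
    ws.score = round(ws.score + 0.25, 2)
    ws.log.append("tuner")
    return ws


def manager(task: str, target: float = 0.5, max_rounds: int = 3) -> Workspace:
    """Delegate to each sub-agent in turn until the target score is reached."""
    ws = Workspace(task=task)
    for _ in range(max_rounds):
        for agent in (designer, coder, tuner):
            ws = agent(ws)
        if ws.score >= target:
            break
    return ws


result = manager("tabular churn prediction")
print(result.log, result.score)
```

A real manager would presumably decide dynamically which sub-agent to call next; the fixed loop here only shows the shape of the handoff and the shared state.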
If this is right
- End-to-end automation becomes feasible for the full AI model development process from specification to deployable artifact.
- Performance on realistic tasks reaches levels previously associated only with experienced human engineers.
- The approach surpasses existing AutoML systems by handling open-ended architecture design and implementation steps.
- AI model creation could become accessible with far less specialized expertise than is currently required.
Where Pith is reading between the lines
- The same coordination pattern might transfer to other multi-stage engineering workflows that currently demand teams of specialists.
- Further reliability gains in the underlying language models could reduce the frequency of failures on harder or less common data modalities.
- Combining the agent with existing code repositories or external APIs might shorten the remaining manual review steps even more.
Load-bearing premise
LLM-based agents can execute long sequences of architecture design, coding, debugging, and performance tuning across different data types without repeated human corrections or breakdowns.
What would settle it
A follow-up evaluation on additional MLE-Bench tasks or similar problems in which AIBuildAI produces no competitive model or requires substantial human fixes to reach working performance.
Original abstract
AI models underpin modern intelligent systems, driving advances across science, medicine, finance, and technology. Yet developing high-performing AI models remains a labor-intensive process that requires expert practitioners to iteratively design architectures, engineer representations, implement training pipelines and refine approaches through empirical evaluation. Existing AutoML methods partially alleviate this burden but remain limited to narrow aspects such as hyperparameter optimization and model selection within predefined search spaces, leaving the full development lifecycle largely dependent on human expertise. To address this gap, we introduce AIBuildAI, an AI agent that automatically builds AI models from a task description and training data. AIBuildAI adopts a hierarchical agent architecture in which a manager agent coordinates three specialized sub-agents: a designer for modeling strategy, a coder for implementation and debugging, and a tuner for training and performance optimization. Each sub-agent is itself a large language model (LLM) based agent capable of multi-step reasoning and tool use, enabling end-to-end automation of the AI model development process that goes beyond the scope of existing AutoML approaches. We evaluate AIBuildAI on MLE-Bench, a benchmark of realistic Kaggle-style AI development tasks spanning visual, textual, time-series and tabular modalities. AIBuildAI ranks first on MLE-Bench with a medal rate of 63.1%, outperforming all existing baseline methods and matching the capability of highly experienced AI engineers. These results demonstrate that hierarchical agent systems can automate the full AI model development process from task specification to deployable model, suggesting a pathway toward broadly accessible AI development with minimal human intervention.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AIBuildAI, a hierarchical LLM-based agent system in which a manager agent coordinates three specialized sub-agents (designer for modeling strategy, coder for implementation/debugging, and tuner for optimization) to automate the full AI model development pipeline from task description and data to a deployable model. It evaluates the system on MLE-Bench, a collection of realistic Kaggle-style tasks across visual, textual, time-series, and tabular modalities, and claims a first-place ranking with a 63.1% medal rate that outperforms existing AutoML and agent baselines while matching experienced human engineers.
Significance. If the performance claims can be substantiated with complete experimental details, this would constitute a notable advance in automated machine learning. The hierarchical multi-agent design extends beyond conventional AutoML (limited to hyperparameter search within fixed spaces) by attempting end-to-end automation of architecture design, coding, debugging, and tuning. Successful validation would support the broader hypothesis that LLM agents can reliably handle complex, multi-step engineering workflows across modalities with minimal human oversight.
major comments (3)
- [Abstract and Experiments section] The headline result of a 63.1% medal rate and first-place ranking on MLE-Bench is presented without any description of the base LLMs powering the manager/designer/coder/tuner agents, the number of tasks attempted versus completed, the number of independent trials per task, retry budgets, failure-handling protocols, or the precise definition of a 'medal' used by the benchmark. These omissions make it impossible to determine whether the reported superiority is attributable to the hierarchical architecture or to unreported implementation choices, directly undermining the central empirical claim.
- [Method section] The paper asserts that the sub-agents enable 'end-to-end automation ... without human intervention,' yet provides no concrete specification of inter-agent communication protocols, tool-use interfaces, state sharing, or error-recovery mechanisms. Without these details, the weakest assumption (that LLM agents can reliably execute the full multi-step pipeline across diverse modalities) cannot be evaluated, leaving the architectural contribution untestable.
- [Experiments section] The claim that AIBuildAI 'outperforms all existing baseline methods' is unsupported by any description of baseline re-implementations, statistical significance tests, variance across runs, or ablation studies isolating the contribution of the manager or individual sub-agents. This absence prevents assessment of whether the medal-rate advantage is robust or confounded by differences in underlying model capabilities.
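To make concrete why the number of tasks and trials matters for a headline medal rate, the sketch below computes a medal rate and a 95% Wilson score interval from per-run outcomes. The outcome data, task names, and function names are hypothetical and are not taken from the paper or from MLE-Bench. The interval also treats every run as independent, which is a simplification, since repeated trials on the same task are likely to be correlated.

```python
import math
from typing import Dict, List, Tuple


def medal_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of (task, trial) runs that earned any medal."""
    runs = [won for trials in outcomes.values() for won in trials]
    return sum(runs) / len(runs)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical outcomes: 4 tasks, 3 independent trials each.
outcomes = {
    "task_a": [True, True, False],
    "task_b": [False, False, False],
    "task_c": [True, True, True],
    "task_d": [True, False, True],
}

rate = medal_rate(outcomes)
n_runs = sum(len(trials) for trials in outcomes.values())
n_medals = sum(sum(trials) for trials in outcomes.values())
low, high = wilson_interval(n_medals, n_runs)
print(f"medal rate {rate:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

With only a dozen runs the interval is very wide, which is the practical reason a reader needs the task and trial counts before comparing medal rates across systems.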
minor comments (2)
- [Abstract] The abstract states that results 'match the capability of highly experienced AI engineers' without any quantitative human baseline or side-by-side comparison; this phrasing should be qualified or removed unless supported by data in the full evaluation.
- [Method section] A system diagram or pseudocode illustrating the exact workflow and handoff between manager and sub-agents would improve clarity of the hierarchical architecture.
Simulated Authors' Rebuttal
We thank the referee for the constructive and detailed comments, which identify important gaps in experimental transparency and methodological specification. We agree that addressing these points will strengthen the manuscript's reproducibility and allow for a more rigorous evaluation of the hierarchical agent architecture. We respond to each major comment below and commit to the indicated revisions.
- Referee: [Abstract and Experiments section] The headline result of a 63.1% medal rate and first-place ranking on MLE-Bench is presented without any description of the base LLMs powering the manager/designer/coder/tuner agents, the number of tasks attempted versus completed, the number of independent trials per task, retry budgets, failure-handling protocols, or the precise definition of a 'medal' used by the benchmark. These omissions make it impossible to determine whether the reported superiority is attributable to the hierarchical architecture or to unreported implementation choices, directly undermining the central empirical claim.
Authors: We agree that these details are necessary to substantiate the central empirical claims and to distinguish the contribution of the architecture from implementation specifics. In the revised manuscript we will add a dedicated 'Experimental Setup' subsection that explicitly states the base LLMs used for the manager and each sub-agent, the total number of MLE-Bench tasks attempted and completed, the number of independent trials per task, the retry budgets and failure-handling protocols, and the precise definition of a 'medal' as specified by the benchmark. These additions will be placed in the Experiments section and referenced from the abstract where appropriate. revision: yes
- Referee: [Method section] The paper asserts that the sub-agents enable 'end-to-end automation ... without human intervention,' yet provides no concrete specification of inter-agent communication protocols, tool-use interfaces, state sharing, or error-recovery mechanisms. Without these details, the weakest assumption (that LLM agents can reliably execute the full multi-step pipeline across diverse modalities) cannot be evaluated, leaving the architectural contribution untestable.
Authors: We acknowledge that the current Method section lacks the level of implementation detail required to make the system testable and reproducible. In the revision we will expand the hierarchical architecture description with a new subsection that specifies the inter-agent communication protocols (including message formats and delegation procedures), tool-use interfaces (code execution environment, data loaders, and evaluation tools), state sharing mechanisms (shared workspace and conversation history), and error-recovery mechanisms (retry logic, fallback strategies, and escalation to the manager). These concrete specifications will allow readers to assess the reliability of the end-to-end automation claim. revision: yes
- Referee: [Experiments section] The claim that AIBuildAI 'outperforms all existing baseline methods' is unsupported by any description of baseline re-implementations, statistical significance tests, variance across runs, or ablation studies isolating the contribution of the manager or individual sub-agents. This absence prevents assessment of whether the medal-rate advantage is robust or confounded by differences in underlying model capabilities.
Authors: We will revise the Experiments section to include detailed descriptions of all baseline methods, specifying whether they were re-implemented from original code or taken from published results and noting any adaptations required for fair comparison. We will also report statistical significance tests, discuss observed variance across runs (accounting for LLM stochasticity), and present ablation studies that isolate the manager agent and each sub-agent. Where full multi-run variance or exhaustive ablations were not performed in the original experiments, we will explicitly note this as a limitation and provide the available partial results. revision: partial
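The rebuttal promises a specification of message formats, retry logic, and escalation to the manager. As a rough illustration of what such a specification might pin down, the sketch below uses a hypothetical `Message` envelope and a hypothetical `run_with_retry` helper. None of these names, fields, or behaviours come from the paper, and the failing step is simulated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical inter-agent message envelope."""
    sender: str
    recipient: str
    kind: str      # "retry", "result", or "escalation"
    payload: str


def run_with_retry(step: Callable[[int], str], agent: str,
                   budget: int, log: List[Message]) -> bool:
    """Run a step up to `budget` times; escalate to the manager on exhaustion."""
    for attempt in range(1, budget + 1):
        try:
            output = step(attempt)
        except RuntimeError as err:
            # Record the failure and try again within the budget.
            log.append(Message(agent, agent, "retry", f"attempt {attempt}: {err}"))
            continue
        log.append(Message(agent, "manager", "result", output))
        return True
    log.append(Message(agent, "manager", "escalation",
                       f"gave up after {budget} attempts"))
    return False


def flaky_step(attempt: int) -> str:
    # Simulated step that fails on its first two attempts.
    if attempt < 3:
        raise RuntimeError("simulated failure")
    return "pipeline ran to completion"


log_ok: List[Message] = []
succeeded = run_with_retry(flaky_step, "coder", budget=3, log=log_ok)

log_short: List[Message] = []
exhausted = run_with_retry(flaky_step, "coder", budget=2, log=log_short)

print([m.kind for m in log_ok], [m.kind for m in log_short])
```

The two calls differ only in retry budget, which shows why the referee asks for budgets to be reported: the same step succeeds or escalates depending on a setting the manuscript does not currently state.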
Circularity Check
No circularity; empirical benchmark result stands independent of internal definitions
The paper reports an empirical outcome (63.1% medal rate on external MLE-Bench) obtained by running the described hierarchical LLM agent on a fixed public benchmark. No equations, fitted parameters, or first-principles derivations are present; the central claim is a measured performance number on tasks whose success criteria and data are defined outside the paper. No self-citations are invoked to justify uniqueness or to close any logical loop, and the architecture description does not redefine or presuppose the reported metric. The result is therefore self-contained against the external benchmark.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Large language models can perform reliable multi-step reasoning, tool use, code generation, and iterative debugging for AI model development tasks.
- domain assumption: The MLE-Bench benchmark tasks and evaluation protocol accurately reflect the capabilities of experienced human AI engineers.
invented entities (1)
- AIBuildAI hierarchical agent system: no independent evidence