torchtune: PyTorch native post-training library
Pith reviewed 2026-05-21 05:39 UTC · model grok-4.3
The pith
torchtune is a PyTorch-native library for LLM post-training that delivers strong performance and memory efficiency through modularity and direct code access.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enabling efficient fine-tuning, experimentation, and deployment-oriented workflows. Unlike many existing fine-tuning frameworks, which often optimize for ease of use, specialized recipes, or hardware efficiency at the cost of transparency and extensibility, torchtune emphasizes modularity, hackability, and direct access to the underlying PyTorch components. In this paper, we present the design principles behind torchtune, describe how they are reflected in its model builders, training recipes, and distributed training stack, and evaluate the library across representative post-training settings.
What carries the argument
Modular model builders, training recipes, and distributed training stack that expose direct PyTorch components for customization and extension.
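To make the builder idea concrete, here is a minimal sketch in plain Python. It is not torchtune's API: the names `decoder`, `attention`, `mlp`, `gated_mlp`, and `BUILDERS` are hypothetical, and placeholder `Component` records (a name plus a weight count) stand in for real `torch.nn.Module` layers so the example runs without PyTorch. What it illustrates is that when a model is assembled by ordinary functions, swapping one component means passing a different function rather than editing framework internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """Placeholder for a layer: a name and a weight count (biases, norms, embeddings ignored)."""
    name: str
    params: int


@dataclass
class Model:
    layers: List[Component] = field(default_factory=list)

    def num_params(self) -> int:
        return sum(layer.params for layer in self.layers)


def attention(dim: int) -> Component:
    # Q, K, V and output projections, each dim x dim.
    return Component("attention", 4 * dim * dim)


def mlp(dim: int, hidden: int) -> Component:
    # Up and down projections.
    return Component("mlp", 2 * dim * hidden)


def gated_mlp(dim: int, hidden: int) -> Component:
    # Gate, up and down projections (SwiGLU-style layout).
    return Component("gated_mlp", 3 * dim * hidden)


def decoder(num_layers: int, dim: int, hidden: int,
            mlp_builder: Callable[[int, int], Component] = mlp) -> Model:
    """Stack attention + MLP blocks; the MLP is injected as a plain function."""
    layers: List[Component] = []
    for _ in range(num_layers):
        layers.append(attention(dim))
        layers.append(mlp_builder(dim, hidden))
    return Model(layers)


# Hypothetical registry mapping config strings to zero-argument builders.
BUILDERS: Dict[str, Callable[[], Model]] = {
    "tiny_decoder": lambda: decoder(num_layers=2, dim=8, hidden=32),
    "tiny_gated_decoder": lambda: decoder(num_layers=2, dim=8, hidden=32,
                                          mlp_builder=gated_mlp),
}
```

In this sketch a config only has to name `tiny_gated_decoder` to get the variant; any training loop that consumes a `Model` is left untouched.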
If this is right
- Post-training experiments become reproducible across different research environments using the same modular components.
- New model architectures or training methods can be integrated and tested with minimal changes to the base library.
- The same codebase supports both rapid prototyping and production-oriented deployment workflows.
- Memory and speed results remain competitive with specialized frameworks while preserving full transparency.
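Claims about memory and speed rest on measured throughput and peak memory. Here is a minimal, framework-agnostic sketch of such a measurement. It assumes each training step is a callable that returns the number of tokens it processed, and the names `profile_steps` and `StepStats` are hypothetical. It uses the standard-library `tracemalloc`, which tracks only Python heap allocations. For GPU training one would instead call `torch.cuda.reset_peak_memory_stats()` before the run, read `torch.cuda.max_memory_allocated()` after it, and call `torch.cuda.synchronize()` before stopping the timer.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepStats:
    steps: int
    tokens: int
    tokens_per_sec: float
    peak_bytes: int  # peak Python heap usage traced during the run


def profile_steps(step: Callable[[], int], num_steps: int) -> StepStats:
    """Run `step` num_steps times; each call returns the tokens it processed."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    tracemalloc.start()
    try:
        start = time.perf_counter()
        tokens = 0
        for _ in range(num_steps):
            tokens += step()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    throughput = tokens / elapsed if elapsed > 0 else float("inf")
    return StepStats(num_steps, tokens, throughput, peak)
```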
Where Pith is reading between the lines
- The design could increase contributions from academic groups that need to inspect and alter every stage of the pipeline.
- It suggests a broader pattern for other PyTorch-based tools to favor native access over polished high-level APIs.
- Future work might test whether the same modularity principles extend cleanly to earlier training stages or new hardware backends.
Load-bearing premise
Users will achieve better research outcomes by directly accessing and modifying core PyTorch code instead of relying on abstracted interfaces or specialized optimizations.
What would settle it
A side-by-side run on the same fine-tuning task and hardware where torchtune uses more memory or achieves lower downstream accuracy than Unsloth or Axolotl.
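The falsification criterion above can be written as an explicit rule. In this sketch, `RunResult`, `refuting_baselines`, and the tolerance parameters are hypothetical. The tolerances exist only to absorb run-to-run noise. All numbers must come from matched runs (same task, data, hardware, and precision), and none are supplied here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class RunResult:
    peak_memory_gb: float
    downstream_accuracy: float  # fraction in [0, 1]


def refuting_baselines(results: Dict[str, RunResult],
                       candidate: str = "torchtune",
                       baselines: Sequence[str] = ("Unsloth", "Axolotl"),
                       mem_tol_gb: float = 0.0,
                       acc_tol: float = 0.0) -> List[str]:
    """Return the baselines against which the candidate uses more memory
    or reaches lower downstream accuracy, beyond the given tolerances."""
    cand = results[candidate]
    refuting = []
    for name in baselines:
        base = results[name]
        more_memory = cand.peak_memory_gb > base.peak_memory_gb + mem_tol_gb
        lower_accuracy = cand.downstream_accuracy < base.downstream_accuracy - acc_tol
        if more_memory or lower_accuracy:
            refuting.append(name)
    return refuting
```

An empty list means this particular run does not refute the claim. A non-empty list names the baselines that do.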
Original abstract
Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enabling efficient fine-tuning, experimentation, and deployment-oriented workflows. Unlike many existing fine-tuning frameworks, which often optimize for ease of use, specialized recipes, or hardware efficiency at the cost of transparency and extensibility, torchtune emphasizes modularity, hackability, and direct access to the underlying PyTorch components. In this paper, we present the design principles behind torchtune, describe how they are reflected in its model builders, training recipes, and distributed training stack, and evaluate the library across representative post-training settings. We compare against popular fine-tuning frameworks, including Axolotl and Unsloth, and show that torchtune provides strong performance and memory efficiency across many settings while remaining flexible enough for rapid research iteration. These results position torchtune as a practical foundation for reproducible LLM post-training research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces torchtune, a PyTorch-native library for LLM post-training. It describes design principles emphasizing modularity, hackability, and direct PyTorch access (in contrast to the ease-of-use or specialized-optimization focus of other tools), details their realization in model builders, training recipes, and the distributed stack, and reports benchmark comparisons against Axolotl and Unsloth that demonstrate strong performance and memory efficiency. It further claims sufficient flexibility for rapid research iteration.
Significance. If the empirical results hold under scrutiny, torchtune supplies a practical, extensible foundation for reproducible LLM post-training research. Its native PyTorch orientation and open design could lower barriers to custom experimentation and improve transparency compared with more opaque frameworks, directly supporting the community's need for hackable tools in multistage training pipelines.
Major comments (1)
- Evaluation section: the central claim requires both strong performance/memory results and flexibility for rapid research iteration. Concrete benchmark numbers versus Axolotl and Unsloth support the former, but the latter rests solely on qualitative description of modularity and direct access; no quantitative evidence (developer time, lines changed, or iteration cycles for a custom extension) is provided, leaving the design-choice justification as the weakest link.
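The "lines changed" and "files touched" metrics the referee asks for can be computed mechanically from before/after snapshots of the code. What follows is a minimal sketch using the standard-library `difflib`. The helpers `diff_stats` and `extension_cost` are hypothetical. Counting a modified line as one addition plus one removal, in the spirit of `git diff --numstat`, is one reasonable convention among several, and `difflib` may align edits slightly differently than git's diff algorithm.

```python
import difflib
from typing import Dict


def diff_stats(old: str, new: str) -> Dict[str, int]:
    """Count lines added and removed between two versions of one file."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return {"added": added, "removed": removed}


def extension_cost(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, int]:
    """Aggregate over files keyed by path; a file missing on one side counts as empty."""
    totals = {"added": 0, "removed": 0, "files_touched": 0}
    for path in sorted(set(before) | set(after)):
        stats = diff_stats(before.get(path, ""), after.get(path, ""))
        if stats["added"] or stats["removed"]:
            totals["files_touched"] += 1
            totals["added"] += stats["added"]
            totals["removed"] += stats["removed"]
    return totals
```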
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the single major comment below and describe the planned revisions to strengthen the manuscript.
Referee: Evaluation section: the central claim requires both strong performance/memory results and flexibility for rapid research iteration. Concrete benchmark numbers versus Axolotl and Unsloth support the former, but the latter rests solely on qualitative description of modularity and direct access; no quantitative evidence (developer time, lines changed, or iteration cycles for a custom extension) is provided, leaving the design-choice justification as the weakest link.
Authors: We agree that the current justification for flexibility is the weakest link in the evaluation, as it relies on qualitative descriptions of modularity and direct PyTorch access. Objective quantification of developer time or iteration cycles would require controlled user studies that are outside the scope of this library paper. To address the concern, we will revise the Evaluation section to include two concrete extension examples: (1) adding a custom loss function and (2) integrating a new data collator. For each, we will report the number of lines of code added or modified and the files touched, demonstrating the low effort required to extend the library while preserving its structure. These additions will provide the requested quantitative flavor without altering the paper's primary focus on performance and memory results.
Revision: yes
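To illustrate the scale of the two extensions the authors propose, here is a minimal, framework-free sketch of a padding data collator and a masked negative log-likelihood loss. It is not torchtune code. The names `pad_collate` and `masked_nll` and the choice `pad_id=0` are assumptions. Tensors are replaced by nested lists, and labels are assumed to be already aligned with their prediction positions, so no causal shift is applied. Using -100 for ignored positions follows the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_collate(batch: Sequence[Dict[str, List[int]]],
                pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad tokens with pad_id and labels with IGNORE_INDEX to the batch max length."""
    max_len = max(len(ex["tokens"]) for ex in batch)
    tokens, labels = [], []
    for ex in batch:
        if len(ex["labels"]) != len(ex["tokens"]):
            raise ValueError("tokens and labels must have equal length")
        pad = max_len - len(ex["tokens"])
        tokens.append(list(ex["tokens"]) + [pad_id] * pad)
        labels.append(list(ex["labels"]) + [IGNORE_INDEX] * pad)
    return {"tokens": tokens, "labels": labels}


def masked_nll(log_probs: List[List[List[float]]], labels: List[List[int]]) -> float:
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX.

    log_probs[b][t][v] is the log-probability of vocabulary item v at position t
    of sequence b.
    """
    total, count = 0.0, 0
    for seq_lp, seq_labels in zip(log_probs, labels):
        for step_lp, label in zip(seq_lp, seq_labels):
            if label == IGNORE_INDEX:
                continue  # padding contributes nothing to the loss
            total -= step_lp[label]
            count += 1
    if count == 0:
        raise ValueError("no unmasked positions in batch")
    return total / count
```

In a real recipe both pieces would operate on tensors, and the loss would typically be computed with `torch.nn.functional.cross_entropy(..., ignore_index=-100)`.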
Circularity Check
No circularity in library design and benchmark presentation
The paper introduces torchtune as a PyTorch-native library, describes its modular design principles and components (model builders, recipes, distributed stack), and reports empirical comparisons for performance and memory efficiency against Axolotl and Unsloth. No mathematical derivations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations are present. Claims rest on direct implementation descriptions and external benchmark results rather than any chain that reduces to its own inputs by construction. This is a standard non-circular outcome for a systems/software artifact paper.