mlr3torch: A Deep Learning Framework in R based on mlr3 and torch
Pith reviewed 2026-05-10 04:04 UTC · model grok-4.3
The pith
mlr3torch embeds torch neural networks into the mlr3 R framework by representing full workflows as graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
mlr3torch is an extensible deep learning framework for R built on torch that simplifies neural network work by converting torch models to mlr3 learners and by letting users express entire pipelines, including network architecture, as graphs in the mlr3pipelines language.
What carries the argument
The graph representation drawn from mlr3pipelines, which composes neural network layers with preprocessing and augmentation steps into a single directed structure that mlr3 can treat as a learner.
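To make the idea of "one directed structure treated as a learner" concrete, here is a minimal conceptual sketch. It is written in Python rather than R and does not use or imitate the mlr3torch or mlr3pipelines API. The `Graph` class, its `add` and `run` methods, the node names, and the fixed weights are all hypothetical, chosen only to show preprocessing steps and network-like layers living in the same graph.

```python
class Graph:
    """Toy directed acyclic graph of named operations (illustrative only)."""

    def __init__(self):
        self.ops = {}      # node name -> callable
        self.parents = {}  # node name -> list of parent node names

    def add(self, name, op, parents=()):
        # Parents must already exist, so insertion order is a valid
        # topological order and no separate sorting step is needed.
        missing = [p for p in parents if p not in self.ops]
        if missing:
            raise KeyError(f"unknown parent nodes: {missing}")
        self.ops[name] = op
        self.parents[name] = list(parents)
        return self

    def run(self, inputs):
        # Source nodes (no parents) read raw data from `inputs` by name;
        # every other node receives the outputs of its parents.
        values = {}
        for name, op in self.ops.items():
            parents = self.parents[name]
            args = [values[p] for p in parents] if parents else [inputs[name]]
            values[name] = op(*args)
        return values


# Hypothetical fixed weights standing in for trained parameters.
WEIGHTS = [0.5, 1.0, 1.0, 1.0, 1.0, 0.5]

graph = (
    Graph()
    # Preprocessing branch for tabular features: simple rescaling.
    .add("tabular", lambda xs: [x / 10 for x in xs])
    # Preprocessing branch for an image: flatten rows into one vector.
    .add("image", lambda img: [p for row in img for p in row])
    # Merge both modalities into a single feature vector.
    .add("concat", lambda a, b: a + b, parents=["tabular", "image"])
    # Network-like layers: a linear unit followed by a ReLU activation.
    .add("linear", lambda v: sum(w * x for w, x in zip(WEIGHTS, v)),
         parents=["concat"])
    .add("relu", lambda z: max(0.0, z), parents=["linear"])
)


def predict(tabular, image):
    """Expose the whole graph through a single learner-like entry point."""
    return graph.run({"tabular": tabular, "image": image})["relu"]
```

Because preprocessing and layers are nodes of the same object, anything that operates on the whole graph (tuning, resampling, benchmarking) sees them as one unit, which is the property the paper attributes to its mlr3pipelines-based representation.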
If this is right
- Hyperparameter tuning of neural networks can reuse mlr3's existing tuning and resampling infrastructure without extra wrappers.
- Fine-tuning of pretrained torch models becomes possible inside the same mlr3 learner objects used for classical algorithms.
- Multimodal models that mix tabular and image inputs can be assembled in one graph and evaluated uniformly with mlr3 tools.
- Preprocessing and augmentation steps can be version-controlled and benchmarked together with the network architecture.
- Runtime benchmarks indicate the package remains competitive for training and inference on standard hardware.
Where Pith is reading between the lines
- The graph approach could let researchers treat neural architecture search as an optimization problem over the same pipeline objects used for model selection.
- Production deployment of deep learning models in R might become more consistent because the same mlr3 prediction and monitoring tools apply to both classical and neural learners.
- Statistical comparisons between deep learning and traditional methods gain a shared interface, making direct head-to-head resampling studies easier to set up.
- Similar graph wrappers could be written for other deep learning backends, creating a uniform modeling layer across libraries.
Load-bearing premise
The graph conversion and torch integration will operate without major friction for typical users and will deliver the promised simplification of modeling workflows.
What would settle it
Define a multimodal graph, run hyperparameter tuning on it, and apply mlr3 resampling to the resulting learner on a standard image-plus-tabular dataset. Then check whether training completes and predictions match expected behavior.
Original abstract
Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the mlr3torch R package as an extensible deep learning framework for the mlr3 ecosystem, built on the torch package. It simplifies definition, training, and evaluation of neural networks for tabular data and generic tensors (e.g., images) via predefined architectures, easy conversion of torch models to mlr3 learners, and graph-based definitions that unify preprocessing, data augmentation, and network architecture using the mlr3pipelines graph language. The paper explains the design and customization options, demonstrates capabilities with three use cases (hyperparameter tuning, fine-tuning, and multimodal data architectures), and presents runtime benchmarks.
Significance. If the described integration and features function as outlined, mlr3torch would provide a meaningful addition to the R machine learning ecosystem by enabling seamless deep learning workflows within mlr3's established tools for resampling, benchmarking, and tuning. The graph-based unification of the full modeling pipeline and the support for both tabular and tensor data (including multimodal cases) address practical needs for R users. The inclusion of concrete use cases and runtime benchmarks strengthens the practical value, and the emphasis on extensibility and customization is a positive aspect for long-term utility.
Minor comments (3)
- Abstract: The description of runtime benchmarks does not include any summary of key quantitative findings (e.g., relative runtimes or hardware context), which would better convey the practical performance claims to readers.
- Use cases section (hyperparameter tuning example): The integration with mlr3tuning is shown via code snippets, but the manuscript would benefit from explicit discussion of how default torch hyperparameters are exposed or overridden in the mlr3 learner interface.
- Throughout: Some code examples use inline comments that could be expanded into short explanatory paragraphs for readers less familiar with the mlr3pipelines graph syntax.
Simulated Author's Rebuttal
We thank the referee for their positive and constructive review of our manuscript introducing mlr3torch. The assessment that the package provides a meaningful addition to the R machine learning ecosystem, particularly through its graph-based unification of modeling pipelines and support for tabular, tensor, and multimodal data, is appreciated. We also value the recognition of the practical value added by the use cases and runtime benchmarks. As the recommendation is for minor revision and no specific major comments were raised, we have no point-by-point responses to provide. We will incorporate any minor suggestions into the revised version of the manuscript.
Circularity Check
No significant circularity; package introduction is self-contained
The paper introduces the mlr3torch R package as a new software contribution for integrating torch-based deep learning into the mlr3 ecosystem. It contains no mathematical derivations, first-principles predictions, fitted parameters, or uniqueness theorems. Claims rest on explicit design descriptions, graph-based model definitions, three concrete use cases (hyperparameter tuning, fine-tuning, multimodal data), and runtime benchmarks. Minor references to the existing mlr3 ecosystem are contextual and do not reduce the central contribution to a self-citation or input by construction. No steps match any enumerated circularity pattern.