
arxiv: 2604.18152 · v1 · submitted 2026-04-20 · 📊 stat.ML · cs.LG

mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

Pith reviewed 2026-05-10 04:04 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords mlr3torch · deep learning · R package · torch · neural networks · mlr3 ecosystem · graph pipelines · machine learning framework

The pith

mlr3torch embeds torch neural networks into the mlr3 R framework by representing full workflows as graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents mlr3torch as a new R package that brings deep learning to the mlr3 ecosystem by wrapping the torch library. It supports defining, training, and evaluating neural networks for classification and regression on tabular data as well as images and other tensors. Users can convert existing torch models into mlr3 learners or build networks as graphs that also incorporate preprocessing and data augmentation steps. This design lets the package inherit mlr3 tools for resampling, benchmarking, and tuning. The authors illustrate the approach with examples of hyperparameter optimization, model fine-tuning, and multimodal architectures, plus runtime comparisons.

Core claim

mlr3torch is an extensible deep learning framework for R built on torch that simplifies neural network work by converting torch models to mlr3 learners and by letting users express entire pipelines, including network architecture, as graphs in the mlr3pipelines language.

What carries the argument

The graph representation drawn from mlr3pipelines, which composes neural network layers with preprocessing and augmentation steps into a single directed structure that mlr3 can treat as a learner.
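A minimal sketch of what such a graph might look like, assuming the operator names from the package's public documentation (`torch_ingress_num`, `nn("linear")`, `nn("head")`, `torch_loss`, `torch_optimizer`, `torch_model_classif`). The layer width, learning rate, epoch count, and the iris task are illustrative choices, not taken from the paper.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3torch)

# Architecture, loss, optimizer, and training settings composed as one graph.
# All numeric values below are illustrative assumptions.
graph <- po("torch_ingress_num") %>>%              # entry point for numeric features
  nn("linear", out_features = 16) %>>%             # hidden layer
  nn("relu") %>>%                                  # activation
  nn("head") %>>%                                  # output layer sized to the task
  po("torch_loss", t_loss("cross_entropy")) %>>%
  po("torch_optimizer", t_opt("adam", lr = 0.01)) %>>%
  po("torch_model_classif", epochs = 5, batch_size = 32, device = "cpu")

# The whole graph becomes an ordinary mlr3 learner.
learner <- as_learner(graph)

task <- tsk("iris")
learner$train(task)
pred <- learner$predict(task)
```

Preprocessing operators from mlr3pipelines could presumably be chained in front of the ingress step in the same way, which is the unification the paper emphasizes.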

If this is right

  • Hyperparameter tuning of neural networks can reuse mlr3's existing tuning and resampling infrastructure without extra wrappers.
  • Fine-tuning of pretrained torch models becomes possible inside the same mlr3 learner objects used for classical algorithms.
  • Multimodal models that mix tabular and image inputs can be assembled in one graph and evaluated uniformly with mlr3 tools.
  • Preprocessing and augmentation steps can be version-controlled and benchmarked together with the network architecture.
  • Runtime benchmarks indicate the package remains competitive for training and inference on standard hardware.
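As a rough illustration of the first point, the sketch below resamples one of the predefined architectures with standard mlr3 calls. It assumes the `classif.mlp` learner key and its `neurons`, `epochs`, `batch_size`, and `device` parameters as documented for the package; the values and the iris task are placeholders rather than settings from the paper.

```r
library(mlr3)
library(mlr3torch)

# Predefined multilayer perceptron exposed as a regular mlr3 learner.
# Hyperparameter values are illustrative assumptions.
learner <- lrn("classif.mlp",
  neurons = c(32, 32),   # two hidden layers
  epochs = 5,
  batch_size = 32,
  device = "cpu"
)

# The same resampling call used for classical learners applies unchanged.
rr <- resample(tsk("iris"), learner, rsmp("cv", folds = 3))
acc <- rr$aggregate(msr("classif.acc"))
```

Because the learner follows the usual mlr3 interface, the same object could in principle be handed to mlr3tuning or `benchmark()` without a dedicated wrapper.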

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph approach could let researchers treat neural architecture search as an optimization problem over the same pipeline objects used for model selection.
  • Production deployment of deep learning models in R might become more consistent because the same mlr3 prediction and monitoring tools apply to both classical and neural learners.
  • Statistical comparisons between deep learning and traditional methods gain a shared interface, making direct head-to-head resampling studies easier to set up.
  • Similar graph wrappers could be written for other deep learning backends, creating a uniform modeling layer across libraries.

Load-bearing premise

The graph conversion and torch integration will operate without major friction for typical users and will deliver the promised simplification of modeling workflows.

What would settle it

Define a multimodal graph, run hyperparameter tuning on it, and apply mlr3 resampling to the resulting learner on a standard image-plus-tabular dataset. The claim holds if training completes and the predictions behave as expected.

Figures

Figures reproduced from arXiv: 2604.18152 by Bernd Bischl, Carson Zhang, Lukas Burk, Martin Binder, Sebastian Fischer.

Figure 1: Simple transformer-based neural network architecture represented as a graph.

Figure 2: Training and prediction phase for a generating…

Figure 3: More complex network segments.
One also often wants to repeat a specific neural network segment multiple times. This can be achieved by using the PipeOpTorchBlock operator. This PipeOp takes in another Graph primarily consisting of PipeOpTorch objects. During training, the PipeOp will repeatedly attach the same network segment to the neural network.
R> blocks <- nn("block", residual_layer, n_blocks = 5)
As…

Figure 4: ROC curve for the multimodal neural network evaluated using holdout resampling.

Figure 5: Runtime results for AdamW (left) and SGD (right) for GPU (top) and CPU (bot…

Figure 6: Median runtime of mlr3torch and torch relative to PyTorch.
… optimizer when there are many layers and the number of latent dimensions is 9000, but this is likely an artifact explained by some internal mechanism of torch or LibTorch. On the CPU, both the relative and absolute overhead of the R implementations is larger for SGD than it is for the more compute and memory-intensive AdamW optimizer. Also, the rel…

Original abstract

Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces the mlr3torch R package as an extensible deep learning framework for the mlr3 ecosystem, built on the torch package. It simplifies definition, training, and evaluation of neural networks for tabular data and generic tensors (e.g., images) via predefined architectures, easy conversion of torch models to mlr3 learners, and graph-based definitions that unify preprocessing, data augmentation, and network architecture using the mlr3pipelines graph language. The paper explains the design and customization options, demonstrates capabilities with three use cases (hyperparameter tuning, fine-tuning, and multimodal data architectures), and presents runtime benchmarks.

Significance. If the described integration and features function as outlined, mlr3torch would provide a meaningful addition to the R machine learning ecosystem by enabling seamless deep learning workflows within mlr3's established tools for resampling, benchmarking, and tuning. The graph-based unification of the full modeling pipeline and the support for both tabular and tensor data (including multimodal cases) address practical needs for R users. The inclusion of concrete use cases and runtime benchmarks strengthens the practical value, and the emphasis on extensibility and customization is a positive aspect for long-term utility.

minor comments (3)
  1. Abstract: The description of runtime benchmarks does not include any summary of key quantitative findings (e.g., relative runtimes or hardware context), which would better convey the practical performance claims to readers.
  2. Use cases section (hyperparameter tuning example): The integration with mlr3tuning is shown via code snippets, but the manuscript would benefit from explicit discussion of how default torch hyperparameters are exposed or overridden in the mlr3 learner interface.
  3. Throughout: Some code examples use inline comments that could be expanded into short explanatory paragraphs for readers less familiar with the mlr3pipelines graph syntax.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and constructive review of our manuscript introducing mlr3torch. The assessment that the package provides a meaningful addition to the R machine learning ecosystem, particularly through its graph-based unification of modeling pipelines and support for tabular, tensor, and multimodal data, is appreciated. We also value the recognition of the practical value added by the use cases and runtime benchmarks. As the recommendation is for minor revision and no specific major comments were raised, we have no point-by-point responses to provide. We will incorporate any minor suggestions into the revised version of the manuscript.

Circularity Check

0 steps flagged

No significant circularity; package introduction is self-contained

full rationale

The paper introduces the mlr3torch R package as a new software contribution for integrating torch-based deep learning into the mlr3 ecosystem. It contains no mathematical derivations, first-principles predictions, fitted parameters, or uniqueness theorems. Claims rest on explicit design descriptions, graph-based model definitions, three concrete use cases (hyperparameter tuning, fine-tuning, multimodal data), and runtime benchmarks. Minor references to the existing mlr3 ecosystem are contextual and do not reduce the central contribution to a self-citation or input by construction. No steps match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software package introduction paper with no mathematical derivations, fitted parameters, or new postulated entities. It builds on existing torch and mlr3 libraries without introducing free parameters or axioms beyond standard ML assumptions.


