pith.

arXiv: 2606.17497 · v1 · pith:JICL5CPA · submitted 2026-06-16 · 🧮 math.NA · cs.NA

Design principles for stable and generalizable data-driven discretizations for solving linear hyperbolic conservation laws

Pith reviewed 2026-06-27 00:16 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords data-driven discretization, finite volume methods, linear advection, neural networks, semilinearity, normalization, flux limiter, generalization

The pith

Enforcing semilinearity via local stencil-scale normalization stabilizes data-driven finite-volume schemes for linear advection and improves generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates neural networks as finite-volume discretizations for the one-dimensional linear advection equation, built to respect core numerical analysis principles. It shows that reconstruction from cell averages alone creates a multi-valued learning problem, which limits generalization when training data spans different curvature regimes. Stability and generalization follow from enforcing semilinearity through local stencil-scale normalization, which makes the scheme invariant under affine transformations of its inputs. Training on polynomial profiles produces stable, high-order accurate schemes whose formal order is set by polynomial degree, while a new data-driven flux limiter adds mild antidiffusion in near-linear regions to preserve shapes better than the classical OSTVD3 scheme.

Core claim

Numerical stability and good generalization can be achieved by enforcing semilinearity through local stencil-scale normalization, which ensures invariance under affine transformations of the inputs. Training on polynomial profiles yields stable, high-order accurate discretizations, with the polynomial degree controlling the formal order of accuracy. A new data-driven flux limiter outperforms the classical OSTVD3 scheme in shape preservation by introducing mild antidiffusion in near-linear regimes.
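The claim that polynomial training fixes the formal order depends on how exact training pairs are built. Below is a minimal sketch of one way to generate exact (cell-average, flux) pairs from random polynomials. It assumes a uniform grid, constant positive velocity, and exact advection over one time step. The names `random_polynomial` and `exact_pair`, and the coefficient distribution, are hypothetical and not taken from the paper.

```python
import numpy as np
from numpy.polynomial import Polynomial


def random_polynomial(degree, rng):
    """Hypothetical sampler: coefficients drawn uniformly from [-1, 1]."""
    return Polynomial(rng.uniform(-1.0, 1.0, degree + 1))


def exact_pair(p, nu, dx=1.0):
    """Exact training pair for linear advection with u > 0 and CFL nu.

    Cells i-1, i, i+1 span [-dx, 0], [0, dx], [dx, 2dx], so the face
    x_{i+1/2} sits at x = dx. Returns the three cell averages and the
    one-step face value: the mean of p over the upwind segment
    [dx - nu*dx, dx] that crosses the face during one step.
    """
    P = p.integ()  # antiderivative, so averages are exact for polynomials
    edges = np.array([-dx, 0.0, dx, 2.0 * dx])
    averages = (P(edges[1:]) - P(edges[:-1])) / dx
    a, b = dx - nu * dx, dx
    flux = (P(b) - P(a)) / (b - a)
    return averages, flux


rng = np.random.default_rng(0)
dataset = [exact_pair(random_polynomial(3, rng), nu=0.2) for _ in range(1000)]
```

Both the inputs and the targets come from the antiderivative, so the pairs carry no discretization error. The chosen degree then bounds which profiles the learned map sees exactly.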

What carries the argument

Local stencil-scale normalization that enforces semilinearity (invariance under affine transformations of the inputs)
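The sketch below shows how stencil-scale normalization makes any flux model affine-invariant. It assumes min-max normalization over the stencil, which is a guess consistent with the 0-or-1 stencil values described for Figure 2, not a confirmed detail of the paper. `toy_net` is a hypothetical stand-in for a trained network.

```python
import numpy as np


def normalized_flux(stencil, net, eps=1e-12):
    """Evaluate a flux model through min-max stencil normalization.

    `net` maps normalized stencil values (each in [0, 1]) to a normalized
    flux. Min-max normalization is an assumption of this sketch.
    """
    q = np.asarray(stencil, dtype=float)
    lo, hi = q.min(), q.max()
    scale = hi - lo
    if scale < eps:
        # Uniform stencil: return the constant, which is the exact flux.
        return float(lo)
    q_tilde = (q - lo) / scale
    return float(lo + scale * net(q_tilde))


def toy_net(t):
    """Hypothetical stand-in for a trained network (deliberately nonlinear)."""
    return 0.5 * (t[1] + t[2]) + 0.1 * np.sin(3.0 * t[0])


q = np.array([0.3, 1.7, -0.4])
a, b = 2.5, -1.0
lhs = normalized_flux(a * q + b, toy_net)
rhs = a * normalized_flux(q, toy_net) + b
print(np.isclose(lhs, rhs))  # True: F(a q + b) = a F(q) + b for a > 0
```

The invariance holds for any `net` because the network only ever sees the normalized stencil. With a < 0 the normalized stencil flips to 1 - q̃, so this wrapper alone does not cover sign flips. The network itself would need the matching symmetry.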

If this is right

  • The formal order of accuracy of the learned scheme is controlled by the degree of the polynomials used for training.
  • The new flux limiter provides better shape preservation than OSTVD3 by adding mild antidiffusion near linear profiles.
  • Higher-order reconstruction in non-monotonic regions provides only limited benefit.
  • Reconstruction from cell averages alone produces a multi-valued problem that blocks generalization across curvature regimes.
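The multi-valuedness in the last point can be seen in a hand-built example. This is our own construction, not one taken from the paper. A linear ramp and a step whose jump sits at the center of cell i produce the same three cell averages, (0, 0.5, 1), which are already min-max normalized. Their exact one-step face values at CFL ν = 0.2 still differ. The grid, profiles, and helper names are assumptions for illustration.

```python
import numpy as np


def mean_over(q, a, b, n=1000):
    """Midpoint-rule mean of q on [a, b]."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return float(np.mean(q(x)))


def stencil_and_flux(q, nu):
    """Cell averages on [-1, 0], [0, 1], [1, 2] and the exact one-step
    face value at x = 1 for u > 0, i.e. the mean of q over [1 - nu, 1]."""
    averages = np.array([mean_over(q, k - 1.0, float(k)) for k in (0, 1, 2)])
    return averages, mean_over(q, 1.0 - nu, 1.0)


def ramp(x):
    """Linear profile with cell averages 0, 0.5, 1."""
    return 0.25 + 0.5 * x


def step(x):
    """Jump located at the center of cell i, also giving averages 0, 0.5, 1."""
    return np.where(x > 0.5, 1.0, 0.0)


nu = 0.2
ramp_avg, ramp_flux = stencil_and_flux(ramp, nu)
step_avg, step_flux = stencil_and_flux(step, nu)
print(ramp_avg, step_avg)    # both approximately (0, 0.5, 1)
print(ramp_flux, step_flux)  # approximately 0.7 versus 1.0
```

No function of the three averages alone can return both 0.7 and 1.0. A network trained on both profiles must therefore compromise, which is one way to read why mixing curvature regimes limits generalization.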

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local normalization could be tested on variable-coefficient or multi-dimensional linear advection to check whether affine invariance remains sufficient for stability.
  • Input representation choices may prove more decisive for generalization than network depth in other learned discretizations of hyperbolic problems.
  • Polynomial training might serve as a minimal way to anchor high-order accuracy before extending to nonlinear conservation laws.

Load-bearing premise

That training exclusively on polynomial profiles resolves the multi-valued learning problem sufficiently for generalization to other curvature regimes encountered in practice.

What would settle it

Running the trained scheme on a test profile whose curvature lies outside the polynomial family used in training, such as a sharp discontinuity or high-frequency oscillation, and checking for instability or loss of shape preservation.
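A test of this kind takes only a few lines to script. The sketch below assumes periodic boundaries, u > 0, a three-point stencil, and CFL ν = 0.2. `flux_fn` is a placeholder for a trained flux. First-order upwind and Lax-Wendroff are classical stand-ins, not schemes from the paper. The harness reports total variation and the number of cells outside the initial range, which are also the metrics the referee asks for below.

```python
import numpy as np


def advect(q0, flux_fn, nu, n_steps):
    """Periodic 1D finite-volume advection with constant velocity u > 0.

    flux_fn(q_left, q_center, q_right) returns F_{i+1/2} for every cell i
    from the stencil (i-1, i, i+1). Update: q_i -= nu * (F_{i+1/2} - F_{i-1/2}).
    """
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(n_steps):
        F = flux_fn(np.roll(q, 1), q, np.roll(q, -1))  # F[i] holds F_{i+1/2}
        q = q - nu * (F - np.roll(F, 1))
    return q


def total_variation(q):
    """Periodic total variation: sum over i of |q_{i+1} - q_i|."""
    return float(np.sum(np.abs(np.roll(q, -1) - q)))


def count_new_extrema(q, q0, tol=1e-12):
    """Number of cells that overshoot or undershoot the initial data range."""
    return int(np.sum((q > q0.max() + tol) | (q < q0.min() - tol)))


# Square wave on 100 cells; one period at CFL nu takes n_cells / nu steps.
n_cells, nu = 100, 0.2
x = (np.arange(n_cells) + 0.5) / n_cells
q0 = np.where((x > 0.25) & (x < 0.75), 1.0, 0.0)
n_steps = int(round(n_cells / nu))


def upwind(qm, qc, qp):
    """First-order upwind: monotone, diffusive baseline."""
    return qc


def lax_wendroff(qm, qc, qp):
    """Unlimited second-order baseline: expected to over- and undershoot."""
    return qc + 0.5 * (1.0 - nu) * (qp - qc)


for name, fn in [("upwind", upwind), ("lax_wendroff", lax_wendroff)]:
    q = advect(q0, fn, nu, n_steps)
    print(name, total_variation(q0), total_variation(q), count_new_extrema(q, q0))
```

A trained scheme would be plugged in as another `flux_fn`. The same loop can then be run on high-frequency sinusoids to probe the curvature regimes outside the training family.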

Figures

Figures reproduced from arXiv: 2606.17497 by Alistair Adcroft, Antoine-Alexis Nasser.

Figure 1. Illustration of the problem setup and architectures of the data-driven discretizations. (a) Data-driven formulation for the advection problem. (b–c) Generation of training data using analytical profiles (b) and random polynomials (c). (d) Under local input–output normalization (as in panel a), the network is trained on exact pairs of FV cell-averages and numerical fluxes. (e) Training workflow of the data-…
Figure 2. Possible configurations of a three-point stencil in normalized phase space, sorted according to the curvature of the gridded data. The horizontal axis represents the local curvature of the normalized field, $\tilde{q}_{i-1} - 2\tilde{q}_i + \tilde{q}_{i+1}$. Black dots denote values equal to 0 or 1 depending on their vertical position, whereas intermediate values are shown in orange. Configurations located in the upper (lower) part of…
Figure 3. Flux-curvature diagrams for perfect advection of three initial conditions for a three-point stencil. Each color denotes a shape specified in the legend. The normalized numerical flux $\tilde{F}_{i+1/2}$ is plotted on the y-axis against the local curvature of the normalized gridded field, $\tilde{q}_{i-1} - 2\tilde{q}_i + \tilde{q}_{i+1}$, on the x-axis. The curvature range corresponds to the same stencil values shown in…
Figure 4. Flux-curvature diagram for data-driven discretizations trained on specific solutions. Panel (a): phase space of neural networks trained to predict the flux for a step profile (two random seeds (b,c)) and for a sine profile (d); axes are identical to those in…
Figure 5. Advection over one period. The dotted line denotes the initial condition and the square markers denote the final state. From top to bottom: the optimized data-driven limiter (DDL), the third-order one-step TVD scheme (OSTVD3) and the unlimited scheme (OS3), for comparison. All schemes are used with CFL $\nu = 0.2$.
Figure 6. Advection over 10 periods. From top to bottom: the optimized data-driven limiter (DDL), the third-order one-step TVD scheme (OSTVD3) and the unlimited scheme (OS3), for comparison. All schemes are used with CFL $\nu = 0.2$.
Figure 7. Data-driven and conventional flux limiters in the Sweby $r$-diagram. Panel (b) shows a magnified view of the inset in panel (a), while panel (c) further zooms into panel (b). The accuracy function $\Phi$ is plotted along the vertical axis and the slope ratio $r$ along the horizontal axis, with CFL $\nu = 0.2$. The best-performing data-driven limiter DDL is shown as a solid black line. Panel (d) shows the accuracy functi…
Figure 8. Advection over 10 periods for modifications of the DDL limiter. In (a), all $\Phi$ values for $r < 0$ are set to zero. In panels (b) and (c), the limiter follows DDL only where $\Phi_{\mathrm{DDL}}$ lies above (b) or below (c) $\Phi_{\mathrm{OSTVD3}}$ for $r > 0$. All simulations use Courant number $\nu = 0.2$.
Figure 9. Advective schemes diagnosed in the flux-curvature diagram. Each color denotes a shape specified in the legend. The normalized numerical flux $\tilde{F}^{n}_{i+1/2}$ is plotted on the y-axis, for a CFL number $\nu = 0.2$, against the local curvature of the normalized gridded field, $\tilde{q}_{i-1} - 2\tilde{q}_i + \tilde{q}_{i+1}$, on the x-axis.
Original abstract

We investigate data-driven finite-volume discretizations of the linear advection equation in one dimension. Neural networks for use as numerical advection schemes are constructed adhering to first principles of numerical analysis, allowing us to examine how normalization, training data, and architectural choices influence stability, accuracy, and shape preservation. (i) We show that reconstruction based solely on cell averages leads to a multi-valued learning problem, explaining limited generalization when training data includes widely different curvature regimes. (ii) Numerical stability and good generalization can be achieved by enforcing semilinearity (Lin and Rood 1996) through local stencil-scale normalization, which ensures invariance under affine transformations of the inputs. (iii) A new data-driven flux limiter is introduced that outperforms the classical 'OSTVD3' (Arora and Roe, 1997) scheme in shape preservation by introducing mild antidiffusion in near-linear regimes, while higher-order reconstruction in non-monotonic regions provides limited benefit. (iv) We show that training on polynomial profiles yields stable, high-order accurate discretizations, with the polynomial degree controlling the formal order of accuracy. Together, these results illustrate how the representational, architectural, and training choices govern the stability and generalization of data-driven finite-volume schemes for linear advection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates data-driven finite-volume discretizations of the 1D linear advection equation. It shows that reconstruction from cell averages alone creates a multi-valued learning problem that limits generalization across curvature regimes. It shows that enforcing semilinearity (Lin and Rood 1996) via local stencil-scale normalization achieves invariance under affine input transformations, yielding numerical stability and improved generalization. A new data-driven flux limiter is introduced that outperforms the classical OSTVD3 scheme in shape preservation through mild antidiffusion in near-linear regimes. Training exclusively on polynomial profiles is shown to produce stable, high-order accurate schemes, with the polynomial degree controlling the formal order of accuracy.

Significance. If the central claims hold, the work supplies concrete design principles for stable data-driven schemes by grounding neural-network discretizations in established numerical-analysis concepts such as semilinearity and affine invariance. The explicit link between polynomial training degree and formal order of accuracy, together with the new flux limiter, constitutes a practical contribution that could inform construction of reliable learned advection operators. The manuscript also demonstrates how architectural choices (normalization) directly address the multi-valued mapping issue identified in claim (i).

major comments (2)
  1. [Abstract (ii), (iv)] The claim that polynomial-only training resolves the multi-valued learning problem sufficiently for generalization rests on the premise that the learned operator will select the same stable branch outside the polynomial curvature regime. No quantitative evidence (error tables, stability tests, or shape-preservation metrics) is supplied for non-polynomial profiles such as high-frequency sinusoids or near-discontinuities, which directly bears on whether the semilinearity enforcement alone guarantees the reported stability and generalization.
  2. [Abstract (iii)] The assertion that the new data-driven flux limiter outperforms OSTVD3 in shape preservation is load-bearing for the practical utility claim, yet the manuscript supplies no side-by-side comparison of total-variation or local-extrema counts on a standardized test suite; without these metrics it is unclear whether the reported improvement is robust or confined to the polynomial training distribution.
minor comments (2)
  1. Notation for the local stencil-scale normalization should be introduced with an explicit formula (e.g., Eq. (X)) rather than described only in prose, to allow readers to verify the affine-invariance property directly.
  2. Figure captions for the generalization experiments should state the precise polynomial degrees used in training and the exact non-polynomial test functions employed, if any.
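As an illustration of the first minor comment, one explicit form such a formula could take is given below. It assumes min-max normalization over the three-point stencil; the manuscript's own definition is not reproduced on this page.

```latex
\begin{aligned}
m_i &= \min(q_{i-1}, q_i, q_{i+1}), \qquad M_i = \max(q_{i-1}, q_i, q_{i+1}),\\
\tilde{q}_k &= \frac{q_k - m_i}{M_i - m_i}, \qquad k \in \{i-1,\, i,\, i+1\},\\
F_{i+1/2} &= m_i + (M_i - m_i)\,\tilde{F}_{i+1/2}\!\left(\tilde{q}_{i-1}, \tilde{q}_i, \tilde{q}_{i+1}\right).
\end{aligned}
```

Under $q_k \mapsto a q_k + b$ with $a > 0$, we have $m_i \mapsto a m_i + b$ and $M_i - m_i \mapsto a (M_i - m_i)$. Every $\tilde{q}_k$ is therefore unchanged and $F_{i+1/2} \mapsto a F_{i+1/2} + b$, which is the affine invariance at issue. For $a < 0$ the normalized stencil becomes $1 - \tilde{q}_k$, so invariance additionally requires $\tilde{F}(1 - \tilde{q}) = 1 - \tilde{F}(\tilde{q})$.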

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. The suggestions highlight areas where additional quantitative support can strengthen the claims regarding generalization and the flux limiter's performance. We address each major comment below and will incorporate the requested evidence in a revised manuscript.

Point-by-point responses
  1. Referee: [Abstract (ii), (iv)] The claim that polynomial-only training resolves the multi-valued learning problem sufficiently for generalization rests on the premise that the learned operator will select the same stable branch outside the polynomial curvature regime. No quantitative evidence (error tables, stability tests, or shape-preservation metrics) is supplied for non-polynomial profiles such as high-frequency sinusoids or near-discontinuities, which directly bears on whether the semilinearity enforcement alone guarantees the reported stability and generalization.

    Authors: We agree that the manuscript would benefit from explicit quantitative evidence on non-polynomial profiles to support the generalization claims. The local stencil-scale normalization enforces semilinearity and affine invariance, which resolves the multi-valued mapping identified in (i) by ensuring the learned operator depends only on normalized curvature; this architectural choice is independent of the training distribution. Polynomial profiles were selected to isolate the effect of formal order (via degree) while spanning smooth regimes. To address the concern directly, the revised manuscript will include error tables, stability tests, and shape-preservation metrics for high-frequency sinusoids and near-discontinuities. revision: yes

  2. Referee: [Abstract (iii)] The assertion that the new data-driven flux limiter outperforms OSTVD3 in shape preservation is load-bearing for the practical utility claim, yet the manuscript supplies no side-by-side comparison of total-variation or local-extrema counts on a standardized test suite; without these metrics it is unclear whether the reported improvement is robust or confined to the polynomial training distribution.

    Authors: We acknowledge that the current manuscript relies on visual comparisons and aggregate error measures rather than explicit total-variation or local-extrema counts on a standardized suite. The data-driven limiter introduces controlled antidiffusion only in near-linear regions while preserving monotonicity elsewhere, which is shown to improve shape preservation relative to OSTVD3 on the tested profiles. To make the comparison more rigorous and demonstrate robustness, the revised version will add side-by-side total-variation and local-extrema counts on a standardized test suite that includes both polynomial and non-polynomial cases. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external citation and empirical training outcomes

full rationale

The paper's central results (i-iv) are obtained by constructing neural networks, training them on polynomial cell-average data, and measuring stability/generalization on held-out profiles. Semilinearity is imported from the external reference Lin and Rood 1996; the local stencil-scale normalization is presented as an architectural choice that produces affine invariance, not as a quantity defined in terms of the target stability metric. No equation or fitted parameter is renamed as a prediction, no self-citation chain is load-bearing, and the polynomial-training distribution is an explicit, falsifiable modeling decision rather than a tautology. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that semilinearity is a desirable guiding property and that polynomial profiles are representative training data for general cases.

free parameters (1)
  • Polynomial degree for training
    Chosen to set the formal order of accuracy of the resulting discretization.
axioms (1)
  • Domain assumption: semilinearity as defined by Lin and Rood (1996) produces invariance under affine transformations and thereby stability.
    Invoked to justify the architectural choice of local stencil-scale normalization.
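The free parameter above can be checked directly. If the polynomial degree sets the formal order, the error under grid refinement should fall at the matching rate. The sketch below implements the standard observed-order estimate. The error values are synthetic placeholders, not measurements from the paper.

```python
import math


def observed_orders(errors, ratio=2.0):
    """Observed order between successive refinements:
    p = log(e_coarse / e_fine) / log(ratio)."""
    return [math.log(e_c / e_f) / math.log(ratio)
            for e_c, e_f in zip(errors, errors[1:])]


# Synthetic errors (not results from the paper) that shrink 8x per halving.
print(observed_orders([1.0e-2, 1.25e-3, 1.5625e-4]))  # both close to 3
```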

pith-pipeline@v0.9.1-grok · 5754 in / 1194 out tokens · 53594 ms · 2026-06-27T00:16:44.251092+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 8 canonical work pages

  1. [1]

    V. Balaji, F. Couvreux, J. Deshayes, J. Gautrais, F. Hourdin, C. Rio, Are general circulation models obsolete?, Proceedings of the National Academy of Sciences 119 (2022) e2202075119

  2. [2]

    J. M. Stone, T. A. Gardiner, P. Teuben, J. F. Hawley, J. B. Simon, Athena: A New Code for Astrophysical MHD, The Astrophysical Journal Supplement Series 178 (2008) 137

  3. [3]

    J. H. Ferziger, M. Perić, Computational Methods for Fluid Dynamics, Springer, Berlin, Heidelberg, 2002. URL: http://link.springer.com/10.1007/978-3-642-56026-2. doi:10.1007/978-3-642-56026-2

  4. [4]

    L. Zanna, W. Gregory, P. Perezhogin, A. Sane, C. Zhang, A. Adcroft, M. Bushuk, C. Fernandez-Granda, B. Reichl, D. Balwada, J. Busecke, W. Chapman, A. Connolly, D. Du, K. Everard, F. Falasca, R. Falga, D. Kamm, E. Meunier, Q. Liu, A. Nasser, M. Pudig, A. Shao, J. L. Simpson, L. Vogt, J. Wu, A Framework for Hybrid Physics-AI Coupled Ocean Models, 2025. URL: http://arxiv.o...

  5. [5]

    Y. Bar-Sinai, S. Hoyer, J. Hickey, M. P. Brenner, Learning data-driven discretizations for partial differential equations, Proceedings of the National Academy of Sciences 116 (2019) 15344–15349.

    A.-A. Nasser and A. Adcroft: Preprint submitted to Elsevier. Page 20 of 22. Design principles for stable and generalizable data-driven discretizations

  6. [6]

    J. Zhuang, D. Kochkov, Y. Bar-Sinai, M. P. Brenner, S. Hoyer, Learned discretizations for passive scalar advection in a two-dimensional turbulent flow, Physical Review Fluids 6 (2021) 064605

  7. [7]

    V. Morand, N. Müller, R. Weightman, B. Piccoli, A. Keimer, A. M. Bayen, Deep learning of first-order nonlinear hyperbolic conservation law solvers, Journal of Computational Physics 511 (2024) 113114

  8. [8]

    B. Stevens, T. Colonius, Enhancement of shock-capturing methods via machine learning, Theoretical and Computational Fluid Dynamics 34 (2020) 483–496

  9. [9]

    I. Timofeyev, A. Schwarzmann, D. Kuzmin, Application of machine learning and convex limiting to subgrid flux modeling in the shallow-water equations, Mathematics and Computers in Simulation 238 (2025) 163–178

  10. [10]

    D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118 (2021) e2101784118

  11. [11]

    A. Alieva, S. Hoyer, M. Brenner, G. Iaccarino, P. Norgaard, Toward accelerated data-driven Rayleigh-Bénard convection simulations, The European Physical Journal. E, Soft Matter 46 (2023) 64

  12. [12]

    P. Lax, B. Wendroff, Systems of conservation laws, Communications on Pure and Applied Mathematics 13 (1960) 217–237. URL: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpa.3160130205

  13. [13]

    S.-J. Lin, R. B. Rood, Multidimensional Flux-Form Semi-Lagrangian Transport Schemes, Monthly Weather Review 124 (1996) 2046–2070

  14. [14]

    T. Beucler, P. Gentine, J. Yuval, A. Gupta, L. Peng, J. Lin, S. Yu, S. Rasp, F. Ahmed, P. A. O’Gorman, J. D. Neelin, N. J. Lutsko, M. Pritchard, Climate-invariant machine learning, Science Advances 10 (2024)

  15. [15]

    T. Kim, J. Kim, Y. Tae, C. Park, J.-H. Choi, J. Choo, Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift, 2021. URL: https://openreview.net/forum?id=cGDAkQo1C0p

  16. [16]

    R. J. LeVeque, Numerical Methods for Conservation Laws, Birkhäuser, Basel, 1992. URL: http://link.springer.com/10.1007/978-3-0348-8629-1. doi:10.1007/978-3-0348-8629-1

  17. [17]

    P. Colella, P. R. Woodward, The Piecewise Parabolic Method (PPM) for gas-dynamical simulations, Journal of Computational Physics 54 (1984) 174–201

  18. [18]

    M. Arora, P. L. Roe, A Well-Behaved TVD Limiter for High-Resolution Calculations of Unsteady Flow, Journal of Computational Physics 132 (1997) 3–11

  19. [19]

    A. Harten, High resolution schemes for hyperbolic conservation laws, Journal of Computational Physics 49 (1983) 357–393

  20. [20]

    P. Colella, M. D. Sekora, A limiter for PPM that preserves accuracy at smooth extrema, Journal of Computational Physics 227 (2008) 7069–7076

  21. [21]

    D. Zhang, C. Jiang, D. Liang, L. Cheng, A review on TVD schemes and a refined flux-limiter for steady-state calculations, Journal of Computational Physics 302 (2015) 114–154

  22. [22]

    N. Nguyen-Fotiadis, M. McKerns, A. Sornborger, Machine learning changes the rules for flux limiters, Physics of Fluids 34 (2022) 085136

  23. [23]

    N. Nguyen-Fotiadis, R. Chiodi, M. McKerns, D. Livescu, A. Sornborger, Probabilistic flux limiters, Physics of Fluids 37 (2025) 046112

  24. [24]

    C. Huang, A. S. Sebastian, V. Viswanathan, Learning second-order TVD flux limiters using differentiable solvers, 2025. URL: http://arxiv.org/abs/2503.09625. doi:10.48550/arXiv.2503.09625, arXiv:2503.09625 [physics]

  25. [25]

    P. Roe, M. Baines, Asymptotic behaviour of some non-linear schemes for linear advection, Notes on Numerical Fluid Mechanics 7 (1983) 283–290

  26. [26]

    R. Eymard, T. Gallouët, R. Herbin, Finite volume methods, in: Solution of Equation in ℝⁿ (Part 3), Techniques of Scientific Computing (Part 3), volume 7 of Handbook of Numerical Analysis, Elsevier, 2000, pp. 713–1018. URL: https://www.sciencedirect.com/science/article/pii/S1570865900070058. doi:10.1016/S1570-8659(00)07005-8

  27. [27]

    V. Daru, C. Tenaud, High order one-step monotonicity-preserving schemes for unsteady compressible flow calculations, Journal of Computational Physics 193 (2004) 563–594

  28. [28]

    S. Del Pino, H. Jourdren, Arbitrary high-order schemes for the linear advection and wave equations: application to hydrodynamics and aeroacoustics, Comptes Rendus Mathematique 342 (2006) 441–446

  29. [29]

    Y. Wang, C.-Y. Lai, Multi-stage neural networks: Function approximator of machine precision, Journal of Computational Physics 504 (2024) 112865

  30. [30]

    K. Lipnikov, D. Svyatskiy, Y. Vassilevski, Minimal stencil finite volume scheme with the discrete maximum principle, Russian Journal of Numerical Analysis and Mathematical Modelling 27 (2012)

  31. [31]

    P. L. Roe, Characteristic-based schemes for the Euler equations, Annual Review of Fluid Mechanics 18 (1986) 337–365. ADS Bibcode: 1986AnRFM..18..337R

  32. [32]

    J. Woodfield, H. Weller, C. J. Cotter, New limiter regions for multidimensional flows, Journal of Computational Physics 515 (2024) 113286

  33. [33]

    S. Spekreijse, Multigrid solution of monotone second-order discretizations of hyperbolic conservation laws, Mathematics of Computation 49 (1987) 135–155

  34. [34]

    R. G. Patel, I. Manickam, N. A. Trask, M. A. Wood, M. Lee, I. Tomas, E. C. Cyr, Thermodynamically consistent physics-informed neural networks for hyperbolic systems, Journal of Computational Physics 449 (2022) 110754

  35. [35]

    G. d. Romémont, F. Renac, F. Chinesta, J. Nunez, D. Gueyffier, Data-Driven Adaptive Gradient Recovery for Unstructured Finite Volume Computations, 2025. URL: http://arxiv.org/abs/2507.16571. doi:10.48550/arXiv.2507.16571, arXiv:2507.16571 [math]

  36. [36]

    I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, Adaptive Computation and Machine Learning series, MIT Press, Cambridge, MA, USA, 2016. URL: https://mitpress.mit.edu/9780262035613/deep-learning/

  37. [37]

    T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, in: T. Hastie, R. Tibshirani, J. Friedman (Eds.), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, 2009, pp. 1–8. URL: https://doi.org/10.1007/978-0-387-84858-7_1. doi:10.1007/978-0-387-84858-7_1

  38. [38]

    J. H. Friedman, Multivariate Adaptive Regression Splines, The Annals of Statistics 19 (1991) 1–67.

  39. [39]

    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, in: Advances in Neural Information Processing Syste...