arXiv: 1912.01703 · v1 · submitted 2019-12-03 · cs.LG · cs.MS · stat.ML

Recognition: unknown

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

keywords: pytorch, learning, deep, imperative, implementation, library, speed, style

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
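The "code as a model" idea in the abstract can be illustrated with a small, dependency-free sketch. It does not use PyTorch and is not the paper's implementation: the `Value` class, the `model` function, and the chosen numbers are all hypothetical, picked only to show a model written as an ordinary Python function, with a loop and a data-dependent branch, whose operations are recorded as they run and then differentiated.

```python
class Value:
    """Hypothetical scalar that records the operations applied to it."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # one local-derivative function per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the recorded graph so each node is processed before its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


def model(x, w, steps):
    """The model is plain Python: a loop plus a branch on a runtime value."""
    h = x
    for _ in range(steps):
        h = h * w
        if h.data > 10.0:  # data-dependent control flow, easy to inspect in a debugger
            h = h + 1.0
    return h


x, w = Value(2.0), Value(3.0)
out = model(x, w, steps=2)
out.backward()
print(out.data, x.grad, w.grad)
```

Because the graph is built while the function executes, changing `steps` or the branch condition changes the recorded computation with no separate graph-definition step, which is the property the abstract attributes to the imperative style.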

This paper has not been read by Pith yet.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Efficient Training on Multiple Consumer GPUs with RoundPipe

    cs.DC 2026-04 conditional novelty 8.0

    RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on...

  2. Stability and Generalization in Looped Transformers

    cs.LG 2026-04 unverdicted novelty 8.0

    Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant ...

  3. Traces of Helium Detected in Type Ic Supernova 2014L

    astro-ph.HE 2026-03 accept novelty 8.0

    Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

  4. Editing Models with Task Arithmetic

    cs.LG 2022-12 accept novelty 8.0

    Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

  5. Reconstructing the Stripping History of the Sagittarius Stream with Neural Networks

    astro-ph.GA 2026-05 unverdicted novelty 7.0

    A neural network trained on simulations infers stripping times for Sagittarius stream stars from phase-space data, measuring a 0.3 dex/Gyr metallicity gradient and estimating ages for globular clusters such as Pal 12 ...

  6. Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

    cs.MA 2026-05 unverdicted novelty 7.0

    Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.

  7. End-to-End Population Inference from Gravitational-Wave Strain using Transformers

    gr-qc 2026-05 unverdicted novelty 7.0

    Dingo-Pop uses a transformer to perform amortized, end-to-end population inference from GW strain data in seconds, bypassing per-event Monte Carlo sampling.

  8. Learning reveals invisible structure in low-rank RNNs

    cs.LG 2026-05 unverdicted novelty 7.0

    Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

  9. Dynamical magnetotropic susceptibility as a new probe of Kitaev materials and beyond

    cond-mat.str-el 2026-05 unverdicted novelty 7.0

    Dynamical magnetotropic susceptibility k(ω) acts as a probe of uniform spin and charge fluctuations, with its static scaling in α-RuCl3 arising specifically from dominant Kitaev interactions in the models examined.

  10. Sampling two-dimensional spin systems with transformers

    cond-mat.dis-nn 2026-04 unverdicted novelty 7.0

    Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural sam...

  11. Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

    astro-ph.GA 2026-04 unverdicted novelty 7.0

    A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

  12. Big Dipper, Help Me Find A Way -- Dip-hunting at hadron colliders

    hep-ph 2026-04 unverdicted novelty 7.0

    Parametric neural networks learn likelihood ratios to infer top-philic scalar resonances from dip patterns caused by signal-background interference in hadron collider data.

  13. Graph-Conditioned Meta-Optimizer for QAOA Parameter Generation on Multiple Problem Classes

    quant-ph 2026-04 unverdicted novelty 7.0

    A graph-conditioned meta-optimizer learns QAOA parameter trajectories from one problem class and transfers them to others, yielding better initializations than standard methods in an empirical study of 64 settings.

  14. Rates of forgetting for the sequentially Markov coalescent

    math.PR 2026-04 unverdicted novelty 7.0

    SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.

  15. Concept Graph Convolutions: Message Passing in the Concept Space

    cs.LG 2026-04 unverdicted novelty 7.0

    Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

  16. A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality

    stat.AP 2026-04 unverdicted novelty 7.0

    A neural network learns non-stationary anisotropic correlations from gridded CTM outputs and transfers the structure via LatticeKrig basis functions to station data for refined fine-scale NO2 predictions with uncertainty.

  17. Probing the 3D Structures of Supernovae through IR Signatures of CO and SiO

    astro-ph.HE 2026-04 unverdicted novelty 7.0

    MOFAT applied to SN2024ggi shows CO triggering inner SiO formation with a receding edge, order-of-magnitude mass drop, clumping signatures, and no dust formation.

  18. Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

    cs.PL 2026-04 unverdicted novelty 7.0

    A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compi...

  19. Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

    cs.AR 2026-04 unverdicted novelty 7.0

    The Tensor Memory Engine provides on-the-fly data reorganization to achieve ideal memory locality for CPU computations in edge systems.

  20. Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

    q-bio.QM 2026-04 unverdicted novelty 7.0

    Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and show...

  21. How pore-scale disorder controls fluid stretching in porous media

    physics.flu-dyn 2026-04 unverdicted novelty 7.0

    Pore-scale disorder accelerates fluid stretching in porous media, producing quadratic time growth and faster mixing than the linear growth seen in ordered structures.

  22. Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

    cs.HC 2026-03 unverdicted novelty 7.0

    XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.

  23. Diffusion Models Beat GANs on Image Synthesis

    cs.LG 2021-05 accept novelty 7.0

    Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

  24. Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

    cs.MA 2026-05 unverdicted novelty 6.0

    Proposes an event-triggered MARL framework with Neural Manifold Diversity and event-based hypernetworks to enable dynamic, agent-agnostic behavioral transitions while preserving reward maximization.

  25. Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding

    cs.CV 2026-05 unverdicted novelty 6.0

    A new framework combines self-attention on the Oblique manifold with bidirectional geodesic cross-attention on the Lorentz hyperboloid to improve both localization accuracy and descriptive coherence in 3D dense captioning.

  26. POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles

    cs.LG 2026-05 unverdicted novelty 6.0

    POETS uses compute-efficient LLM policy ensembles to implicitly perform KL-regularized Thompson sampling, delivering O(sqrt(T gamma_T)) regret bounds and state-of-the-art sample efficiency in scientific discovery task...

  27. What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

    cs.LG 2026-05 unverdicted novelty 6.0

    MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.

  28. Why Does Agentic Safety Fail to Generalize Across Tasks?

    cs.LG 2026-05 conditional novelty 6.0

    Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstr...

  29. Decoding Alignment without Encoding Alignment: A critique of similarity analysis in neuroscience

    q-bio.NC 2026-05 unverdicted novelty 6.0

    Decoding alignment metrics can remain high and unchanged even when encoding manifold topology is causally altered, so they do not imply similar function or computation across neural populations.

  30. Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning

    cs.MM 2026-05 unverdicted novelty 6.0

    SeqLight maps music to multi-light HSV control via SkipBART for global color prediction followed by hybrid imitation learning in a goal-conditioned MDP to decompose colors across lights.

  31. Euclid preparation. CosmoPostProcess: A simulation calibrated framework for weak lensing selection bias in richness-selected galaxy clusters

    astro-ph.CO 2026-05 unverdicted novelty 6.0

    CosmoPostProcess delivers simulation-calibrated radial corrections for projection-induced selection bias (20-40% amplitude near 1 h^{-1} Mpc) and baryonic effects in Euclid richness-selected cluster weak lensing profiles.

  32. ClarifySTL: An Interactive LLM Agent Framework for STL Transformation through Requirements Clarification

    cs.SE 2026-05 unverdicted novelty 6.0

    ClarifySTL uses LLM agents to interactively detect and resolve vagueness and ambiguity in natural language requirements via clarification queries before generating STL formulas, with evaluations on existing and new be...

  33. Compressibility of micromagnetic solutions in tensor train format

    cond-mat.mes-hall 2026-04 unverdicted novelty 6.0

    Tensor-train compressed micromagnetic solutions for flux-closure states in soft-magnetic prisms scale as L^{1.8} and (1/a)^{1.2} by exploiting spatial sparsity in domain walls versus uniform domains.

  34. Learning Sparse BRDF Measurement Samples from Image

    cs.CV 2026-04 unverdicted novelty 6.0

    A sampler network learns to select informative sparse BRDF measurement directions by optimizing against a fixed pretrained hypernetwork reconstructor and differentiable renderer, improving low-budget reconstruction on...

  35. A Physics Informed Bayesian Neural Network for the Neutron Star Equation of State

    astro-ph.HE 2026-04 unverdicted novelty 6.0

    A physics-informed Bayesian neural network learns neutron-star equations of state from theoretical priors and constraints, then generates posterior mass-radius and mass-tidal-deformability distributions consistent wit...

  36. Data-Driven Acceleration of Eccentricity Reduction for Binary Black Hole Simulations

    gr-qc 2026-04 unverdicted novelty 6.0

    A Gaussian Process Regression model trained on an archive of eccentricity-reduced binary black hole simulations predicts initial conditions that achieve low eccentricity with zero or one iteration.

  37. JAX-BEM: Gradient-Based Acoustic Shape Optimisation via a Differentiable Boundary Element Method

    cs.CE 2026-04 unverdicted novelty 6.0

    A JAX-based differentiable BEM solver matches traditional BEM accuracy on benchmarks and supports gradient-driven acoustic geometry optimization.

  38. PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling

    cs.LG 2026-04 unverdicted novelty 6.0

    PRiMeFlow is a flow-matching model that approximates the full empirical distribution of single-cell gene expression after perturbations.

  39. TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

    cs.LG 2026-04 conditional novelty 6.0

    TCL delivers 16.8x faster tuning on CPU and 12.48x on GPU with modestly lower inference latency by combining RDU active sampling, a lightweight Mamba cost model, and cross-platform continual knowledge distillation.

  40. Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons

    hep-ex 2026-04 unverdicted novelty 6.0

    PaRT achieves >50% tagging efficiency for boosted H->WW jets at 1% background efficiency, decorrelated from jet mass, with data-to-simulation scale factors of 0.9-1.0 on 138 fb^{-1} of 13 TeV collisions.

  41. Minimising Willmore Energy via Neural Flow

    math.DG 2026-04 unverdicted novelty 6.0

    Neural networks minimize Willmore energy on embedded surfaces, recovering the round sphere and Clifford torus while supplying a search procedure for genus-2 minimal surfaces.

  42. AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

    cs.CR 2026-04 unverdicted novelty 6.0

    AEGIS reduces inter-GPU communication by up to 81.3% in self-attention and reaches 96.62% scaling efficiency with 3.86x speedup on four GPUs for 2048-token encrypted Transformer inference.

  43. Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

    cs.LG 2026-04 unverdicted novelty 6.0

    PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.

  44. Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models

    cs.LG 2026-04 unverdicted novelty 6.0

    MI-VAE generates physics-constrained synthetic trajectories from scarce real data to improve offline RL policy performance on planetary lander tasks over standard VAEs.

  45. LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

    cs.LG 2026-03 unverdicted novelty 6.0

    LeWM is the first end-to-end trainable JEPA from pixels that uses only two loss terms for stable training and fast planning on 2D/3D control tasks.

  46. SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

    cs.LG 2025-06 unverdicted novelty 6.0

    SmolVLA is a small efficient VLA model that achieves performance comparable to 10x larger models while training on one GPU and deploying on consumer hardware via community data and chunked asynchronous action prediction.

  47. Steering Llama 2 via Contrastive Activation Addition

    cs.CL 2023-12 unverdicted novelty 6.0

    Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.

  48. MONAI: An open-source framework for deep learning in healthcare

    cs.LG 2022-11 accept novelty 6.0

    MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.

  49. Search for pair production of additional neutral scalars within the Inert Doublet Model in a final state with two electrons or two muons in proton-proton collisions at $\sqrt{s}$ = 13 TeV and 13.6 TeV

    hep-ex 2026-05 accept novelty 5.0

    No significant excess found; new exclusion limits reach m_H = 108 GeV for m_H - m_A = 78 GeV in the Inert Doublet Model.

  50. ERPPO: Entropy Regularization-based Proximal Policy Optimization

    cs.LG 2026-05 unverdicted novelty 5.0

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

  51. Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data

    astro-ph.GA 2026-05 unverdicted novelty 5.0

    A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.

  52. Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation

    cs.AI 2026-05 unverdicted novelty 5.0

    Redesigning Alpamayo 1 to single-reasoning and optimizing diffusion action generation cuts inference latency by 69.23% while preserving trajectory diversity and prediction quality.

  53. Compositional Quantum Heuristics for Max-Clique Detection

    quant-ph 2026-05 unverdicted novelty 5.0

    Compositional quantum circuits with symmetry-induced invariant losses produce trainable equivariant quantum GNNs that generalize on max-clique problems and improve hybrid recursive search accuracy and scalability.

  54. AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

    cs.LG 2026-05 unverdicted novelty 5.0

    AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

  55. MCMit: Mid-Circuit Measurement Error Mitigation

    quant-ph 2026-04 unverdicted novelty 5.0

    MCMit mitigates mid-circuit measurement errors via a new multi-control branch instruction, CNN and transformer discriminators, and software techniques, reporting up to 70% latency reduction and 80% lower logical error...

  56. Optimization of Model Splitting, Placement, and Chaining for Multi-hop Split Learning and Inference

    cs.NI 2026-04 unverdicted novelty 5.0

    An ILP model and BCD heuristic jointly optimize model splitting, node placement, and smashed-data routing in an SFC-based multi-hop split learning/inference architecture to minimize end-to-end latency.

  57. Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

    cs.LG 2026-04 unverdicted novelty 5.0

    Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.

  58. A New Adaptive Deep Learning based Reduced Order Model for Hybrid-Type Parabolic PDEs: Rigorous Error Analysis and Applications

    math.NA 2026-04 unverdicted novelty 5.0

    Two new DOD-based reduced-order models (DOD-DL-ROM and DOD+DFNN) are introduced for hybrid-type parabolic PDEs, with rigorous error bounds linking performance to optimal map regularity and conditions for outperforming...

  59. Revisiting Neural Activation Coverage for Uncertainty Estimation

    cs.LG 2026-04 unverdicted novelty 5.0

    Neural activation coverage can be adapted to provide uncertainty estimates in regression that the authors' experiments show are more meaningful than Monte-Carlo Dropout.

  60. The swept-back multipolar magnetic field of neutron stars: Application to NICER MSP J0030+0451

    astro-ph.HE 2026-04 conditional novelty 5.0

    A centered swept-back multipolar magnetic field up to octupole order reproduces the bolometric thermal X-ray light curve of MSP J0030+0451.