PyTorch: An Imperative Style, High-Performance Deep Learning Library

2019 · cs.LG · arXiv 1912.01703

Mixed citation behavior. Most common role is background (53%).

158 Pith papers citing it

Background 53% of classified citations

abstract

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

citation-role summary

background 18 · method 11 · dataset 1

citation-polarity summary

background 16 · use method 11 · unclear 2 · use dataset 1
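
The 53% background share quoted near the top appears to follow from the polarity counts above rather than from the role counts. That reading is an assumption, since the page does not say how the figure is derived. A minimal sketch of the arithmetic, with the variable names chosen here purely for illustration:

```python
# Counts copied from the citation-polarity summary above.
# Assumption: the headline percentage is computed over these four classes.
polarity_counts = {
    "background": 16,
    "use method": 11,
    "unclear": 2,
    "use dataset": 1,
}

# Total number of classified citations (16 + 11 + 2 + 1).
total = sum(polarity_counts.values())

# Share of each polarity, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in polarity_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under that assumption, background accounts for 16 of 30 classified citations, which rounds to the 53% shown in the header.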

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller

cs.RO · 2025-12-22 · conditional · novelty 8.0

First in-orbit demonstration of a DRL-trained AI satellite attitude controller that performs robust inertial pointing after sim-to-real transfer.

Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation

quant-ph · 2025-08-22 · unverdicted · novelty 8.0

A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

ffortissimo: A Freeform Forward-Modeling Pipeline for High-Contrast Images of Circumstellar Disks Based on Automatic Differentiation

astro-ph.IM · 2026-06-22 · unverdicted · novelty 7.0

ffortissimo is a JAX-based freeform forward-modeling pipeline that fits complex dust distributions and infers scattering properties in KLIP-reduced images of circumstellar disks such as HR 4796A.

A matrix-free, differentiable PyTorch solver for phase-field fracture: Formulation, benchmarks, and inverse analysis

cs.CE · 2026-06-22 · unverdicted · novelty 7.0

A matrix-free, GPU-compatible PyTorch implementation of phase-field fracture with explicit dynamics, custom differentiable implicit damage solve, benchmarks on dynamic and quasi-static cases, and inverse recovery of fracture energy G_c via L-BFGS.

Attention-based optimizer for symmetry finding

quant-ph · 2026-05-28 · unverdicted · novelty 7.0

A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.

A Fast Method to Compute Scalar Induced Gravitational Waves on a Lattice with Primordial Non-Gaussianities

astro-ph.CO · 2026-05-26 · unverdicted · novelty 7.0

A new lattice method recasts SIGW integrals as FFT convolutions to compute fully non-Gaussian spectra in seconds with ~10% error on a radiation-dominated background.

Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model

astro-ph.IM · 2026-05-15 · unverdicted · novelty 7.0

Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.

Reconstructing the Stripping History of the Sagittarius Stream with Neural Networks

astro-ph.GA · 2026-05-14 · unverdicted · novelty 7.0

A neural network trained on simulations infers stripping times for Sagittarius stream stars from phase-space data, measuring a 0.3 dex/Gyr metallicity gradient and estimating ages for globular clusters such as Pal 12 and NGC 2419.

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.

Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction

q-bio.BM · 2026-05-12 · unverdicted · novelty 7.0

Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.

End-to-End Population Inference from Gravitational-Wave Strain using Transformers

gr-qc · 2026-05-11 · unverdicted · novelty 7.0

Dingo-Pop uses a transformer to perform amortized, end-to-end population inference from GW strain data in seconds, bypassing per-event Monte Carlo sampling.

Learning reveals invisible structure in low-rank RNNs

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

Dynamical magnetotropic susceptibility as a new probe of Kitaev materials and beyond

cond-mat.str-el · 2026-05-01 · unverdicted · novelty 7.0

Dynamical magnetotropic susceptibility k(ω) acts as a probe of uniform spin and charge fluctuations, with its static scaling in α-RuCl3 arising specifically from dominant Kitaev interactions in the models examined.

Sampling two-dimensional spin systems with transformers

cond-mat.dis-nn · 2026-04-30 · unverdicted · novelty 7.0

Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural samplers at criticality.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Big Dipper, Help Me Find A Way -- Dip-hunting at hadron colliders

hep-ph · 2026-04-28 · unverdicted · novelty 7.0

Parametric neural networks learn likelihood ratios to infer top-philic scalar resonances from dip patterns caused by signal-background interference in hadron collider data.

Graph-Conditioned Meta-Optimizer for QAOA Parameter Generation on Multiple Problem Classes

quant-ph · 2026-04-28 · unverdicted · novelty 7.0

A graph-conditioned meta-optimizer learns QAOA parameter trajectories from one problem class and transfers them to others, yielding better initializations than standard methods in an empirical study of 64 settings.

Rates of forgetting for the sequentially Markov coalescent

math.PR · 2026-04-22 · unverdicted · novelty 7.0

SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.

Concept Graph Convolutions: Message Passing in the Concept Space

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality

stat.AP · 2026-04-20 · unverdicted · novelty 7.0

A neural network learns non-stationary anisotropic correlations from gridded CTM outputs and transfers the structure via LatticeKrig basis functions to station data for refined fine-scale NO2 predictions with uncertainty.

citing papers explorer

Showing 40 of 40 citing papers after filters.

Stability and Generalization in Looped Transformers · cs.LG · 2026-04-16 · unverdicted · none · ref 18
Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
Editing Models with Task Arithmetic · cs.LG · 2022-12-08 · accept · none · ref 81
Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.
Learning reveals invisible structure in low-rank RNNs · cs.LG · 2026-05-05 · unverdicted · none · ref 55
Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.
Concept Graph Convolutions: Message Passing in the Concept Space · cs.LG · 2026-04-22 · unverdicted · none · ref 45
Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.
RLGT: A reinforcement learning framework for extremal graph theory · cs.LG · 2026-02-19 · unverdicted · none · ref 39
RLGT is a modular reinforcement learning framework for extremal graph theory that handles undirected, directed, looped, and multi-colored graphs to facilitate future research.
KernelBench: Can LLMs Write Efficient GPU Kernels? · cs.LG · 2025-02-14 · accept · none · ref 28
KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.
Diffusion Models Beat GANs on Image Synthesis · cs.LG · 2021-05-11 · accept · none · ref 46
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning · cs.LG · 2026-05-20 · unverdicted · none · ref 27
Position-Weighted On-Policy Self-Distillation (PW-OPSD) weights later tokens more heavily after a diagnostic shows position predicts teacher reliability better than entropy, yielding +1.0 and +1.1 Avg@12 gains on AIME 2024/2025.
POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles · cs.LG · 2026-05-08 · unverdicted · none · ref 92
POETS uses compute-efficient LLM policy ensembles to implicitly perform KL-regularized Thompson sampling, delivering O(sqrt(T gamma_T)) regret bounds and state-of-the-art sample efficiency in scientific discovery tasks such as protein search and quantum circuit design.
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies · cs.LG · 2026-05-08 · unverdicted · none · ref 102
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
Why Does Agentic Safety Fail to Generalize Across Tasks? · cs.LG · 2026-05-07 · conditional · none · ref 87
Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning · cs.LG · 2026-04-14 · conditional · none · ref 26
TCL delivers 16.8x faster tuning on CPU and 12.48x on GPU with modestly lower inference latency by combining RDU active sampling, a lightweight Mamba cost model, and cross-platform continual knowledge distillation.
Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters · cs.LG · 2026-04-03 · unverdicted · none · ref 10
PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.
Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models · cs.LG · 2026-04-02 · unverdicted · none · ref 42
MI-VAE generates physics-constrained synthetic trajectories from scarce real data to improve offline RL policy performance on planetary lander tasks over standard VAEs.
The hidden risks of temporal resampling in clinical reinforcement learning · cs.LG · 2026-02-06 · conditional · none · ref 65
Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.
Learning from Historical Activations in Graph Neural Networks · cs.LG · 2026-01-03 · unverdicted · none · ref 17
HISTOGRAPH applies unified layer-wise attention followed by node-wise attention over historical GNN activations to improve graph classification, especially in deep models.
Understanding the Staged Dynamics of Transformers in Learning Latent Structure · cs.LG · 2025-11-24 · unverdicted · none · ref 4
Transformers learn latent structure components in discrete stages during training, composing rules more robustly than decomposing complex examples, with identified layer plasticity windows.
Thermodynamically consistent machine learning model for excess Gibbs energy · cs.LG · 2025-09-08 · unverdicted · none · ref 44
HANNA is a thermodynamically consistent ML model for predicting excess Gibbs energy from molecular structures, trained on various binary mixture data and extended to multi-component mixtures using geometric projection.
Scalable Equilibrium Propagation via Intermediate Error Signals for Deep Convolutional CRNNs · cs.LG · 2025-08-21 · unverdicted · none · ref 10
Introduces layer-wise learning signals combining knowledge distillation and local errors into Equilibrium Propagation, enabling scalable training of deep VGG-style CRNNs with SOTA results on CIFAR-10 and CIFAR-100.
Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence · cs.LG · 2025-06-15 · unverdicted · none · ref 22
New unsupervised method adapts the multivariate logrank statistic into a differentiable loss for training any neural network on any data modality to discover prognostically distinct patient clusters, demonstrated on myeloma lab data and lung cancer CT images with post-hoc explainability.
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics · cs.LG · 2025-06-02 · unverdicted · none · ref 32
SmolVLA is a small efficient VLA model that achieves performance comparable to 10x larger models while training on one GPU and deploying on consumer hardware via community data and chunked asynchronous action prediction.
MONAI: An open-source framework for deep learning in healthcare · cs.LG · 2022-11-04 · accept · none · ref 6
MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.
Continual Learning for Sequential Personalization of Small Language Models: A Stability Monitoring Analysis · cs.LG · 2026-06-26 · unverdicted · none · ref 8
Checkpoint monitoring during sequential LoRA adaptation of SLMs reveals instability patterns via reference set diagnostics that standard task metrics can miss.
Reducing Experimental Testing in Space Propulsion Film Cooling Analyses by Pixelwise Generative Image Interpolation · cs.LG · 2026-05-28 · unverdicted · none · ref 23
A feed-forward neural network with positional encoding generates film cooling images from 30% fewer experimental measurements while achieving RMSE below 8% and SSIM above 93%.
Physics-Informed Graph Neural Network Surrogates for Turbulent Nanoparticle Dispersion in Dental Clinical Environments · cs.LG · 2026-05-19 · unverdicted · none · ref 45
ELGIN is a graph-based physics-informed surrogate model that predicts carrier flow and polydisperse particle motion in dental aerosol scenarios, achieving lower tracking errors and 37x speedup versus full OpenFOAM CFD in a preliminary single-case test.
ERPPO: Entropy Regularization-based Proximal Policy Optimization · cs.LG · 2026-05-13 · unverdicted · none · ref 44
ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments · cs.LG · 2026-05-01 · unverdicted · none · ref 22
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models · cs.LG · 2026-04-24 · unverdicted · none · ref 66
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
Revisiting Neural Activation Coverage for Uncertainty Estimation · cs.LG · 2026-04-24 · unverdicted · none · ref 6
Neural activation coverage can be adapted to provide uncertainty estimates in regression that the authors' experiments show are more meaningful than Monte-Carlo Dropout.
TabEmb: Joint Semantic-Structure Embedding for Table Annotation · cs.LG · 2026-04-21 · unverdicted · none · ref 127
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling · cs.LG · 2026-04-15 · unverdicted · none · ref 14 · 2 links
PRiMeFlow applies flow matching in gene expression space with a U-Net velocity field and pretraining-finetuning to model perturbation-induced heterogeneity, showing strong benchmark performance on PerturBench and the ARC Virtual Cell Challenge.
General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations · cs.LG · 2026-04-02 · unverdicted · none · ref 30
GEN is a neural network that solves PDEs by constructing explicit function approximations from basis functions based on prior PDE knowledge, yielding more robust and extensible solutions than standard PINNs.
Efficient compression of neural networks and datasets · cs.LG · 2025-05-23 · unverdicted · none · ref 50
Refined probabilistic and smooth l0 pruning techniques approximate minimum description length for neural networks, achieving high compression with minimal accuracy loss and empirically verifying better sample efficiency and generalization on image and text tasks.
Learning to model pediatric asthma exacerbation from multiple risk factors: a case study in coastal Virginia · cs.LG · 2026-06-04 · unverdicted · none · ref 60
A case study develops a sparse dictionary learning approach to model pediatric asthma exacerbations from multiple risk factors and reports consensus on relative risks across statistical and machine learning models.
Identifying Gems from Roman RAPIDly · cs.LG · 2026-06-03 · unverdicted · none · ref 23
RuBR_comb, RuBR_loc, and RuBR_DA are machine learning models for real-bogus classification of transients that use combined simulated data and domain adaptation for the Roman RAPID pipeline.
Libra: Efficient Resource Management for Agentic RL Post-Training · cs.LG · 2026-06-02 · unverdicted · none · ref 37
Libra optimizes GPU allocation across rollout and training in agentic RL via an elastic hybrid pool and C-MLFQ scheduler based on tool-return causal signals, claiming up to 3.0x throughput and 2.5x faster reward convergence on 48 A800 GPUs.
QuChaTeR: A Hybrid Quantum-Chaotic Temporal Framework for Earthquake Prediction · cs.LG · 2026-05-14 · unverdicted · none · ref 23
QuChaTeR hybridizes chaotic maps and variational quantum circuits with recurrent networks and wavelets to achieve faster convergence and better performance than classical and quantum-inspired baselines on real seismic datasets.
Dynamics-Encoded Deep Learning for Robust System Identification and Parameter Estimation · cs.LG · 2024-10-05 · unverdicted · none · ref 60
Dynamics-encoded deep learning approaches are developed for system identification and parameter estimation in dynamical systems using numerical discretization schemes.
Block-Based Double Decoders · cs.LG · 2026-05-11 · unreviewed · ref 15
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels · cs.LG · 2026-03-13 · unreviewed · ref 53
