super hub Mixed citations

PyTorch: An Imperative Style, High-Performance Deep Learning Library

· 2019 · cs.LG · arXiv 1912.01703

Mixed citation behavior. Most common role is background (53%).

177 Pith papers citing it

Background 53% of classified citations

abstract

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

citation-role summary

background 18 · method 11 · dataset 1

citation-polarity summary

background 16 · use method 11 · unclear 2 · use dataset 1
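The headline "53%" can be checked against the two sets of counts above. The sketch below is only an arithmetic check, not a description of how the page computes its figure: it assumes the percentage is the background count divided by the total of classified citations in one summary, and the helper name `share` is made up for illustration.

```python
# Counts copied from the citation-role and citation-polarity summaries on this page.
role_counts = {"background": 18, "method": 11, "dataset": 1}
polarity_counts = {"background": 16, "use method": 11, "unclear": 2, "use dataset": 1}


def share(counts, key):
    """Return the percentage of `key` among all counted citations (hypothetical helper)."""
    total = sum(counts.values())
    return 100.0 * counts[key] / total


role_share = share(role_counts, "background")          # 18 of 30
polarity_share = share(polarity_counts, "background")  # 16 of 30

print(f"background share by role:     {role_share:.0f}%")
print(f"background share by polarity: {polarity_share:.0f}%")
```

Under that assumption the role counts give 60% and the polarity counts give 53%, so the headline figure appears to line up with the polarity summary rather than the role summary.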

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller

cs.RO · 2025-12-22 · conditional · novelty 8.0

First in-orbit demonstration of a DRL-trained AI satellite attitude controller that performs robust inertial pointing after sim-to-real transfer.

Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation

quant-ph · 2025-08-22 · unverdicted · novelty 8.0

A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

Sampling the Schwinger Model with Gauge-Equivariant Diffusion

hep-lat · 2026-06-25 · unverdicted · novelty 7.0

A gauge-equivariant diffusion model samples Schwinger model configurations, yielding unbiased observables matching MCMC and qualitatively less topological freezing than HMC.

ffortissimo: A Freeform Forward-Modeling Pipeline for High-Contrast Images of Circumstellar Disks Based on Automatic Differentiation

astro-ph.IM · 2026-06-22 · unverdicted · novelty 7.0

ffortissimo is a JAX-based freeform forward-modeling pipeline that fits complex dust distributions and infers scattering properties in KLIP-reduced images of circumstellar disks such as HR 4796A.

A matrix-free, differentiable PyTorch solver for phase-field fracture: Formulation, benchmarks, and inverse analysis

cs.CE · 2026-06-22 · unverdicted · novelty 7.0

A matrix-free, GPU-compatible PyTorch implementation of phase-field fracture with explicit dynamics, custom differentiable implicit damage solve, benchmarks on dynamic and quasi-static cases, and inverse recovery of fracture energy G_c via L-BFGS.

Reweighting Adversarial Networks for Unbinned Unfolding

hep-ph · 2026-06-04 · unverdicted · novelty 7.0

RANs generalize moment unfolding to full phase-space unbinned unfolding via detector-level Wasserstein critics without requiring support overlap or multiple iterations.

Attention-based optimizer for symmetry finding

quant-ph · 2026-05-28 · unverdicted · novelty 7.0

A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.

A Fast Method to Compute Scalar Induced Gravitational Waves on a Lattice with Primordial Non-Gaussianities

astro-ph.CO · 2026-05-26 · unverdicted · novelty 7.0

A new lattice method recasts SIGW integrals as FFT convolutions to compute fully non-Gaussian spectra in seconds with ~10% error on a radiation-dominated background.

ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

ARBITER models reasoning trajectory basins in test-time sampling and uses model-internal signals to correct majority-vote failures, recovering part of the oracle gap on math benchmarks.

Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model

astro-ph.IM · 2026-05-15 · unverdicted · novelty 7.0

Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.

Reconstructing the Stripping History of the Sagittarius Stream with Neural Networks

astro-ph.GA · 2026-05-14 · unverdicted · novelty 7.0

A neural network trained on simulations infers stripping times for Sagittarius stream stars from phase-space data, measuring a 0.3 dex/Gyr metallicity gradient and estimating ages for globular clusters such as Pal 12 and NGC 2419.

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.

Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction

q-bio.BM · 2026-05-12 · unverdicted · novelty 7.0

Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.

End-to-End Population Inference from Gravitational-Wave Strain using Transformers

gr-qc · 2026-05-11 · unverdicted · novelty 7.0

Dingo-Pop uses a transformer to perform amortized, end-to-end population inference from GW strain data in seconds, bypassing per-event Monte Carlo sampling.

Learning reveals invisible structure in low-rank RNNs

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

Dynamical magnetotropic susceptibility as a new probe of Kitaev materials and beyond

cond-mat.str-el · 2026-05-01 · unverdicted · novelty 7.0

Dynamical magnetotropic susceptibility k(ω) acts as a probe of uniform spin and charge fluctuations, with its static scaling in α-RuCl3 arising specifically from dominant Kitaev interactions in the models examined.

Sampling two-dimensional spin systems with transformers

cond-mat.dis-nn · 2026-04-30 · unverdicted · novelty 7.0

Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural samplers at criticality.

Rendering-Aware Sparse Sampling for BRDF Acquisition

cs.CV · 2026-04-29 · unverdicted · novelty 7.0

Rendering-aware optimization of sparse BRDF samples via fixed reconstructor and differentiable renderer improves final rendered appearance over BRDF-space baselines.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Big Dipper, Help Me Find A Way -- Dip-hunting at hadron colliders

hep-ph · 2026-04-28 · unverdicted · novelty 7.0

Parametric neural networks learn likelihood ratios to infer top-philic scalar resonances from dip patterns caused by signal-background interference in hadron collider data.

citing papers explorer

Showing 42 of 42 citing papers after filters.

LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller

cs.RO · 2025-12-22 · conditional · none · ref 30 · internal anchor

First in-orbit demonstration of a DRL-trained AI satellite attitude controller that performs robust inertial pointing after sim-to-real transfer.

Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation

quant-ph · 2025-08-22 · unverdicted · none · ref 39 · internal anchor

A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.

Cobble: Compiling Block Encodings for Quantum Computational Linear Algebra

cs.PL · 2025-11-03 · unverdicted · none · ref 45 · internal anchor

Cobble is a domain-specific language for quantum block encodings that compiles high-level matrix expressions to optimized circuits using analyses and quantum singular value transformation, achieving 2.6x-25.4x speedups over unoptimized baselines on benchmarks.

Atomistic Machine Learning with Irreducible Cartesian Natural Tensors

cond-mat.mtrl-sci · 2025-10-05 · unverdicted · none · ref 64 · internal anchor

CarNet develops irreducible Cartesian natural tensors and an equivariant model that matches leading spherical-tensor performance for ML interatomic potentials and high-rank tensor predictions like elastic constants.

pop-cosmos: Star formation over 12 Gyr from generative modelling of a deep infrared-selected galaxy catalogue

astro-ph.GA · 2025-09-24 · unverdicted · none · ref 186 · internal anchor

A score-based diffusion generative model on deep infrared galaxy photometry yields a star formation rate density peaking at z=1.3 and shows distinct non-parametric star formation histories plus AGN activity peaking during the quenching transition of massive galaxies.

Meson spectroscopy of exotic symmetries of Ising criticality in Rydberg atom arrays

quant-ph · 2025-06-26 · unverdicted · none · ref 43 · internal anchor

Rydberg arrays realize Ising criticality with E8 mass spectra in chains and first signatures of D8^(1)-organized bound states from interchain confinement in ladders.

GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction

q-bio.QM · 2025-04-08 · conditional · none · ref 50 · internal anchor

GraphGDel builds graph representations from constraint-based metabolic models and trains a deep learning framework integrating graph structure with gene and metabolite sequences to predict growth-coupled gene deletions, showing accuracy gains of 4-16% over baselines on three models.

KernelBench: Can LLMs Write Efficient GPU Kernels?

cs.LG · 2025-02-14 · accept · none · ref 28 · internal anchor

KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.

A window for water-hydrogen demixing on warm metal-rich sub-Neptunes

astro-ph.EP · 2025-12-01 · conditional · none · ref 60 · internal anchor

Water-hydrogen demixing occurs on warm sub-Neptunes with envelope metallicities of 150-700 times solar, including TOI-270 d, implying layered interiors and underestimated bulk metallicities when using fully-miscible models.

Understanding the Staged Dynamics of Transformers in Learning Latent Structure

cs.LG · 2025-11-24 · unverdicted · none · ref 4 · internal anchor

Transformers learn latent structure components in discrete stages during training, composing rules more robustly than decomposing complex examples, with identified layer plasticity windows.

SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs

cs.IR · 2025-11-18 · unverdicted · none · ref 30 · internal anchor

SilverTorch replaces standalone ANN indexing and filtering with a unified GPU model using a model-based Bloom index and fused Int8 ANN kernel, delivering up to 23.7x throughput and 13.35x cost efficiency gains on industry data.

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification

cs.CL · 2025-10-19 · unverdicted · none · ref 51 · internal anchor

CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.

Image reconstruction with the JWST Interferometer

astro-ph.IM · 2025-10-13 · unverdicted · none · ref 62 · internal anchor

Dorito enables diffraction-limited image reconstruction from JWST AMI observations by deconvolving images or Fourier observables using maximum entropy and total variation regularization.

Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach

cs.SD · 2025-09-26 · unverdicted · none · ref 29 · internal anchor

A zero-training VLM framework generates music from images via ABC notation, multi-modal RAG, and self-refinement while providing text and visual explanations for the outputs.

Differentiable Acoustic Radiance Transfer

cs.SD · 2025-09-19 · unverdicted · none · ref 54 · internal anchor

DART adds differentiability to acoustic radiance transfer, enabling material optimization and improved performance on sparse acoustic field prediction tasks compared to signal processing and neural baselines.

Optimizing Quantum Photonic Integrated Circuits using Differentiable Tensor Networks

quant-ph · 2025-09-15 · unverdicted · none · ref 51 · internal anchor

Gradient-based optimization of quantum photonic circuits is achieved via differentiable tensor networks that model nonlinear unitary gates and stochastic losses at low photon numbers.

Thermodynamically consistent machine learning model for excess Gibbs energy

cs.LG · 2025-09-08 · unverdicted · none · ref 44 · internal anchor

HANNA is a thermodynamically consistent ML model for predicting excess Gibbs energy from molecular structures, trained on various binary mixture data and extended to multi-component mixtures using geometric projection.

Scalable Equilibrium Propagation via Intermediate Error Signals for Deep Convolutional CRNNs

cs.LG · 2025-08-21 · unverdicted · none · ref 10 · internal anchor

Introduces layer-wise learning signals combining knowledge distillation and local errors into Equilibrium Propagation, enabling scalable training of deep VGG-style CRNNs with SOTA results on CIFAR-10 and CIFAR-100.

Stability-Constrained AC Optimal Power Flow--A Gaussian Process-Based Approach

math.OC · 2025-07-30 · unverdicted · none · ref 25 · internal anchor

A Gaussian Process surrogate for the stability exponent of generator dynamics is integrated into AC Optimal Power Flow to produce both cost-optimal and dynamically stable operating points.

Neural simulation-based inference of the Higgs trilinear self-coupling via off-shell Higgs production

hep-ph · 2025-07-02 · unverdicted · none · ref 59 · internal anchor

A hybrid NSBI technique is presented for inferring the Higgs trilinear coupling via off-shell production in SMEFT, achieving near-theoretical-optimum sensitivity with expected HL-LHC constraints.

Characterizing control between interacting subsystems with deep Jacobian estimation

q-bio.QM · 2025-07-02 · unverdicted · none · ref 71 · internal anchor

JacobianODE learns Jacobians from data to quantify directional control in nonlinear systems and shows sensory-to-cognitive control strengthening in a trained working-memory RNN.

Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

cs.LG · 2025-06-15 · unverdicted · none · ref 22 · internal anchor

New unsupervised method adapts the multivariate logrank statistic into a differentiable loss for training any neural network on any data modality to discover prognostically distinct patient clusters, demonstrated on myeloma lab data and lung cancer CT images with post-hoc explainability.

Learning Encodings by Maximizing State Distinguishability: Variational Quantum Error Correction

quant-ph · 2025-06-13 · unverdicted · none · ref 83 · internal anchor

VarQEC uses a distinguishability loss as a machine-learning objective to variationally discover resource-efficient encoding circuits optimized for given noise models.

Neuralized Fermionic Tensor Networks for Quantum Many-Body Systems

cond-mat.dis-nn · 2025-06-10 · unverdicted · none · ref 60 · internal anchor

NN-fTNS enhance fermionic tensor networks with neural parametrization to improve expressivity and achieve order-of-magnitude better energies than pure fTNS on Hubbard models while maintaining linear scaling.

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

cs.LG · 2025-06-02 · unverdicted · none · ref 32 · internal anchor

SmolVLA is a small efficient VLA model that achieves performance comparable to 10x larger models while training on one GPU and deploying on consumer hardware via community data and chunked asynchronous action prediction.

Modal Decomposition and Identification for a Population of Structures Using Physics-Informed Graph Neural Networks and Transformers

cs.CE · 2025-05-06 · unverdicted · none · ref 38 · internal anchor

A physics-informed GNN-transformer model performs unsupervised modal decomposition and identification for populations of structures from sparse dynamic measurements.

Tensor-Programmable Quantum Circuits for Solving Differential Equations

quant-ph · 2025-02-06 · unverdicted · none · ref 95 · internal anchor

A quantum solver for PDEs is introduced via flexible matrix product operator representations with mid-circuit measurements and state-dependent norm correction to handle non-unitary dynamics.

Variational decision diagrams for quantum-inspired machine learning applications

quant-ph · 2025-02-06 · unverdicted · none · ref 41 · internal anchor

The paper proposes variational decision diagrams (VDDs) for quantum state representation in QML and reports successful training without barren plateaus on transverse-field Ising and Heisenberg Hamiltonians.

Machine learning for smell: Ordinal odor strength prediction of molecular perfumery components

physics.chem-ph · 2025-12-09 · unverdicted · none · ref 42 · internal anchor

The authors compile an ordinal odor strength dataset for over 2,000 molecules from public sources and demonstrate supervised ML prediction of intensity categories, identifying molecular size, polarity, rings, and branching as key drivers via SHAP analysis.

Stochastic versus Deterministic in Stochastic Gradient Descent

math.OC · 2025-09-03 · unverdicted · none · ref 18 · internal anchor

Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.

UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning

cs.CV · 2025-08-15 · unverdicted · none · ref 46 · internal anchor

UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

cs.CL · 2025-07-02 · unverdicted · none · ref 30 · internal anchor

LogitSpec accelerates retrieval-based speculative decoding by speculating the next-next token from the last logit and retrieving relevant references for both next and next-next tokens, reporting up to 2.61x speedup and 3.28 mean accepted tokens.

Efficient compression of neural networks and datasets

cs.LG · 2025-05-23 · unverdicted · none · ref 50 · internal anchor

Refined probabilistic and smooth l0 pruning techniques approximate minimum description length for neural networks, achieving high compression with minimal accuracy loss and empirically verifying better sample efficiency and generalization on image and text tasks.

A mixed-integer framework for analyzing neural network-based controllers for piecewise affine systems with bounded disturbances

eess.SY · 2025-04-15 · unverdicted · none · ref 14 · internal anchor

A mixed-integer framework represents neural network-controlled piecewise affine systems with bounded disturbances as MI linear constraints, enabling computation of robustly positively invariant sets via MI linear programs for stability and constraint certification.

Auto-encoder model for faster generation of effective one-body gravitational waveform approximations

gr-qc · 2025-11-16 · unverdicted · none · ref 73 · internal anchor

Auto-encoder approximates SEOBNRv4 waveforms for four-parameter aligned-spin binaries, delivering 4 orders of magnitude speedup at median mismatch of 10^{-2}.

Physics-informed neural network (PINN) modeling of charged particle multiplicity using the two-component framework in heavy-ion collisions: A comparison with data-driven neural networks

hep-ph · 2025-11-07 · unverdicted · none · ref 21 · internal anchor

A PINN constrained by the two-component multiplicity model learns the hard-scattering fraction from Zr+Zr events and predicts N_ch more accurately than a data-driven NN on unseen Ru+Ru and Au+Au collisions.

Identifying lopsidedness in spiral galaxies using a Deep Convolutional Neural Network

astro-ph.GA · 2025-05-26 · conditional · none · ref 22 · internal anchor

Transfer learning with a Zoobot CNN on SDSS DR18 data identifies 3,679 lopsided spiral galaxies at 87% test accuracy, with lopsided systems showing higher star formation, bluer colors, lower mass and concentration.

Clinical utility of foundation models in musculoskeletal MRI for biomarker fidelity and predictive outcomes

eess.IV · 2025-01-23 · unverdicted · none · ref 67 · internal anchor

Fine-tuned foundation models produce reliable MSK MRI biomarkers that support workload-reducing triage and calibrated 48-month prediction of knee replacement and incident OA.

Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation

q-fin.CP · 2025-02-24 · unverdicted · none · ref 22 · internal anchor

CausalGAN + SAC RL pipeline generates synthetic bond yield data; fine-tuned Qwen2.5-7B LLM produces trading signals, with reported MAE 0.103, 60% profit rate, and LLM score 3.37/5.

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project

cs.DC · 2025-04-14 · unverdicted · none · ref 52 · internal anchor

Engineering report detailing HPC infrastructure, software choices, and performance measurements for training a 7B LLM using 3D parallelism on JUWELS Booster.

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

cs.DC · 2025-12-18 · unreviewed · ref 74 · internal anchor

Page image classification for content-specific data processing

cs.IR · 2025-07-11 · unreviewed · ref 13 · internal anchor
