super hub Mixed citations

PyTorch: An Imperative Style, High-Performance Deep Learning Library

· 2019 · cs.LG · arXiv 1912.01703

Mixed citation behavior. Most common role is background (53%).

157 Pith papers citing it

Background 53% of classified citations

abstract

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

citation-role summary

background: 18 · method: 11 · dataset: 1

citation-polarity summary

background: 16 · use method: 11 · unclear: 2 · use dataset: 1

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller

cs.RO · 2025-12-22 · conditional · novelty 8.0

First in-orbit demonstration of a DRL-trained AI satellite attitude controller that performs robust inertial pointing after sim-to-real transfer.

Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation

quant-ph · 2025-08-22 · unverdicted · novelty 8.0

A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

ffortissimo: A Freeform Forward-Modeling Pipeline for High-Contrast Images of Circumstellar Disks Based on Automatic Differentiation

astro-ph.IM · 2026-06-22 · unverdicted · novelty 7.0

ffortissimo is a JAX-based freeform forward-modeling pipeline that fits complex dust distributions and infers scattering properties in KLIP-reduced images of circumstellar disks such as HR 4796A.

A matrix-free, differentiable PyTorch solver for phase-field fracture: Formulation, benchmarks, and inverse analysis

cs.CE · 2026-06-22 · unverdicted · novelty 7.0

A matrix-free, GPU-compatible PyTorch implementation of phase-field fracture with explicit dynamics, custom differentiable implicit damage solve, benchmarks on dynamic and quasi-static cases, and inverse recovery of fracture energy G_c via L-BFGS.

Attention-based optimizer for symmetry finding

quant-ph · 2026-05-28 · unverdicted · novelty 7.0

A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.

Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model

astro-ph.IM · 2026-05-15 · unverdicted · novelty 7.0

Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.

Reconstructing the Stripping History of the Sagittarius Stream with Neural Networks

astro-ph.GA · 2026-05-14 · unverdicted · novelty 7.0

A neural network trained on simulations infers stripping times for Sagittarius stream stars from phase-space data, measuring a 0.3 dex/Gyr metallicity gradient and estimating ages for globular clusters such as Pal 12 and NGC 2419.

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.

Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction

q-bio.BM · 2026-05-12 · unverdicted · novelty 7.0

Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.

End-to-End Population Inference from Gravitational-Wave Strain using Transformers

gr-qc · 2026-05-11 · unverdicted · novelty 7.0

Dingo-Pop uses a transformer to perform amortized, end-to-end population inference from GW strain data in seconds, bypassing per-event Monte Carlo sampling.

Learning reveals invisible structure in low-rank RNNs

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

Dynamical magnetotropic susceptibility as a new probe of Kitaev materials and beyond

cond-mat.str-el · 2026-05-01 · unverdicted · novelty 7.0

Dynamical magnetotropic susceptibility k(ω) acts as a probe of uniform spin and charge fluctuations, with its static scaling in α-RuCl3 arising specifically from dominant Kitaev interactions in the models examined.

Sampling two-dimensional spin systems with transformers

cond-mat.dis-nn · 2026-04-30 · unverdicted · novelty 7.0

Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural samplers at criticality.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Big Dipper, Help Me Find A Way -- Dip-hunting at hadron colliders

hep-ph · 2026-04-28 · unverdicted · novelty 7.0

Parametric neural networks learn likelihood ratios to infer top-philic scalar resonances from dip patterns caused by signal-background interference in hadron collider data.

Graph-Conditioned Meta-Optimizer for QAOA Parameter Generation on Multiple Problem Classes

quant-ph · 2026-04-28 · unverdicted · novelty 7.0

A graph-conditioned meta-optimizer learns QAOA parameter trajectories from one problem class and transfers them to others, yielding better initializations than standard methods in an empirical study of 64 settings.

Rates of forgetting for the sequentially Markov coalescent

math.PR · 2026-04-22 · unverdicted · novelty 7.0

SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.

Concept Graph Convolutions: Message Passing in the Concept Space

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality

stat.AP · 2026-04-20 · unverdicted · novelty 7.0

A neural network learns non-stationary anisotropic correlations from gridded CTM outputs and transfers the structure via LatticeKrig basis functions to station data for refined fine-scale NO2 predictions with uncertainty.

Probing the 3D Structures of Supernovae through IR Signatures of CO and SiO

astro-ph.HE · 2026-04-20 · unverdicted · novelty 7.0

MOFAT applied to SN2024ggi shows CO triggering inner SiO formation with a receding edge, order-of-magnitude mass drop, clumping signatures, and no dust formation.

citing papers explorer

Showing 50 of 157 citing papers.

Efficient Training on Multiple Consumer GPUs with RoundPipe cs.DC · 2026-04-29 · conditional · none · ref 39 · internal anchor
RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.
Stability and Generalization in Looped Transformers cs.LG · 2026-04-16 · unverdicted · none · ref 18 · internal anchor
Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller cs.RO · 2025-12-22 · conditional · none · ref 30 · internal anchor
First in-orbit demonstration of a DRL-trained AI satellite attitude controller that performs robust inertial pointing after sim-to-real transfer.
Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation quant-ph · 2025-08-22 · unverdicted · none · ref 39 · internal anchor
A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.
Editing Models with Task Arithmetic cs.LG · 2022-12-08 · accept · none · ref 81 · internal anchor
Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.
Traces of Helium Detected in Type Ic Supernova 2014L astro-ph.HE · 2026-03-31 · accept · none · ref 74
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
ffortissimo: A Freeform Forward-Modeling Pipeline for High-Contrast Images of Circumstellar Disks Based on Automatic Differentiation astro-ph.IM · 2026-06-22 · unverdicted · none · ref 45 · internal anchor
ffortissimo is a JAX-based freeform forward-modeling pipeline that fits complex dust distributions and infers scattering properties in KLIP-reduced images of circumstellar disks such as HR 4796A.
A matrix-free, differentiable PyTorch solver for phase-field fracture: Formulation, benchmarks, and inverse analysis cs.CE · 2026-06-22 · unverdicted · none · ref 27 · internal anchor
A matrix-free, GPU-compatible PyTorch implementation of phase-field fracture with explicit dynamics, custom differentiable implicit damage solve, benchmarks on dynamic and quasi-static cases, and inverse recovery of fracture energy G_c via L-BFGS.
Attention-based optimizer for symmetry finding quant-ph · 2026-05-28 · unverdicted · none · ref 86 · internal anchor
A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.
Forecasting megaelectron-volt electron flux in the Earth's outer radiation belt using supervised machine learning algorithms and a timeseries foundation model astro-ph.IM · 2026-05-15 · unverdicted · none · ref 51 · internal anchor
Hybrid TimesFM plus ridge regression on covariates forecasts 1-MeV electron flux with average R² of 0.9 on out-of-sample 2024 data, outperforming linear regression, CNN, LSTM and Transformer models.
Reconstructing the Stripping History of the Sagittarius Stream with Neural Networks astro-ph.GA · 2026-05-14 · unverdicted · none · ref 95 · internal anchor
A neural network trained on simulations infers stripping times for Sagittarius stream stars from phase-space data, measuring a 0.3 dex/Gyr metallicity gradient and estimating ages for globular clusters such as Pal 12 and NGC 2419.
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning cs.MA · 2026-05-12 · unverdicted · none · ref 53 · 2 links · internal anchor
Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.
Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction q-bio.BM · 2026-05-12 · unverdicted · none · ref 33 · internal anchor
Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.
End-to-End Population Inference from Gravitational-Wave Strain using Transformers gr-qc · 2026-05-11 · unverdicted · none · ref 35 · internal anchor
Dingo-Pop uses a transformer to perform amortized, end-to-end population inference from GW strain data in seconds, bypassing per-event Monte Carlo sampling.
Learning reveals invisible structure in low-rank RNNs cs.LG · 2026-05-05 · unverdicted · none · ref 55 · internal anchor
Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.
Dynamical magnetotropic susceptibility as a new probe of Kitaev materials and beyond cond-mat.str-el · 2026-05-01 · unverdicted · none · ref 49 · internal anchor
Dynamical magnetotropic susceptibility k(ω) acts as a probe of uniform spin and charge fluctuations, with its static scaling in α-RuCl3 arising specifically from dominant Kitaev interactions in the models examined.
Sampling two-dimensional spin systems with transformers cond-mat.dis-nn · 2026-04-30 · unverdicted · none · ref 29 · internal anchor
Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural samplers at criticality.
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning astro-ph.GA · 2026-04-28 · unverdicted · none · ref 52 · internal anchor
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
Big Dipper, Help Me Find A Way -- Dip-hunting at hadron colliders hep-ph · 2026-04-28 · unverdicted · none · ref 29 · internal anchor
Parametric neural networks learn likelihood ratios to infer top-philic scalar resonances from dip patterns caused by signal-background interference in hadron collider data.
Graph-Conditioned Meta-Optimizer for QAOA Parameter Generation on Multiple Problem Classes quant-ph · 2026-04-28 · unverdicted · none · ref 43 · internal anchor
A graph-conditioned meta-optimizer learns QAOA parameter trajectories from one problem class and transfers them to others, yielding better initializations than standard methods in an empirical study of 64 settings.
Rates of forgetting for the sequentially Markov coalescent math.PR · 2026-04-22 · unverdicted · none · ref 114 · internal anchor
SMC forgets its initial condition geometrically in the jump chain and as 1/ℓ in continuous genetic distance, justifying independent-locus approximations.
Concept Graph Convolutions: Message Passing in the Concept Space cs.LG · 2026-04-22 · unverdicted · none · ref 45 · internal anchor
Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.
A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality stat.AP · 2026-04-20 · unverdicted · none · ref 1 · internal anchor
A neural network learns non-stationary anisotropic correlations from gridded CTM outputs and transfers the structure via LatticeKrig basis functions to station data for refined fine-scale NO2 predictions with uncertainty.
Probing the 3D Structures of Supernovae through IR Signatures of CO and SiO astro-ph.HE · 2026-04-20 · unverdicted · none · ref 96 · internal anchor
MOFAT applied to SN2024ggi shows CO triggering inner SiO formation with a receding edge, order-of-magnitude mass drop, clumping signatures, and no dust formation.
Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution cs.PL · 2026-04-19 · unverdicted · none · ref 53 · internal anchor
A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality cs.AR · 2026-04-14 · unverdicted · none · ref 32 · internal anchor
The Tensor Memory Engine provides on-the-fly data reorganization to achieve ideal memory locality for CPU computations in edge systems.
Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings q-bio.QM · 2026-04-09 · unverdicted · none · ref 46 · internal anchor
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
How pore-scale disorder controls fluid stretching in porous media physics.flu-dyn · 2026-04-03 · unverdicted · none · ref 52 · internal anchor
Pore-scale disorder accelerates fluid stretching in porous media, producing quadratic time growth and faster mixing than the linear growth seen in ordered structures.
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini cs.HC · 2026-03-25 · unverdicted · none · ref 39 · internal anchor
XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.
Polarized Target Nuclear Magnetic Resonance Measurements with Deep Neural Networks physics.ins-det · 2026-03-10 · unverdicted · none · ref 28 · internal anchor
Deep neural networks reduce fitting uncertainties in CW-NMR polarization measurements for dynamically polarized targets.
RLGT: A reinforcement learning framework for extremal graph theory cs.LG · 2026-02-19 · unverdicted · none · ref 39 · internal anchor
RLGT is a modular reinforcement learning framework for extremal graph theory that handles undirected, directed, looped, and multi-colored graphs to facilitate future research.
Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models cs.CE · 2026-02-05 · unverdicted · none · ref 29 · internal anchor
Koopman autoencoders with forcings and temporal unrolling deliver accurate year-long predictions for coastal-ocean models at 300-1400x speedup, outperforming POD in two of three cases.
Cobble: Compiling Block Encodings for Quantum Computational Linear Algebra cs.PL · 2025-11-03 · unverdicted · none · ref 45 · internal anchor
Cobble is a domain-specific language for quantum block encodings that compiles high-level matrix expressions to optimized circuits using analyses and quantum singular value transformation, achieving 2.6x-25.4x speedups over unoptimized baselines on benchmarks.
Atomistic Machine Learning with Irreducible Cartesian Natural Tensors cond-mat.mtrl-sci · 2025-10-05 · unverdicted · none · ref 64 · internal anchor
CarNet develops irreducible Cartesian natural tensors and an equivariant model that matches leading spherical-tensor performance for ML interatomic potentials and high-rank tensor predictions like elastic constants.
pop-cosmos: Star formation over 12 Gyr from generative modelling of a deep infrared-selected galaxy catalogue astro-ph.GA · 2025-09-24 · unverdicted · none · ref 186 · internal anchor
A score-based diffusion generative model on deep infrared galaxy photometry yields a star formation rate density peaking at z=1.3 and shows distinct non-parametric star formation histories plus AGN activity peaking during the quenching transition of massive galaxies.
Meson spectroscopy of exotic symmetries of Ising criticality in Rydberg atom arrays quant-ph · 2025-06-26 · unverdicted · none · ref 43 · internal anchor
Rydberg arrays realize Ising criticality with E8 mass spectra in chains and first signatures of D8^(1)-organized bound states from interchain confinement in ladders.
GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction q-bio.QM · 2025-04-08 · conditional · none · ref 50 · internal anchor
GraphGDel builds graph representations from constraint-based metabolic models and trains a deep learning framework integrating graph structure with gene and metabolite sequences to predict growth-coupled gene deletions, showing accuracy gains of 4-16% over baselines on three models.
KernelBench: Can LLMs Write Efficient GPU Kernels? cs.LG · 2025-02-14 · accept · none · ref 28 · internal anchor
KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.
Clustering in pure-attention hardmax transformers and its role in sentiment analysis cs.CL · 2024-06-26 · unverdicted · none · ref 26 · internal anchor
Hardmax transformers converge to leader-determined clusters, enabling an interpretable model for sentiment analysis.
Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 46 · internal anchor
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
The Impact of Host Galaxy Properties on Supernova Classification with Hierarchical Labels astro-ph.IM · 2026-06-23 · unverdicted · none · ref 34 · internal anchor
Host galaxy properties enable >90% pure Type Ia samples from photometry alone and improve classification accuracy when redshift is unavailable, via a new hierarchical cross-entropy objective.
21cmEMUv3: a hybrid diffusion-LSTM emulator of 21cmFAST summary observables astro-ph.CO · 2026-05-29 · unverdicted · none · ref 176 · internal anchor
21cmEMUv3 emulates the cylindrical 21cm power spectrum via score-based diffusion and six other 21cmFAST observables via LSTM networks at sub-percent accuracy, then uses the emulator to infer a lower limit on soft-band X-ray luminosity from HERA data.
First steps towards gauge-independent vortex identification through machine learning hep-lat · 2026-05-27 · unverdicted · none · ref 28 · internal anchor
A neural network trained on 2D SU(2) lattices with inserted thin Z2 vortices, after random gauge transformations, noise, and cooling, can locate center vortices at moderate visibility levels and scales via tiling.
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning cs.LG · 2026-05-20 · unverdicted · none · ref 27 · internal anchor
Position-Weighted On-Policy Self-Distillation (PW-OPSD) weights later tokens more heavily after a diagnostic shows position predicts teacher reliability better than entropy, yielding +1.0 and +1.1 Avg@12 gains on AIME 2024/2025.
Universal Jaynes-Cummings Control of an Oscillator quant-ph · 2026-05-18 · unverdicted · none · ref 58 · internal anchor
Experimental demonstration of universal qudit control on a cavity oscillator via compiled Jaynes-Cummings gates with a transmon ancilla, reaching 96% mean post-selected process fidelity for qutrit gates.
CAM-VFD: Cross-Attention Multimodal Video Forgery Detection cs.CV · 2026-05-16 · unverdicted · none · ref 34 · internal anchor
CAM-VFD detects video forgeries by using cross-attention to identify contradictions between CLIP appearance, VideoMAE motion, and MiDaS depth features.
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex cs.CV · 2026-05-15 · unverdicted · none · ref 63 · internal anchor
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding cs.CV · 2026-05-09 · unverdicted · none · ref 38 · internal anchor
A new framework combines self-attention on the Oblique manifold with bidirectional geodesic cross-attention on the Lorentz hyperboloid to improve both localization accuracy and descriptive coherence in 3D dense captioning.
POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles cs.LG · 2026-05-08 · unverdicted · none · ref 92 · internal anchor
POETS uses compute-efficient LLM policy ensembles to implicitly perform KL-regularized Thompson sampling, delivering O(sqrt(T gamma_T)) regret bounds and state-of-the-art sample efficiency in scientific discovery tasks such as protein search and quantum circuit design.
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies cs.LG · 2026-05-08 · unverdicted · none · ref 102 · internal anchor
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
