{"work":{"id":"c30b6d2c-7bb4-4ab0-8ef8-2015313610a9","openalex_id":null,"doi":null,"arxiv_id":"1912.01703","raw_key":null,"title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","authors":null,"authors_text":"A","year":2019,"venue":"cs.LG","abstract":"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.\n  In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.\n  We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.","external_url":"https://arxiv.org/abs/1912.01703","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T09:03:16.216981+00:00","pith_arxiv_id":"1912.01703","created_at":"2026-05-08T18:18:55.667565+00:00","updated_at":"2026-06-29T09:03:16.216981+00:00","title_quality_ok":true,"display_title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","render_title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library"},"hub":{"state":{"work_id":"c30b6d2c-7bb4-4ab0-8ef8-2015313610a9","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":155,"external_cited_by_count":null,"distinct_field_count":47,"first_pith_cited_at":"2021-05-11T17:50:24+00:00","last_pith_cited_at":"2026-06-26T01:16:03+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:08:47.308413+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":18},{"context_role":"method","n":11},{"context_role":"dataset","n":1}],"polarity_counts":[{"context_polarity":"background","n":16},{"context_polarity":"use_method","n":11},{"context_polarity":"unclear","n":2},{"context_polarity":"use_dataset","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","claims":[{"claim_text":"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.\n  In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect o","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"For nucleotide modeling, where functional and structural dependencies can span large genomic distances, DTA's context extension capability may complement existing long-range approaches. Future work should focus on MLM-specific variants of position dropping, potentially enabling robust long-context extension in bidirectional settings without full long-context pretraining. Methods Data sources Argmax position probe.Synthetic sequences were generated by sampling integers uniformly from[0,v)wherev= ","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"The signal samples are also reweighted such that the sum of weights for each sam- ple is equal. During training, the signal samples are passed through the pNN with the IDM parameters(m H,m A)at which they were simulated. The background samples are randomly assigned(m H,m A)values from the set of signal(m H,m A)values used in the training. The pNN is implemented using PYTORCH[88], and consists of a feedforward network with six layers, each layer containing 60 hidden units and an ELU activation fu","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"i }, and then use simulated annealing to find the configuration that maximizes the sign predicted by the NN. We build our model using a feed-forward NN with a variable number of layers and neurons, and use the sigmoid activation function to ensure that the output lies in [0, 1]. We also use the Huber loss function to weight our model. For the implementation, we use the PyTorch library [ 49]. As a first test, we apply our model to a lattice with N = 4 unit cells for the pure Kitaev model at inver","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"models are equally important and pervasive. There are a plethora of competing high-quality, general-purpose machine learning libraries and software frameworks that are used by researchers and developers to create, train, and deploy AI models. Examples of high-quality frameworks for AI development include actively supported frameworks such as Tensorﬂow [5], Keras, PyTorch [6], JAX and Apache MXNet [7], as well as many deprecated frameworks such as Theano, Torch, Caﬀe, and CNTK. The two most popul","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Fu, \"Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems,\" 2020. Version Number: 3, 10.48550/ARXIV .2005.01643. [41] G. Yao, N. Zhang, Z. Duan, and C. Tian, \"Improved SARSA and DQN algorithms for reinforcement learning,\"Theoretical Computer Science, V ol. 1027, 2025, p. 115025, https://doi.org/10.1016/j.tcs.2024.115025. [42] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga,et al., \"Pytorch: An imperative","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"arXiv preprint arXiv:1810.12282, 2018. [86] Jacopo Panerati, Hehui Zheng, SiQi Zhou, James Xu, Amanda Prorok, and Angela P Schoellig. Learning to fly-a gym environment with pybullet physics for reinforcement learning of multi-agent quadcopter control. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7512-7519. IEEE, 2021. [87] Adam Paszke. Pytorch: An imperative style, high-performance deep learning library.arXiv preprint arXiv:1912.01703, 2019. 14 [88] Mi","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks PyTorch: An Imperative Style, High-Performance Deep Learning Library because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (15 contexts).","role_counts":[{"n":15,"context_role":"background"},{"n":9,"context_role":"method"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-18T21:31:20.157774+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[]},"error":null,"updated_at":"2026-05-18T21:31:20.163383+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T10:08:46.753869+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":13},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":12},{"title":"R., Millman, K","work_id":"b05b154d-0381-4d1b-911f-1b35eb7a6768","shared_citers":9},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":8},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":6},{"title":"E., et al","work_id":"1a44c2d3-9a48-46f3-924b-d2fa43b6729a","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":4},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":4},{"title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing","work_id":"9d86da8d-01d3-41af-a0d2-ee14897927a9","shared_citers":4},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":4},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":4},{"title":"Advanced LIGO","work_id":"b93186e6-8d0a-440a-aa48-9de6dbff57b9","shared_citers":3},{"title":"Advanced Virgo: a 2nd generation interferometric gravitational wave detector","work_id":"29d52a5a-6fd3-471b-8fec-d17c29cf9026","shared_citers":3},{"title":"Deep Residual Learning for Image Recognition","work_id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","shared_citers":3},{"title":"DeepSeek- R1: Incentivizing reasoning capability in LLMs via reinforcement learning","work_id":"9835b482-5032-4135-93dd-82a066677569","shared_citers":3},{"title":"Generative Adversarial Networks","work_id":"ad1c2a45-7ac7-45e3-9ffa-c83ca5f20ab9","shared_citers":3},{"title":"Jiarui Zhang, Ollie Liu, Tianyu Yu, Jinyi Hu, and Willie Neiswanger","work_id":"eb18b0c2-9ed0-4254-b208-425469f09e64","shared_citers":3},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":3},{"title":"Longformer: The Long-Document Transformer","work_id":"abea7a44-6668-4de7-aab6-f53a6e5aa088","shared_citers":3},{"title":"Mixed Precision Training","work_id":"c525941b-ce20-4bcb-8509-a9968f1e89c3","shared_citers":3},{"title":"Optuna: A next-generation hyperparameter optimization framework","work_id":"a8219024-82bb-4802-97b6-8cb1aed9e461","shared_citers":3},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":3},{"title":"RoFormer: Enhanced Transformer with Rotary Position Embedding","work_id":"4e5eee26-cd04-4c7a-988f-3e6d1a1f0eb9","shared_citers":3}],"time_series":[{"n":1,"year":2021},{"n":2,"year":2022},{"n":1,"year":2023},{"n":2,"year":2025},{"n":61,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T10:08:40.582256+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T10:08:49.098437+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","claims":[{"claim_text":"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.\n  In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect o","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"For nucleotide modeling, where functional and structural dependencies can span large genomic distances, DTA's context extension capability may complement existing long-range approaches. Future work should focus on MLM-specific variants of position dropping, potentially enabling robust long-context extension in bidirectional settings without full long-context pretraining. Methods Data sources Argmax position probe.Synthetic sequences were generated by sampling integers uniformly from[0,v)wherev= ","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"The signal samples are also reweighted such that the sum of weights for each sam- ple is equal. During training, the signal samples are passed through the pNN with the IDM parameters(m H,m A)at which they were simulated. The background samples are randomly assigned(m H,m A)values from the set of signal(m H,m A)values used in the training. The pNN is implemented using PYTORCH[88], and consists of a feedforward network with six layers, each layer containing 60 hidden units and an ELU activation fu","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"i }, and then use simulated annealing to find the configuration that maximizes the sign predicted by the NN. We build our model using a feed-forward NN with a variable number of layers and neurons, and use the sigmoid activation function to ensure that the output lies in [0, 1]. We also use the Huber loss function to weight our model. For the implementation, we use the PyTorch library [ 49]. As a first test, we apply our model to a lattice with N = 4 unit cells for the pure Kitaev model at inver","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"models are equally important and pervasive. There are a plethora of competing high-quality, general-purpose machine learning libraries and software frameworks that are used by researchers and developers to create, train, and deploy AI models. Examples of high-quality frameworks for AI development include actively supported frameworks such as Tensorﬂow [5], Keras, PyTorch [6], JAX and Apache MXNet [7], as well as many deprecated frameworks such as Theano, Torch, Caﬀe, and CNTK. The two most popul","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Fu, \"Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems,\" 2020. Version Number: 3, 10.48550/ARXIV .2005.01643. [41] G. Yao, N. Zhang, Z. Duan, and C. Tian, \"Improved SARSA and DQN algorithms for reinforcement learning,\"Theoretical Computer Science, V ol. 1027, 2025, p. 115025, https://doi.org/10.1016/j.tcs.2024.115025. [42] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga,et al., \"Pytorch: An imperative","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"arXiv preprint arXiv:1810.12282, 2018. [86] Jacopo Panerati, Hehui Zheng, SiQi Zhou, James Xu, Amanda Prorok, and Angela P Schoellig. Learning to fly-a gym environment with pybullet physics for reinforcement learning of multi-agent quadcopter control. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7512-7519. IEEE, 2021. [87] Adam Paszke. Pytorch: An imperative style, high-performance deep learning library.arXiv preprint arXiv:1912.01703, 2019. 14 [88] Mi","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks PyTorch: An Imperative Style, High-Performance Deep Learning Library because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (15 contexts).","role_counts":[{"n":15,"context_role":"background"},{"n":9,"context_role":"method"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-18T21:31:20.161393+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","claims":[{"claim_text":"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.\n  In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect o","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks PyTorch: An Imperative Style, High-Performance Deep Learning Library because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T10:08:51.420472+00:00"}},"summary":{"title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","claims":[{"claim_text":"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.\n  In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect o","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks PyTorch: An Imperative Style, High-Performance Deep Learning Library because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":13},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":12},{"title":"R., Millman, K","work_id":"b05b154d-0381-4d1b-911f-1b35eb7a6768","shared_citers":9},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":8},{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":6},{"title":"E., et al","work_id":"1a44c2d3-9a48-46f3-924b-d2fa43b6729a","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":4},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":4},{"title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing","work_id":"9d86da8d-01d3-41af-a0d2-ee14897927a9","shared_citers":4},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":4},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":4},{"title":"Advanced LIGO","work_id":"b93186e6-8d0a-440a-aa48-9de6dbff57b9","shared_citers":3},{"title":"Advanced Virgo: a 2nd generation interferometric gravitational wave detector","work_id":"29d52a5a-6fd3-471b-8fec-d17c29cf9026","shared_citers":3},{"title":"Deep Residual Learning for Image Recognition","work_id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","shared_citers":3},{"title":"DeepSeek- R1: Incentivizing reasoning capability in LLMs via reinforcement learning","work_id":"9835b482-5032-4135-93dd-82a066677569","shared_citers":3},{"title":"Generative Adversarial Networks","work_id":"ad1c2a45-7ac7-45e3-9ffa-c83ca5f20ab9","shared_citers":3},{"title":"Jiarui Zhang, Ollie Liu, Tianyu Yu, Jinyi Hu, and Willie Neiswanger","work_id":"eb18b0c2-9ed0-4254-b208-425469f09e64","shared_citers":3},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":3},{"title":"Longformer: The Long-Document Transformer","work_id":"abea7a44-6668-4de7-aab6-f53a6e5aa088","shared_citers":3},{"title":"Mixed Precision Training","work_id":"c525941b-ce20-4bcb-8509-a9968f1e89c3","shared_citers":3},{"title":"Optuna: A next-generation hyperparameter optimization framework","work_id":"a8219024-82bb-4802-97b6-8cb1aed9e461","shared_citers":3},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":3},{"title":"RoFormer: Enhanced Transformer with Rotary Position Embedding","work_id":"4e5eee26-cd04-4c7a-988f-3e6d1a1f0eb9","shared_citers":3}],"time_series":[{"n":1,"year":2021},{"n":2,"year":2022},{"n":1,"year":2023},{"n":2,"year":2025},{"n":61,"year":2026}],"dependency_candidates":[]},"authors":[]}}