Prism: Symbolic Superoptimization of Tensor Programs
Pith reviewed 2026-05-10 09:00 UTC · model grok-4.3
The pith
Prism is the first symbolic superoptimizer for tensor programs. It combines sGraph, a compact representation of program families; a two-level search; e-graph equivalence checking; and auto-tuning, achieving up to 2.2× speedup over prior superoptimizers on LLM workloads.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
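The two-level formulation claimed above can be sketched in miniature: level 1 enumerates symbolic program families (templates with free parameters), and level 2 instantiates each family by auto-tuning its parameters. All names, parameters, and cost formulas below are illustrative, not Prism's actual code; real systems measure costs on hardware rather than evaluating an analytic model.

```python
# Hypothetical sketch of two-level search: symbolic families, then tuning.

# Level 1: program families, each a symbolic cost over free parameters
# (a toy analytic model stands in for on-device measurement).
families = {
    "fused":   lambda tile, unroll: 120 / tile + 3 * unroll,
    "unfused": lambda tile, unroll: 200 / tile + 1 * unroll,
}
TILES, UNROLLS = [4, 8, 16], [1, 2, 4]

def autotune(cost):
    # Level 2: instantiate symbolic parameters with concrete values.
    return min(((t, u) for t in TILES for u in UNROLLS),
               key=lambda p: cost(*p))

best = min(families,
           key=lambda name: families[name](*autotune(families[name])))
print(best, autotune(families[best]))
```

The point of the hierarchy is that level 1 can discard a whole family at once, before any level-2 tuning is spent on it.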
Core claim
Prism achieves up to 2.2× speedup over best superoptimizers and 4.9× over best compiler-based approaches, while reducing end-to-end optimization time by up to 3.4× on five LLM workloads.
Load-bearing premise
That symbolic reasoning over operator semantics, algebraic identities, and hardware constraints can correctly and completely prune provably suboptimal regions of the search space without excluding optimal implementations.
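This premise can be made concrete with a toy sketch (not Prism's actual code): a program family may be discarded only if a *sound lower bound* on the cost of every member exceeds the best concrete cost found so far. An unsound bound is exactly the failure mode the premise risks, since it would exclude an optimal implementation. All names and cost formulas are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Family:
    name: str
    cost: Callable[[int], float]  # symbolic cost as a function of tile size
    domain: range                 # concrete values the parameter may take

def lower_bound(f: Family) -> float:
    # A sound lower bound over the whole family (here the exact minimum,
    # since the toy domain is enumerable; Prism reasons symbolically).
    return min(f.cost(t) for t in f.domain)

def prune(families, best_so_far):
    # Keep a family unless it is *provably* worse than the incumbent.
    return [f for f in families if lower_bound(f) < best_so_far]

fams = [
    Family("tiled-a", lambda t: 100 / t + 2 * t, range(1, 33)),
    Family("tiled-b", lambda t: 400 / t + 8 * t, range(1, 33)),
]
survivors = prune(fams, best_so_far=40.0)
print([f.name for f in survivors])
```

Soundness of the bound is what separates "structured pruning of provably suboptimal regions" from a heuristic that may discard the optimum.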
Original abstract
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.
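The abstract's "equivalence verification via e-graph rewriting" can be illustrated with a much weaker cousin: rewriting terms to a canonical normal form and comparing. Real e-graph engines such as egg share subterms in equivalence classes and apply rules to saturation; this sketch only canonicalizes two identities (commutativity of `+`, associativity of `*`), and deliberately keeps `*` non-commutative, as matmul is. The representation and rules are illustrative, not Prism's.

```python
# Equivalence check by canonicalization: expr is a leaf string or a
# nested tuple ("+", x, y) / ("*", x, y).

def canon(expr):
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [canon(a) for a in args]
    # Flatten associative operators: (op, (op, a, b), c) -> (op, a, b, c).
    flat = []
    for a in args:
        if isinstance(a, tuple) and a[0] == op:
            flat.extend(a[1:])
        else:
            flat.append(a)
    if op == "+":            # commutative: sort operands
        flat.sort(key=repr)  # "*" stays ordered, like matmul
    return (op, *flat)

def equivalent(e1, e2):
    return canon(e1) == canon(e2)

assert equivalent(("+", "a", "b"), ("+", "b", "a"))
assert equivalent(("*", ("*", "a", "b"), "c"),
                  ("*", "a", ("*", "b", "c")))
assert not equivalent(("*", "a", "b"), ("*", "b", "a"))
```

An e-graph improves on this by verifying equivalence even when no single normal form exists, which is why the paper leans on equality saturation rather than canonicalization.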
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: tensor operator semantics and algebraic identities can be symbolically modeled for equivalence verification and for pruning suboptimal programs.
invented entities (1)
- sGraph (no independent evidence)
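The ledger's axiom is that operator identities can be modeled exactly; the weaker, common alternative is randomized testing. As a point of contrast, this sketch checks the transpose identity (A·B)ᵀ = Bᵀ·Aᵀ on random matrices, which yields evidence rather than the proof the axiom demands. Pure-Python matmul is used to keep the example self-contained.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def random_matrix(n, m):
    return [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

def close(X, Y, tol=1e-9):
    return all(abs(x - y) <= tol
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

random.seed(0)
for _ in range(100):
    A, B = random_matrix(3, 4), random_matrix(4, 2)
    # Evidence for the identity on 100 instances -- not a symbolic proof.
    assert close(transpose(matmul(A, B)),
                 matmul(transpose(B), transpose(A)))
```

A symbolic model, by contrast, must certify the identity for all shapes and values at once; that gap is exactly where the "correct and complete pruning" premise could break.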