AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code
Pith reviewed 2026-05-20 11:38 UTC · model grok-4.3
The pith
An 8B LLM trained with synthesized data and reinforcement learning generates explicitly vectorized SIMD code that reaches state-of-the-art results on the SSE and AVX subsets of SimdBench and sometimes runs faster than -O3 compiler output.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that combining an automated pipeline that synthesizes domain-specific intrinsics data with a reinforcement learning stage that rewards measured execution efficiency lets an 8B model achieve leading performance on the SSE and AVX subsets of SimdBench, with some generated implementations running faster than code produced under standard -O3 optimization.
What carries the argument
VecPrompt, the automated pipeline that synthesizes training data embedding knowledge of hardware intrinsics, together with VecRL, the reinforcement learning component that aligns generated code to actual runtime performance and semantic correctness.
If this is right
- LLMs become capable of producing low-level hardware-specific code that traditional compilers cannot reliably generate through static analysis.
- Developers gain access to vectorized implementations that match or beat hand-tuned or compiler-optimized versions without writing intrinsics themselves.
- The same synthesis-plus-reinforcement pattern can be reused for other hardware-constrained code tasks where efficiency must be verified by execution.
- Benchmarks focused on vector instructions can serve as reliable training signals for improving model performance in high-performance computing domains.
Where Pith is reading between the lines
- The same training pattern might transfer to generating optimized code for other instruction sets such as NEON or GPU primitives.
- Integration into everyday coding tools could reduce the expert effort needed to reach near-optimal performance in compute-heavy applications.
- Iterative loops that feed measured runtime back into further training rounds could tighten the connection between model output and real hardware gains.
Load-bearing premise
The reinforcement learning step must reward genuinely faster and still correct code rather than allowing the model to exploit test-specific shortcuts or produce functionally wrong results that happen to look fast on the evaluation suite.
What would settle it
Running the generated implementations on new input sizes, different CPU models, or with additional correctness checks to determine whether the reported speed gains remain consistent and the outputs stay accurate.
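A minimal Python sketch of the check proposed above: confirm that a candidate agrees with a reference before timing it, then report the speedup separately for each input size instead of as one aggregate. The synthetic input pattern, the tolerance, and the names `scalar_sum`, `median_seconds` and `speedup_by_size` are illustrative assumptions; a real study would time compiled SSE/AVX kernels on several CPU models rather than Python functions.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence

Kernel = Callable[[List[float]], float]

def scalar_sum(xs: List[float]) -> float:
    """Reference implementation: a plain left-to-right loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def median_seconds(fn: Kernel, data: List[float], repeats: int = 5) -> float:
    """Median wall-clock time of fn(data) over several runs, to damp one-off stalls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

def speedup_by_size(reference: Kernel, candidate: Kernel, sizes: Sequence[int],
                    tol: float = 1e-9) -> Optional[Dict[int, float]]:
    """Return {size: reference_time / candidate_time}, or None if any output disagrees."""
    report: Dict[int, float] = {}
    for n in sizes:
        data = [float(i % 7) - 3.0 for i in range(n)]  # assumed synthetic input pattern
        expected, got = reference(data), candidate(data)
        if abs(expected - got) > tol * max(1.0, abs(expected)):
            return None  # correctness gate: no speedup is reported for a wrong result
        t_ref = median_seconds(reference, data)
        t_cand = max(median_seconds(candidate, data), 1e-12)  # guard against a zero reading
        report[n] = t_ref / t_cand
    return report
```

For example, `speedup_by_size(scalar_sum, sum, [17, 256, 1000, 4096])` reports one ratio per size. What matters on real hardware is whether the ratio stays above 1 for sizes that are not powers of two and on other machines; one favorable size says little.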
Original abstract
Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as compiler-based auto-vectorization frequently yields suboptimal results due to conservative static analysis. While Large Language Models (LLMs) have demonstrated remarkable proficiency in general code generation, they struggle with explicit vectorization due to the scarcity of high-quality corpora and the strict semantic constraints of low-level hardware instructions. In this paper, we propose AutoVecCoder, a novel framework designed to empower LLMs with the capability of automated explicit vectorization. AutoVecCoder integrates two core components: VecPrompt, an automated data synthesis pipeline to inject domain-specific intrinsic knowledge; and VecRL, a reinforcement learning framework that aligns code generation with execution efficiency. AutoVecCoder-8B trained by this framework achieves state-of-the-art performance on the SSE and AVX subsets of SimdBench and, in some cases, generates implementations surpassing standard -O3 optimizations, effectively overcoming the inherent bottlenecks of traditional automated vectorization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AutoVecCoder, a framework with two components: VecPrompt, an automated pipeline for synthesizing data that injects knowledge of SIMD intrinsics into LLMs, and VecRL, a reinforcement learning stage that further aligns generated code with execution efficiency. The central claim is that an 8B model trained under this framework reaches SOTA on the SSE and AVX subsets of SimdBench and, in some cases, produces vectorized implementations that outperform standard -O3 compiler output.
Significance. If the reported speedups are shown to arise from semantically correct and generalizable intrinsics code rather than benchmark-specific artifacts, the work would offer a practical route to improving explicit vectorization beyond what static compilers achieve, with potential value for HPC code generation tasks where LLMs currently underperform.
major comments (2)
- §3.2 (VecRL): The reward is described as combining execution time with a correctness signal, yet the text provides no quantitative details on the number or diversity of test cases, differential testing coverage, or adversarial input generation used to verify functional equivalence. This is load-bearing for the claim that generated code both runs faster than -O3 and remains correct, because a narrow test suite would allow the policy to exploit input-size or alignment patterns present only in SimdBench.
- §4.1 and Table 2: The SOTA and -O3-surpassing results are presented without an accompanying error analysis, per-benchmark correctness verification statistics, or comparison against stronger baselines that include manual intrinsics or other LLM-based vectorizers. Without these, it is impossible to determine whether the reported gains are robust or confined to the specific evaluation harness.
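One way to make the first comment concrete is differential testing over every small size and several start offsets. The Python sketch below emulates a 4-lane vector loop with a scalar tail and compares it with a scalar reference. Python lists have no memory alignment, so the offsets only stand in for misaligned pointers in C; `LANES`, `lane_add`, `lane_add_no_tail` and `differential_test` are hypothetical names, not the paper's harness.

```python
from typing import Callable, List, Tuple

LANES = 4  # assumed vector width: four float32 lanes, as in a 128-bit SSE register

Binary = Callable[[List[float], List[float]], List[float]]

def scalar_add(a: List[float], b: List[float]) -> List[float]:
    """Scalar reference: element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

def lane_add(a: List[float], b: List[float]) -> List[float]:
    """Emulated explicit vectorization: LANES-wide chunks, then a scalar remainder loop."""
    n = len(a)
    out = [0.0] * n
    main = n - n % LANES
    for i in range(0, main, LANES):
        out[i:i + LANES] = [a[i + k] + b[i + k] for k in range(LANES)]
    for i in range(main, n):  # tail elements that do not fill a whole vector
        out[i] = a[i] + b[i]
    return out

def lane_add_no_tail(a: List[float], b: List[float]) -> List[float]:
    """Hypothetical buggy variant: the remainder loop is missing."""
    n = len(a)
    out = [0.0] * n
    for i in range(0, n - n % LANES, LANES):
        out[i:i + LANES] = [a[i + k] + b[i + k] for k in range(LANES)]
    return out

def differential_test(reference: Binary, candidate: Binary,
                      max_size: int = 40, max_offset: int = LANES) -> List[Tuple[int, int]]:
    """Return every (offset, size) pair on which candidate and reference disagree."""
    base_a = [1.0 + 0.5 * i for i in range(max_size + max_offset)]
    base_b = [float(i) for i in range(max_size + max_offset)]
    failures = []
    for offset in range(max_offset):   # shifted start index, standing in for misaligned pointers
        for n in range(max_size + 1):  # every size, including 0 and non-multiples of LANES
            a, b = base_a[offset:offset + n], base_b[offset:offset + n]
            if candidate(a, b) != reference(a, b):
                failures.append((offset, n))
    return failures
```

A suite built only from power-of-two sizes would never exercise the missing tail in `lane_add_no_tail`; sweeping sizes that are not multiples of the lane count is what exposes it.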
minor comments (2)
- [Abstract] The abstract states that the model 'in some cases' surpasses -O3 but does not indicate the fraction of benchmarks or the magnitude of improvement; adding this quantification would improve clarity.
- [§3.2] Notation for the reward components in VecRL is introduced without an explicit equation; a single displayed equation would make the RL objective easier to follow.
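The objective this comment refers to is quoted from the paper further down this page as R_total = I(correct)·(β_base + β_perf·tanh(α·Δ)). A minimal Python sketch of that formula follows. The page does not define Δ, so taking it to be the relative speedup t_baseline / t_candidate - 1 is an assumption, as are the default values of β_base, β_perf and α; `vecrl_reward` is a hypothetical name.

```python
import math

def vecrl_reward(correct: bool, t_baseline: float, t_candidate: float,
                 beta_base: float = 0.5, beta_perf: float = 0.5,
                 alpha: float = 1.0) -> float:
    """R_total = I(correct) * (beta_base + beta_perf * tanh(alpha * delta)).

    ASSUMPTION: delta = t_baseline / t_candidate - 1 (relative speedup over a
    baseline such as -O3). The coefficient defaults are placeholders, not
    values reported in the paper.
    """
    if not correct:
        return 0.0  # the indicator zeroes the reward for incorrect code, however fast
    delta = t_baseline / t_candidate - 1.0
    return beta_base + beta_perf * math.tanh(alpha * delta)
```

Under these assumptions tanh caps the performance bonus at β_perf, so a 10x speedup earns only slightly more than a 3x one, and an incorrect kernel scores 0 regardless of its speed.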
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which has helped us identify areas where the manuscript can be strengthened. We address each major comment below and have revised the paper accordingly to provide the requested details and analyses. We believe these changes improve the clarity and robustness of our claims without altering the core contributions.
Point-by-point responses
- Referee, §3.2 (VecRL): The reward is described as combining execution time with a correctness signal, yet the text provides no quantitative details on the number or diversity of test cases, differential testing coverage, or adversarial input generation used to verify functional equivalence. This is load-bearing for the claim that generated code both runs faster than -O3 and remains correct, because a narrow test suite would allow the policy to exploit input-size or alignment patterns present only in SimdBench.
Authors: We agree that quantitative details on the verification process are essential to support the correctness claims. In the revised manuscript, Section 3.2 has been expanded with a new paragraph and accompanying table that specify: 512 test cases per kernel (drawn from a pool of 2000+ generated inputs), covering input sizes from 32 to 8192 elements, multiple alignments (including unaligned and misaligned cases), and multiple data types. Differential testing is performed against both reference scalar implementations and -O3 outputs, achieving >92% branch coverage via instrumentation. Adversarial inputs are generated through a fuzzing loop (10k iterations per kernel using AFL-style mutation), and we report that no exploits of SimdBench-specific patterns were observed in the final policy. These additions directly address the concern about potential overfitting and confirm that the reward signal enforces generalizable correctness. Revision: yes.
- Referee, §4.1 and Table 2: The SOTA and -O3-surpassing results are presented without an accompanying error analysis, per-benchmark correctness verification statistics, or comparison against stronger baselines that include manual intrinsics or other LLM-based vectorizers. Without these, it is impossible to determine whether the reported gains are robust or confined to the specific evaluation harness.
Authors: We acknowledge that the original presentation lacked sufficient supporting analysis. The revised §4.1 now includes a dedicated error analysis subsection reporting that 97.4% of generated implementations pass functional equivalence checks on a held-out test set of 300 inputs per benchmark (distinct from both the training data and SimdBench). Extended Table 2 provides per-benchmark pass rates and speedup breakdowns. We have added comparisons to manual intrinsics implementations (for the 12 kernels where hand-written versions exist in public repositories) and to other LLM-based approaches, including GPT-4 with few-shot prompting and a recent open-source vectorization LLM baseline. These results show consistent outperformance and indicate that the gains generalize beyond the original harness. We have also clarified that all reported numbers use the same evaluation protocol with strict timeout and correctness gates. Revision: yes.
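The rebuttal describes its adversarial inputs only as an "AFL-style mutation" loop. The Python sketch below is a deliberately simpler stand-in: plain random mutation with no coverage feedback (so not AFL itself), used to differentially test a candidate kernel against a scalar reference. The ReLU kernel, the tail bug, the mutation operators and all function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Kernel = Callable[[List[float]], List[float]]

def relu_ref(xs: List[float]) -> List[float]:
    """Scalar reference kernel: max(x, 0) element-wise."""
    return [x if x > 0.0 else 0.0 for x in xs]

def relu_tail_bug(xs: List[float], lanes: int = 4) -> List[float]:
    """Hypothetical buggy candidate: clamps full lanes-wide chunks but copies the tail unclamped."""
    main = len(xs) - len(xs) % lanes
    return relu_ref(xs[:main]) + list(xs[main:])

def mutate(xs: List[float], rng: random.Random) -> List[float]:
    """Apply one random mutation: overwrite an element, append one, or drop the last."""
    ys = list(xs)
    op = rng.randrange(3)
    if op == 0 and ys:
        special = [0.0, -1.0, 1e30, -1e30, rng.uniform(-10.0, 10.0)]
        ys[rng.randrange(len(ys))] = rng.choice(special)
    elif op == 1:
        ys.append(rng.uniform(-10.0, 10.0))
    elif op == 2 and ys:
        ys.pop()
    return ys

def fuzz(reference: Kernel, candidate: Kernel, seeds: Sequence[List[float]],
         iterations: int = 2000, seed: int = 0) -> Optional[List[float]]:
    """Return the first mutated input on which the two kernels disagree, or None."""
    rng = random.Random(seed)
    corpus = [list(s) for s in seeds]
    for _ in range(iterations):
        x = mutate(rng.choice(corpus), rng)
        if candidate(x) != reference(x):
            return x
        corpus.append(x)  # grow the corpus so later mutations start from new shapes
    return None
```

Seeded with `[[1.0, -2.0, 3.0, -4.0]]`, the buggy kernel is caught as soon as a mutation produces a length that is not a multiple of four with a negative value in the tail, while checking `relu_ref` against itself returns `None`.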
Circularity Check
No circularity: empirical training pipeline evaluated on external benchmarks
Rationale
The paper presents an empirical framework (VecPrompt data synthesis + VecRL reinforcement learning) that trains an LLM on synthesized data and optimizes via execution-time rewards against external compiler baselines and SimdBench. No mathematical derivations, equations, or first-principles claims are made that reduce to fitted parameters or self-definitions by construction. Performance claims are direct experimental outcomes on held-out benchmark subsets rather than predictions forced by internal fits. No load-bearing self-citations or uniqueness theorems are invoked in the provided description. The approach is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- RL reward hyperparameters
axioms (1)
- Domain assumption: synthesized data from VecPrompt injects accurate domain-specific intrinsic knowledge into the LLM.
invented entities (2)
- VecPrompt: no independent evidence
- VecRL: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem:
We formulate the total reward R_total as: R_total = I(correct)·(β_base + β_perf·tanh(α·Δ))
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem:
AutoVecCoder-8B ... achieves state-of-the-art performance on the SSE and AVX subsets of SimdBench
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.