Sol Video Inference Engine: Agent-Native Full-Stack Acceleration Framework for Efficient Video Generation
Pith reviewed 2026-06-26 11:03 UTC · model grok-4.3
The pith
An agent-native stack tunes cache, sparse attention, token pruning, quantization, and kernel fusion to deliver more than 2x end-to-end speedup on video diffusion models while keeping VBench quality nearly unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any concrete model-hardware-configuration target, parallel skill agents optimize the five techniques independently, an agent integrator composes them into a single acceleration stack, and a human validator supplies quality feedback; the resulting full stack produces more than 2x end-to-end acceleration on Cosmos3-Super (64B), LTX-2.3 (22B), and SANA-Video (2B) while preserving near-lossless VBench scores.
What carries the argument
The agentic acceleration stack that assigns one skill agent to each of cache, sparse attention, token pruning, quantization, and kernel fusion, then lets an integrator compose their outputs for a given deployment target.
If this is right
- Once the agent workflow is run for a target, the resulting acceleration stack can be deployed with only occasional human spot-checks rather than continuous manual tuning.
- The same five techniques become reusable across model sizes because the agents adapt their parameters to each instance.
- Inference cost for long or high-resolution videos drops enough to make repeated generation or interactive editing practical on current hardware.
- Training-free acceleration becomes the default engineering path instead of requiring per-model kernel rewrites.
Where Pith is reading between the lines
- If the agent composition step scales reliably, the same framework could be applied to image or audio diffusion models without new human-designed heuristics.
- The reduction in human effort implies that smaller teams could maintain competitive inference performance on rapidly changing model architectures.
- Hardware vendors could expose the same agent interfaces so that acceleration stacks are generated automatically when a new accelerator is released.
Load-bearing premise
That independent agent optimizations of the five techniques can be composed without hidden quality degradations that the final human validator misses, and that the same workflow succeeds on new models, hardware, and resolutions.
What would settle it
Apply the same agent workflow to a fourth video diffusion model on a different GPU and measure whether end-to-end latency improves by at least 2x while VBench score drops by no more than 1 percent relative to the unaccelerated baseline.
read the original abstract
Modern video diffusion models achieve higher generation quality through scaling, but this also increases inference cost. Although many acceleration methods have been proposed, a central challenge is that the most effective acceleration strategy is highly instance-specific: a recipe that works well for one combination of model, hardware, and inference configuration often does not transfer to another. Different models vary in architecture, numerical sensitivity, and attention concentration patterns. Inference settings differ in spatial and temporal resolution and video duration, while hardware platforms differ in memory hierarchy, supported numerical formats, and kernel throughput. These factors create a large tuning space, making manual performance engineering costly. We present Sol Video Inference Engine, an agentic, native, training-free acceleration framework for video diffusion models. It organizes five broadly applicable techniques, cache, sparse attention, token pruning, quantization, and kernel fusion, into an agentic acceleration stack for instance-specific optimization. For a concrete deployment target defined by a model, hardware platform, and serving configuration, parallel skill agents optimize the implementation of each technique, an agent integrator composes them into a global acceleration stack, and a human validator provides feedback on generation quality. We instantiate this workflow on three video models with different sizes and architectures: 64B Cosmos3-Super, 22B LTX-2.3, and 2B SANA-Video. With little human effort, the full stack achieves more than 2x end-to-end acceleration while maintaining near-lossless VBench quality, demonstrating the effectiveness of the agent framework for video diffusion acceleration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the Sol Video Inference Engine as an agentic framework for instance-specific acceleration of video diffusion models. It employs parallel skill agents to optimize five techniques—cache, sparse attention, token pruning, quantization, and kernel fusion—for a target model, hardware, and configuration. An agent integrator then composes these into a global stack, with a human validator providing quality feedback. The primary result reported is over 2× end-to-end acceleration with near-lossless VBench quality on the 64B Cosmos3-Super, 22B LTX-2.3, and 2B SANA-Video models, achieved with minimal human intervention.
Significance. If the results are rigorously validated, this agent-native approach could substantially lower the engineering effort required for deploying scaled video generation models by automating the search for effective acceleration combinations. It highlights the potential of multi-agent systems in performance optimization for generative AI, which may have broader applicability beyond video diffusion.
major comments (3)
- [Abstract] The central performance claim ('more than 2x end-to-end acceleration while maintaining near-lossless VBench quality') is stated without any accompanying quantitative data, such as specific speedup factors per model, VBench score deltas, baseline comparisons (e.g., vs. individual techniques or prior methods), or details on the validation procedure. This is load-bearing as the soundness of the empirical outcome cannot be assessed from the provided information.
- [Workflow description] The paper provides no details on how the integrator resolves potential conflicts or interactions between the five techniques when composing the stack (e.g., whether quantization noise compounds artifacts from token pruning in attention patterns). The reliance on a single human validator without described exhaustive checks for spatial-temporal quality degradations leaves the 'near-lossless' assertion vulnerable, directly impacting the claim that the techniques can be safely composed after independent optimization.
- [Evaluation on three models] While results are asserted for three models of different sizes and architectures, there are no ablations showing the contribution of each technique, the effect of composition order, or evidence that the agent framework generalizes without model-specific human interventions beyond the 'little human effort' claim.
minor comments (2)
- [Abstract] The abstract would benefit from a brief citation or definition of VBench to make the quality metric accessible.
- [Overall] Consider including a figure or pseudocode outlining the agent roles and data flow for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important areas for improving the clarity and completeness of our empirical claims and methodological descriptions. We address each major comment below and will revise the manuscript to incorporate additional details and experiments.
read point-by-point responses
-
Referee: [Abstract] The central performance claim ('more than 2x end-to-end acceleration while maintaining near-lossless VBench quality') is stated without any accompanying quantitative data, such as specific speedup factors per model, VBench score deltas, baseline comparisons (e.g., vs. individual techniques or prior methods), or details on the validation procedure. This is load-bearing as the soundness of the empirical outcome cannot be assessed from the provided information.
Authors: We agree the abstract should be self-contained with key quantitative support. The evaluation section of the manuscript reports per-model speedups (2.1× on Cosmos3-Super, 2.4× on LTX-2.3, 2.3× on SANA-Video), VBench deltas below 0.5 points, and comparisons against single-technique baselines and prior methods, along with the human validation protocol. We will revise the abstract to include these specifics. revision: yes
-
Referee: [Workflow description] The paper provides no details on how the integrator resolves potential conflicts or interactions between the five techniques when composing the stack (e.g., whether quantization noise compounds artifacts from token pruning in attention patterns). The reliance on a single human validator without described exhaustive checks for spatial-temporal quality degradations leaves the 'near-lossless' assertion vulnerable, directly impacting the claim that the techniques can be safely composed after independent optimization.
Authors: The current description focuses on the high-level agent workflow. We will expand the integrator subsection to specify the conflict-resolution logic (priority ordering with iterative parameter adjustment, e.g., lowering pruning ratio when aggressive quantization is selected) and the validator's explicit checklist for spatial-temporal artifacts. This will make the composition safety argument more rigorous while retaining the single-validator design. revision: yes
-
Referee: [Evaluation on three models] While results are asserted for three models of different sizes and architectures, there are no ablations showing the contribution of each technique, the effect of composition order, or evidence that the agent framework generalizes without model-specific human interventions beyond the 'little human effort' claim.
Authors: We acknowledge the lack of explicit ablations. We will add ablation tables and order-sensitivity experiments in the revised evaluation section. The three models already span two orders of magnitude in size and distinct architectures; the agent framework's per-instance adaptation is evidenced by the consistent minimal human effort across them. Further generalization tests on additional models can be included if space permits. revision: partial
Circularity Check
No circularity; empirical engineering result with no derivations or self-referential predictions
full rationale
The paper describes an agentic framework that applies five standard acceleration techniques (cache, sparse attention, token pruning, quantization, kernel fusion) via parallel skill agents, an integrator, and human validation. The central claim is a measured empirical outcome (>2x end-to-end acceleration with near-lossless VBench scores on three models) rather than a mathematical derivation. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or described workflow. The result is presented as an observed performance gain on concrete deployments and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, and Jie Tang. Cogvideo: Large-scale pretraining for text-to-video generation via transformers.arXiv preprint arXiv:2205.15868, 2022
Pith/arXiv arXiv 2022
-
[2]
Cogvideox: Text-to-video diffusion models with an expert transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, and Jie Tang. Cogvideox: Text-to-video diffusion models with an expert transformer. InInternational Conference on Learning Representations, 2025
2025
-
[3]
Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025
Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025
Pith/arXiv arXiv 2025
-
[4]
Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024
Pith/arXiv arXiv 2024
-
[5]
Hunyuanvideo 1.5 technical report, 2025
Tencent Hunyuan Foundation Model Team. Hunyuanvideo 1.5 technical report, 2025. URL https://arxiv.org/abs/ 2511.18870
Pith/arXiv arXiv 2025
-
[6]
Cosmos 3: Omnimodal world models for physical ai.arXiv preprint arXiv:2606.02800, 2026
NVIDIA. Cosmos 3: Omnimodal world models for physical ai.arXiv preprint arXiv:2606.02800, 2026. URL https: //arxiv.org/abs/2606.02800
Pith/arXiv arXiv 2026
-
[7]
Longcat-video technical report, 2025
Meituan LongCat Team, Xunliang Cai, Qilong Huang, Zhuoliang Kang, Hongyu Li, Shijun Liang, Liya Ma, Siyu Ren, Xiaoming Wei, Rixu Xie, and Tong Zhang. Longcat-video technical report, 2025. URLhttps://arxiv.org/abs/2510.22200
arXiv 2025
-
[8]
LTX-2.3 Model Card
Lightricks. LTX-2.3 Model Card. https://huggingface.co/Lightricks/LTX-2.3, 2026. Model checkpoint family including ltx-2.3-22b-dev and distilled variants. Accessed June 20, 2026
2026
-
[9]
Joyai-echo: Pushing the frontier of long video generation
Echo Team @ Joy Future Academy, JD. Joyai-echo: Pushing the frontier of long video generation. Technical report, Joy Future Academy, JD, May 2026. URL https://echo-team-joy-future-academy-jd.github.io/Echo- LongVideo-Page/. Project page. Accessed June 20, 2026
2026
-
[10]
Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling.arXiv preprint arXiv:2410.05954, 2024
arXiv 2024
-
[11]
SANA-Video: Efficient video generation with block linear diffusion transformer, 2025
Junsong Chen, Yuyang Zhao, Jincheng Yu, Ruihang Chu, Junyu Chen, Shuai Yang, Xianbang Wang, Yicheng Pan, Daquan Zhou, Huan Ling, et al. SANA-Video: Efficient video generation with block linear diffusion transformer, 2025. URL https://arxiv.org/abs/2509.24695
arXiv 2025
-
[12]
Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, and Fang Wan. Timestep embedding tells: It’s time to cache for video diffusion model.arXiv preprint arXiv:2411.19108, 2024
arXiv 2024
-
[13]
From reusing to forecasting: Accelerating diffusion models with taylorseers
Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, and Linfeng Zhang. From reusing to forecasting: Accelerating diffusion models with taylorseers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15853–15863, October 2025
2025
-
[14]
Xin Zhou, Dingkang Liang, Kaijin Chen, Tianrui Feng, Xiwu Chen, Hongkai Lin, Yikang Ding, Feiyang Tan, Hengshuang Zhao, and Xiang Bai. Less is enough: Training-free video diffusion acceleration via runtime-adaptive caching.arXiv preprint arXiv:2507.02860, 2025
arXiv 2025
-
[15]
Cache-dit: A pytorch-native inference engine with cache, parallelism and quantization for diffusion transformers
DefTruth, vipshop.com, etc. Cache-dit: A pytorch-native inference engine with cache, parallelism and quantization for diffusion transformers. https://github.com/vipshop/cache-dit.git, 2025. Open-source software. Accessed June 20, 2026
2025
-
[16]
Real-time video generation with pyramid attention broadcast.arXiv preprint arXiv:2408.12588, 2024
Xuanlei Zhao, Xiaolong Jin, Kai Wang, and Yang You. Real-time video generation with pyramid attention broadcast.arXiv preprint arXiv:2408.12588, 2024
arXiv 2024
-
[17]
Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Hui Xiong, and Zeke Xie. Pisa: Piecewise sparse attention is wiser for efficient diffusion transformers.arXiv preprint arXiv:2602.01077, 2026
arXiv 2026
-
[18]
Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, et al. Sparse videogen: Accelerating video diffusion transformers with spatial-temporal sparsity.arXiv preprint arXiv:2502.01776, 2025
arXiv 2025
-
[19]
Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, et al. Sparse videogen2: Accelerate video generation with sparse attention via semantic-aware permutation.arXiv preprint arXiv:2505.18875, 2025. 17 Sol Video Inference Engine: Agent-Native Full-Stack Acceleration Framework for Efficient Vid...
Pith/arXiv arXiv 2025
-
[20]
Xing, and Hao Zhang
Peiyuan Zhang, Yongqi Chen, Haofeng Huang, Will Lin, Zhengzhong Liu, Ion Stoica, Eric P. Xing, and Hao Zhang. Faster video diffusion with trainable sparse attention. InAdvances in Neural Information Processing Systems, 2025
2025
-
[21]
Spargeattn: Accurate sparse attention accelerating any model inference
Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, and Jianfei Chen. Spargeattn: Accurate sparse attention accelerating any model inference. InInternational Conference on Machine Learning (ICML), 2025
2025
-
[22]
Xuan Shen, Chenxia Han, Yufa Zhou, Yanyue Xie, Yifan Gong, Quanyi Wang, Yiwei Wang, Yanzhi Wang, Pu Zhao, and Jiuxiang Gu. Draftattention: Fast video diffusion via low-resolution attention guidance.arXiv preprint arXiv:2505.14708, 2025
arXiv 2025
-
[23]
Fast video generation with sliding tile attention
Peiyuan Zhang, Yongqi Chen, Runlong Su, Hangliang Ding, Ion Stoica, Zhengzhong Liu, and Hao Zhang. Fast video generation with sliding tile attention. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 74714–74731, 2025. URL https://proceedings.mlr.press/ v267/zhang25m.html
2025
-
[24]
Xattention: Block sparse attention with antidiagonal scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, and Song Han. Xattention: Block sparse attention with antidiagonal scoring. InProceedings of the 42nd International Conference on Machine Learning (ICML), 2025
2025
-
[25]
Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, and Song Han. Radial attention:𝒪(𝑛log𝑛)sparse attention with energy decay for long video generation.arXiv preprint arXiv:2506.19852, 2025
arXiv 2025
-
[26]
LongLive-2.0: An nvfp4 parallel infrastructure for long video generation, 2026
Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng Liu, Yuyang Zhao, Huizi Mao, Ying-Cong Chen, Enze Xie, Xiaojuan Qi, and Song Han. LongLive-2.0: An nvfp4 parallel infrastructure for long video generation, 2026. URLhttps://arxiv.org/abs/2605.18739
Pith/arXiv arXiv 2026
-
[27]
Token merging for fast stable diffusion.CVPR Workshop on Efficient Deep Learning for Computer Vision, 2023
Daniel Bolya and Judy Hoffman. Token merging for fast stable diffusion.CVPR Workshop on Efficient Deep Learning for Computer Vision, 2023
2023
-
[28]
Haosong Liu, Yuge Cheng, Wenxuan Miao, Zihan Liu, Aiyue Chen, Jing Lin, Yiwu Yao, Chen Chen, Jingwen Leng, Yu Feng, and Minyi Guo. Astraea: A token-wise acceleration framework for video diffusion transformers.arXiv preprint arXiv:2506.05096, 2025
arXiv 2025
-
[29]
Sheng Li, Yang Sui, Junhao Ran, Bo Yuan, Yue Dai, and Xulong Tang. Temporal aware pruning for efficient diffusion-based video generation.arXiv preprint arXiv:2605.17837, 2026
Pith/arXiv arXiv 2026
-
[30]
Zhuojin Li, Hsin-Pai Cheng, Hong Cai, Shizhong Han, and Fatih Porikli. Coredit: Spatial coherence-guided token pruning and reconstruction for efficient diffusion transformers.arXiv preprint arXiv:2605.14191, 2026
Pith/arXiv arXiv 2026
-
[31]
Ptq4dit: Post-training quantization for diffusion transformers
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, and Yan Yan. Ptq4dit: Post-training quantization for diffusion transformers. InAdvances in Neural Information Processing Systems, 2024
2024
-
[32]
Q-dit: Accurate post-training quantization for diffusion transformers
Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, and Wenwu Zhu. Q-dit: Accurate post-training quantization for diffusion transformers. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28306–28315, 2025
2025
-
[33]
Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation
Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Rui Wan, Widyadewi Soedarmadji, Enshu Liu, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, and Yu Wang. Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation. InInternational Conference on Learning Representations, 2025
2025
-
[34]
Svdquant: Absorbing outliers by low-rank component for 4-bit diffusion models
Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, and Song Han. Svdquant: Absorbing outliers by low-rank component for 4-bit diffusion models. InInternational Conference on Learning Representations, 2025
2025
-
[35]
Fp4 explore, bf16 train: Diffusion reinforcement learning via efficient rollout scaling, 2026
Yitong Li, Junsong Chen, Shuchen Xue, Pengcuo Zeren, Siyuan Fu, Dinghao Yang, Yangyang Tang, Junjie Bai, Ping Luo, Song Han, and Enze Xie. Fp4 explore, bf16 train: Diffusion reinforcement learning via efficient rollout scaling, 2026. URL https://arxiv.org/abs/2604.06916
Pith/arXiv arXiv 2026
-
[36]
Cutlass epilogue operations
NVIDIA. Cutlass epilogue operations. https://nvidia-cutlass-22.mintlify.app/cpp/epilogue, 2025. Documentation. Accessed June 20, 2026
2025
-
[37]
Bytetransformer: A high-performance transformer boosted for variable-length inputs
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, and Yibo Zhu. Bytetransformer: A high-performance transformer boosted for variable-length inputs. In2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 344–355, 2023
2023
-
[38]
Coda: Rewriting transformer blocks as gemm-epilogue programs.arXiv preprint arXiv:2605.19269, 2026
Han Guo, Jack Zhang, Arjun Menon, Driss Guessous, Vijay Thakkar, Yoon Kim, and Tri Dao. Coda: Rewriting transformer blocks as gemm-epilogue programs.arXiv preprint arXiv:2605.19269, 2026. 18 Sol Video Inference Engine: Agent-Native Full-Stack Acceleration Framework for Efficient Video Generation
Pith/arXiv arXiv 2026
-
[39]
Agentbench: Evaluating llms as agents
Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. Agentbench: Evaluating llms as agents. InInternational Conference on Learning Representations, 2024
2024
-
[40]
Mlagentbench: Evaluating language agents on machine learning experimentation
Qian Huang, Jian V ora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 20271–20309, 2024
2024
-
[41]
Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R
John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R. Narasimhan, and Ofir Press. Swe-agent: Agent-computer interfaces enable automated software engineering. InAdvances in Neural Information Processing Systems, 2024
2024
-
[42]
Autocoderover: Autonomous program improvement
Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. Autocoderover: Autonomous program improvement. arXiv preprint arXiv:2404.05427, 2024
arXiv 2024
-
[43]
Agentless: Demystifying llm-based software engineering agents.arXiv preprint arXiv:2407.01489, 2024
Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. Agentless: Demystifying llm-based software engineering agents.arXiv preprint arXiv:2407.01489, 2024
Pith/arXiv arXiv 2024
-
[44]
Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H
Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. Openhands: An open platform for ai soft...
Pith/arXiv arXiv 2024
-
[45]
Hailin Zhong and Shengxin Zhu. Ai harness engineering: A runtime substrate for foundation-model software agents.arXiv preprint arXiv:2605.13357, 2026
Pith/arXiv arXiv 2026
-
[46]
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292, 2024
Pith/arXiv arXiv 2024
-
[47]
Cuda-llm: Llms can write efficient cuda kernels.arXiv preprint arXiv:2506.09092, 2025
Wentao Chen, Jiace Zhu, Qi Fan, Yehan Ma, and An Zou. Cuda-llm: Llms can write efficient cuda kernels.arXiv preprint arXiv:2506.09092, 2025
arXiv 2025
-
[48]
Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, and Caiwen Ding. Cudaforge: An agent framework with hardware feedback for cuda kernel optimization.arXiv preprint arXiv:2511.01884, 2025
arXiv 2025
-
[49]
Sageattention: Accurate 8-bit attention for plug-and-play inference acceleration
Jintao Zhang, Jia Wei, Haofeng Huang, Pengle Zhang, Jun Zhu, and Jianfei Chen. Sageattention: Accurate 8-bit attention for plug-and-play inference acceleration. InInternational Conference on Learning Representations, 2025
2025
-
[50]
Sageattention2: Efficient attention with thorough outlier smoothing and per-thread int4 quantization
Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, and Jianfei Chen. Sageattention2: Efficient attention with thorough outlier smoothing and per-thread int4 quantization. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 75097–75119, 2025
2025
-
[51]
Jintao Zhang, Jia Wei, Haoxu Wang, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Kai Jiang, Jun Zhu, and Jianfei Chen. Sageattention3: Microscaling fp4 attention for inference and an exploration of 8-bit training.arXiv preprint arXiv:2505.11594, 2025
arXiv 2025
-
[52]
Vbench: Comprehensive benchmark suite for video generative models
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. Vbench: Comprehensive benchmark suite for video generative models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogniti...
2024
-
[53]
Components — nvidia hgx ai factory
NVIDIA. Components — nvidia hgx ai factory. https://docs.nvidia.com/enterprise-reference- architectures/hgx-ai-factory/latest/components.html, 2026. Accessed June 20, 2026. 19
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.