VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping
Pith reviewed 2026-05-17 21:30 UTC · model grok-4.3
The pith
VVS reduces target model forward passes by 2.8 times in visual autoregressive generation by skipping redundant verification steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Verification redundancy and stale feature reusability in the drafting stage of speculative decoding permit partial verification skipping without meaningful quality loss. The VVS framework realizes this by combining a verification-free token selector with dynamic truncation, token-level feature caching and reuse, and fine-grained skipped-step scheduling, thereby cutting the number of target-model forward passes by a factor of 2.8 relative to vanilla autoregressive decoding while preserving competitive generation quality.
What carries the argument
The VVS framework that integrates verification-free token selection with dynamic truncation, token-level feature caching, and skipped-step scheduling to enable partial verification skipping during speculative decoding.
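To make the forward-pass accounting concrete, here is a minimal toy counter. It is not the authors' algorithm: the function name, the constant `accepted_per_step`, and the fixed `skip_every` schedule are all hypothetical simplifications (the paper describes a learned-free selector with dynamic truncation and a fine-grained schedule, neither of which is modeled here). It only shows how skipping verification on some steps lowers the count of target-model passes.

```python
def count_target_passes(seq_len, accepted_per_step, skip_every=0):
    """Count target-model forward passes in a toy draft/verify loop.

    seq_len:           number of tokens to generate
    accepted_per_step: tokens committed per step (hypothetical constant;
                       in practice this varies from step to step)
    skip_every:        if k > 0, every k-th step skips verification and
                       therefore costs no target pass; 0 disables skipping
    """
    generated = 0
    step = 0
    target_passes = 0
    while generated < seq_len:
        step += 1
        skipped = skip_every > 0 and step % skip_every == 0
        if not skipped:
            # A verified step costs one target-model forward pass.
            target_passes += 1
        # Assumption: a skipped step still commits the same number of tokens.
        generated += accepted_per_step
    return target_passes


vanilla = count_target_passes(1024, accepted_per_step=1)
standard_sd = count_target_passes(1024, accepted_per_step=2)
with_skipping = count_target_passes(1024, accepted_per_step=2, skip_every=2)
print(vanilla, standard_sd, with_skipping)
```

The inputs are toy values chosen for illustration, so the resulting ratios should not be read as the paper's measurements.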
If this is right
- The number of target model forward passes drops by a factor of 2.8 compared with vanilla autoregressive decoding.
- Image generation quality remains competitive with conventional speculative decoding.
- The speed-quality trade-off improves over existing speculative decoding methods for visual autoregressive models.
- The overall speculative decoding paradigm gains a new direction through selective verification skipping.
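As a worked instance of the first bullet, assume vanilla decoding spends one target forward pass per generated token (prompt prefill ignored) and take a hypothetical sequence of N = 1024 image tokens, a number chosen for illustration and not taken from the paper:

```latex
\[
P_{\text{vanilla}} = N, \qquad
P_{\text{VVS}} \approx \frac{N}{2.8} \approx 0.357\,N, \qquad
1 - \frac{1}{2.8} \approx 0.643 .
\]
\[
N = 1024 \;\Rightarrow\; P_{\text{VVS}} \approx \frac{1024}{2.8} \approx 366 .
\]
```

Under that assumption, a 2.8× reduction means roughly 64% of target passes are avoided. Wall-clock speedup may be smaller, since drafter passes and token-selection overhead are not counted here.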
Where Pith is reading between the lines
- The same skipping logic could extend to autoregressive generation of video or audio sequences where tokens also show high interchangeability.
- Combining VVS with other latency-reduction techniques such as early exiting or quantization might yield further gains.
- The feature-reuse idea may help in non-visual domains that already use draft-then-verify pipelines.
- Empirical tests on larger visual autoregressive models would show whether the 2.8x reduction scales.
Load-bearing premise
Visual tokens are interchangeable enough and drafting-stage redundancy plus feature reuse are reliable enough that skipping selected verification steps leaves generation quality intact.
What would settle it
Generate images on a standard benchmark with VVS and measure either no reduction in target forward passes or a clear increase in FID or other quality metrics relative to both vanilla autoregressive decoding and standard speculative decoding.
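The falsification test above can be written down as a small check. This is a sketch under stated assumptions: `RunResult`, `claim_is_refuted`, and the `fid_tolerance` margin are hypothetical names and values introduced here, FID is used as the only quality metric, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    target_passes: float  # mean target-model forward passes per image
    fid: float            # Frechet Inception Distance, lower is better


def claim_is_refuted(vvs, vanilla, standard_sd, fid_tolerance=1.0):
    """Return True if either failure condition from the text holds.

    fid_tolerance is a hypothetical margin for what counts as a
    'clear increase' in FID; the source does not specify one.
    """
    # Condition 1: VVS does not reduce target forward passes at all.
    no_pass_reduction = vvs.target_passes >= vanilla.target_passes
    # Condition 2: FID is clearly worse than BOTH baselines.
    worse_than_vanilla = vvs.fid - vanilla.fid > fid_tolerance
    worse_than_sd = vvs.fid - standard_sd.fid > fid_tolerance
    return no_pass_reduction or (worse_than_vanilla and worse_than_sd)


# Made-up numbers, for illustration only.
vanilla_run = RunResult(target_passes=1024, fid=10.0)
sd_run = RunResult(target_passes=512, fid=10.2)
vvs_run = RunResult(target_passes=366, fid=10.5)
print(claim_is_refuted(vvs_run, vanilla_run, sd_run))
```

A real evaluation would also need the multi-seed variance requested in the referee report before a difference is called clear.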
Original abstract
Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has been proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, limiting its acceleration potential. Motivated by the interchangeability of visual tokens, we explore verification skipping in the SD process for the first time to explicitly cut the number of target model forward passes, thereby reducing inference latency. By analyzing the characteristics of the drafting stage, we observe that verification redundancy and stale feature reusability are key factors to maintain generation quality while improving speed for verification-free steps. Inspired by these two observations, we propose a novel SD framework VVS to accelerate visual AR model via partial verification skipping, which integrates three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained skipped step scheduling. Consequently, VVS reduces the number of target model forward passes by $2.8\times$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a superior speed-quality trade-off over conventional SD frameworks and revealing strong potential to reshape the SD paradigm. Our code is available at https://github.com/HyattDD/VVS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces VVS, a speculative decoding framework for visual autoregressive image generation models. Motivated by observations of verification redundancy and stale feature reusability during the drafting stage, it enables partial verification skipping to reduce target-model forward passes. The framework integrates three modules: a verification-free token selector using dynamic truncation, token-level feature caching and reuse, and fine-grained skipped-step scheduling. The central empirical claim is a 2.8× reduction in target-model forward passes relative to vanilla AR decoding while preserving competitive generation quality and improving the speed-quality trade-off over standard speculative decoding.
Significance. If the quality-preservation results hold under the proposed skipping strategy, the work could meaningfully advance efficient inference for visual AR models by relaxing the rigid draft-then-verify loop of conventional speculative decoding. The empirical grounding in visual-token interchangeability and the open-sourced code are constructive elements that support further exploration of verification-light SD variants.
major comments (2)
- [Method (token-level feature caching and reuse)] The central 2.8× forward-pass reduction rests on the assumption that verification redundancy plus stale feature reuse permit skipping without meaningful quality loss. In the method description of token-level feature caching and reuse, no explicit bound, divergence metric, or ablation is provided on hidden-state drift or logit/perplexity shift as a function of consecutive skip length. This is load-bearing for the quality-preservation claim, especially over long AR sequences where small inconsistencies can compound.
- [Experiments (main results table)] Table reporting the main speedup and quality results: the 2.8× figure and competitive quality metrics should include error bars or standard deviations across multiple random seeds and at least two distinct datasets to demonstrate robustness against post-hoc choices of the dynamic truncation threshold.
minor comments (2)
- [Abstract] The abstract states that VVS 'reveals strong potential to reshape the SD paradigm'; this phrasing is stronger than the concrete contribution and could be revised to 'suggests a promising direction for relaxing verification in SD for visual AR models'.
- [Figures] Figure captions and legends should explicitly label all compared baselines (vanilla AR, standard SD, VVS variants) and state the exact quality metrics (FID, CLIP score, etc.) used in each panel.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript while preserving its core contributions.
- Referee: [Method (token-level feature caching and reuse)] The central 2.8× forward-pass reduction rests on the assumption that verification redundancy plus stale feature reuse permit skipping without meaningful quality loss. In the method description of token-level feature caching and reuse, no explicit bound, divergence metric, or ablation is provided on hidden-state drift or logit/perplexity shift as a function of consecutive skip length. This is load-bearing for the quality-preservation claim, especially over long AR sequences where small inconsistencies can compound.
  Authors: We agree that a more explicit characterization of feature drift would strengthen the methodological justification. The current manuscript supports the quality-preservation claim through end-to-end generation metrics and targeted ablations on the overall VVS framework, but does not include a dedicated per-skip-length analysis of hidden-state or logit divergence. In the revision we will add a new figure and accompanying text that reports cosine similarity of cached features, KL divergence on logits, and perplexity shift as functions of consecutive skip length (up to the maximum used in our scheduling). This addition will directly address concerns about compounding effects in long sequences.
  Revision: yes
- Referee: [Experiments (main results table)] Table reporting the main speedup and quality results: the 2.8× figure and competitive quality metrics should include error bars or standard deviations across multiple random seeds and at least two distinct datasets to demonstrate robustness against post-hoc choices of the dynamic truncation threshold.
  Authors: We will revise the main results table to report means and standard deviations over at least three random seeds for all metrics. For datasets, primary results are reported on the standard ImageNet benchmark used by prior visual AR work; we will add a second dataset (COCO captions) with corresponding speed and quality numbers, either in the main table or as a dedicated row if space is limited. We will also include a short sensitivity plot for the dynamic truncation threshold to demonstrate that the reported 2.8× speedup and quality remain stable across reasonable threshold choices, thereby addressing post-hoc selection concerns.
  Revision: yes
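The analyses promised in the two responses can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, features and logits are represented as plain lists of floats, and the sample vectors are made up. It shows cosine similarity between a cached and a fresh feature, KL divergence between the distributions implied by two logit vectors, and mean with sample standard deviation across seeds.

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats, where P and Q are softmaxes of the logits.

    Assumes logit gaps are moderate, so no probability underflows to zero.
    """
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_std(values):
    """Mean and sample standard deviation over per-seed metric values."""
    return statistics.mean(values), statistics.stdev(values)


# Made-up vectors standing in for a fresh feature and a stale cached one.
fresh_feature = [0.9, 0.1, 0.4]
stale_feature = [0.8, 0.2, 0.5]
print(cosine_similarity(fresh_feature, stale_feature))
print(kl_divergence([2.0, 1.0, 0.1], [1.8, 1.1, 0.2]))
print(mean_std([1.0, 2.0, 3.0]))
```

To address the compounding concern, one would evaluate the first two functions at each consecutive skip length and report the third over repeated seeds.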
Circularity Check
No circularity: empirical observations and engineering modules form an independent proposal
The paper's central claim rests on two stated empirical observations (verification redundancy and stale feature reusability) drawn from analysis of the drafting stage, which then motivate three engineering modules. These observations are presented as direct findings rather than parameters fitted to the target speed-up result. No equations, uniqueness theorems, or self-citations are invoked to force the 2.8× forward-pass reduction; the reduction is reported as a measured outcome on visual AR tasks. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- dynamic truncation threshold
axioms (1)
- domain assumption: Visual tokens are interchangeable enough that selected verification steps can be skipped without quality degradation
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: verification redundancy and stale feature reusability are key factors... candidate token sequences exhibit similarity >0.7... similarity between features of adjunct tokens is 0.68
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: VVS reduces the number of target model forward passes by 2.8×... three complementary modules: verification-free token selector with dynamic truncation, token-level feature caching and reuse, fine-grained skipped step scheduling
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
[40]
Implement Details We present the pseudocode of the VVS framework in Algo- rithm 2 to further illustrate our design. Algorithm 2VVS with Partial Verification Skipping Require:℘: text prompt;M T : target model;M D: drafter model;L: max length of generated sequence;V last: whether last step was verified Ensure:Generated token sequenceSfor decoding to image 1...
-
[41]
Supplementary Results of Drafting Stage Analysis In Tab. 4, the results demonstrate that after substituting the verification results, both the acceleration performance and generation quality of SD remain highly stable. Tab. 5 and Tab. 6 further illustrate the impact of leveraging features various staleness for drafting, highlighting their reusability
-
[42]
Prompts used in Qualitative Experiment • A vast desert landscape under a starry sky, with a single tent illuminated by a warm campfire. Table 4. Verification redundancy experiment.rrepresents the pro- portion of verified results that are replaced for all iterations. We re- place the verified tokens of the target model with the same number of tokens from t...
-
[43]
Additional Experiments on Generalization We further validated our VVS framework on the Lumina- mGPT model. Fig. 9 offers a visual demonstration of the resulting image quality, using the same prompts as in Sec. 3. We observe that under the same relaxation thresh- oldδ= 0.2, VVS markedly cuts the target model’s forward passes while preserving generation fid...