SDAR-VL: Stable and Efficient Block-wise Diffusion for Vision-Language Understanding
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
citing papers explorer
- BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation
  BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.
- DMax: Aggressive Parallel Decoding for dLLMs
  DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
- ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
  ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.