Decoupled weight decay regularization
Six papers indexed by Pith cite this work. Polarity classification is still in progress.
Citing papers by year: 2026 (6)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers:
- Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers
  Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.
- Deep Image Segmentation via Discriminant Feature Learning
  Deep Discriminant Analysis (DDA) is a new loss that maximizes between-class variance and minimizes within-class variance to produce more compact and separable features for image segmentation.
- EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing
  EditTransfer++ delivers state-of-the-art faithfulness to visual editing examples and faster inference by removing text conditioning during fine-tuning and applying best-worst contrastive refinement plus condition compression.
- Rate-Distortion Optimization for Transformer Inference
  A rate-distortion framework for lossy compression of transformer representations yields substantial bitrate savings on language tasks while preserving accuracy, with observed rates aligning to derived information-theoretic bounds.
- Audio Deepfake Detection at the First Greeting: "Hi!"
  S-MGAA adds pixel-channel enhancement and frequency compensation modules to improve audio deepfake detection on very short, degraded speech inputs.
- SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation