
Back to Basics: Let Denoising Generative Models Denoise

Tianhong Li, Kaiming He · 2025 · cs.CV · arXiv 2511.13720

Mixed citation behavior. Most common role is background (62%).

64 Pith papers citing it

Background 62% of classified citations


abstract

Today's denoising diffusion models do not "denoise" in the classical sense, i.e., they do not directly predict clean images. Rather, the neural networks predict noise or a noised quantity. In this paper, we suggest that predicting clean data and predicting noised quantities are fundamentally different. According to the manifold assumption, natural data should lie on a low-dimensional manifold, whereas noised quantities do not. With this assumption, we advocate for models that directly predict clean data, which allows apparently under-capacity networks to operate effectively in very high-dimensional spaces. We show that simple, large-patch Transformers on pixels can be strong generative models: using no tokenizer, no pre-training, and no extra loss. Our approach is conceptually nothing more than "Just image Transformers", or JiT, as we call it. We report competitive results using JiT with large patch sizes of 16 and 32 on ImageNet at resolutions of 256 and 512, where predicting high-dimensional noised quantities can fail catastrophically. With our networks mapping back to the basics of the manifold, our research goes back to basics and pursues a self-contained paradigm for Transformer-based diffusion on raw natural data.
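The distinction the abstract draws between predicting clean data and predicting noised quantities can be illustrated with a minimal sketch. It assumes a linear interpolation schedule, z_t = t·x + (1 − t)·ε, which is a common flow-matching convention and is not stated in the text above. All function names are hypothetical and chosen for illustration only. Under that assumption the three prediction targets (clean data x, noise ε, velocity v = x − ε) are algebraically interchangeable given z_t and t, so the choice concerns what the network is asked to output rather than the information available. The helper `patch_dim` shows the per-token dimensionality for the patch sizes named in the abstract, assuming RGB input.

```python
import numpy as np

def noisy_sample(x, eps, t):
    """Assumed schedule: t=0 is pure noise, t=1 is clean data."""
    return t * x + (1.0 - t) * eps

def x_from_eps(z_t, eps_pred, t):
    """Recover a clean-data estimate from a noise prediction (needs t > 0)."""
    return (z_t - (1.0 - t) * eps_pred) / t

def eps_from_x(z_t, x_pred, t):
    """Recover a noise estimate from a clean-data prediction (needs t < 1)."""
    return (z_t - t * x_pred) / (1.0 - t)

def v_from_x(z_t, x_pred, t):
    """Velocity v = x - eps, rewritten in terms of z_t and x (needs t < 1)."""
    return (x_pred - z_t) / (1.0 - t)

def patch_dim(patch_size, channels=3):
    """Dimensionality of one flattened pixel patch (RGB assumed)."""
    return patch_size * patch_size * channels

# Toy check with random vectors standing in for flattened 16x16 RGB patches.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, patch_dim(16)))
eps = rng.standard_normal(x.shape)
t = 0.3
z_t = noisy_sample(x, eps, t)

# With perfect predictions, each parameterization recovers the others.
x_rec = x_from_eps(z_t, eps, t)
eps_rec = eps_from_x(z_t, x, t)
v_rec = v_from_x(z_t, x, t)
```

In this sketch a patch size of 16 gives 768 values per token and a patch size of 32 gives 3072, which is the high-dimensional regime the abstract refers to.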

citation-role summary

background 12 · method 3 · baseline 1

citation-polarity summary

background 10 · use method 3 · support 2 · baseline 1

representative citing papers

Let EEG Models Learn EEG

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

JET is a conditional flow matching framework that generates EEG as continuous raw sequences with added constraints for spectral and temporal properties, achieving over 40% lower TS-FID than prior discrete denoising methods on three benchmarks.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes

cs.GR · 2026-05-19 · unverdicted · novelty 7.0

Proposes discretized Matérn process noise for triangulation-agnostic flow matching on meshes with PoissonNet denoiser, tested on elastic states and humanoid poses for meshes exceeding one million triangles.

Binomial flows: Denoising and flow matching for discrete ordinal data

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

Binomial flows close the gap between continuous flow matching and discrete ordinal data by using binomial distributions to enable unified denoising, sampling, and exact likelihoods in diffusion models.

Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.

Grokking of Diffusion Models: Case Study on Modular Addition

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.

Coevolving Representations in Joint Image-Feature Diffusion

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

CoReDi coevolves semantic representations with the diffusion model via a jointly learned linear projection stabilized by stop-gradient, normalization, and regularization, yielding faster convergence and higher sample quality than fixed-representation baselines.

FARM: Foundational Aerial Radio Map for Intelligent Low-Altitude Networking

eess.SP · 2026-04-19 · unverdicted · novelty 7.0

FARM is a foundation model combining masked autoencoders and diffusion decoders to estimate high-resolution aerial radio maps from a new multi-band low-altitude dataset, claiming superior accuracy and generalization over prior methods.

Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives than grid-aligned methods.

Latent Generative Solvers for Generalizable Long-Term Physics Simulation

cs.AI · 2026-02-11 · unverdicted · novelty 7.0

LGS pretrained on 2.5M trajectories across 16 systems matches deterministic baselines at one step and halves 20-step error while using far less compute and adapting to held-out higher-resolution flows.

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

PiD is a pixel diffusion decoder that performs latent-to-pixel conversion and 4-8x upsampling in one generative step, enabling early stopping of latent diffusion and achieving sub-second 2048x2048 decoding with claimed better fidelity than cascaded baselines.

RiT: Vanilla Diffusion Transformers Suffice in Representation Space

cs.CV · 2026-05-21 · conditional · novelty 6.0

A vanilla Diffusion Transformer trained via x-prediction on frozen DINOv2 features reaches FID 1.14 on ImageNet 256x256 with fewer parameters and faster sampling than prior DiT variants.

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.

Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.

WavFlow: Audio Generation in Waveform Space

cs.SD · 2026-05-18 · conditional · novelty 6.0

WavFlow performs direct waveform audio generation via flow matching on 2D token grids from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without intermediate compression.

Improved Baselines with Representation Autoencoders

cs.CV · 2026-05-18 · conditional · novelty 6.0

RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.

Registers Matter for Pixel-Space Diffusion Transformers

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.

L2P: Unlocking Latent Potential for Pixel Generation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

L2P repurposes pre-trained LDMs for direct pixel generation via large-patch tokenization and shallow-layer training on synthetic data, matching source performance with 8-GPU training and enabling native 4K output.

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.

Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies

physics.ao-ph · 2026-05-12 · unverdicted · novelty 6.0

A multivariate diffusion generative downscaling method preserves inter-variable correlations in climate data under large resolution increases, enabling more accurate compound risk assessment.

ELF: Embedded Language Flows

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B to over 200B parameters.

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.

FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.

citing papers explorer

Showing 50 of 64 citing papers.

Let EEG Models Learn EEG cs.CV · 2026-05-20 · unverdicted · none · ref 37 · internal anchor
JET is a conditional flow matching framework that generates EEG as continuous raw sequences with added constraints for spectral and temporal properties, achieving over 40% lower TS-FID than prior discrete denoising methods on three benchmarks.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation cs.LG · 2026-05-20 · unverdicted · none · ref 66 · internal anchor
CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes cs.GR · 2026-05-19 · unverdicted · none · ref 22 · internal anchor
Proposes discretized Matérn process noise for triangulation-agnostic flow matching on meshes with PoissonNet denoiser, tested on elastic states and humanoid poses for meshes exceeding one million triangles.
Binomial flows: Denoising and flow matching for discrete ordinal data cs.LG · 2026-05-01 · unverdicted · none · ref 25 · internal anchor
Binomial flows close the gap between continuous flow matching and discrete ordinal data by using binomial distributions to enable unified denoising, sampling, and exact likelihoods in diffusion models.
Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement cs.CV · 2026-04-20 · unverdicted · none · ref 15 · internal anchor
A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.
Grokking of Diffusion Models: Case Study on Modular Addition cs.LG · 2026-04-20 · unverdicted · none · ref 13 · internal anchor
Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
Coevolving Representations in Joint Image-Feature Diffusion cs.CV · 2026-04-19 · unverdicted · none · ref 25 · internal anchor
CoReDi coevolves semantic representations with the diffusion model via a jointly learned linear projection stabilized by stop-gradient, normalization, and regularization, yielding faster convergence and higher sample quality than fixed-representation baselines.
FARM: Foundational Aerial Radio Map for Intelligent Low-Altitude Networking eess.SP · 2026-04-19 · unverdicted · none · ref 33 · internal anchor
FARM is a foundation model combining masked autoencoders and diffusion decoders to estimate high-resolution aerial radio maps from a new multi-band low-altitude dataset, claiming superior accuracy and generalization over prior methods.
Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction cs.CV · 2026-04-06 · unverdicted · none · ref 26 · internal anchor
Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives than grid-aligned methods.
Latent Generative Solvers for Generalizable Long-Term Physics Simulation cs.AI · 2026-02-11 · unverdicted · none · ref 21 · internal anchor
LGS pretrained on 2.5M trajectories across 16 systems matches deterministic baselines at one step and halves 20-step error while using far less compute and adapting to held-out higher-resolution flows.
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion cs.CV · 2026-05-22 · unverdicted · none · ref 21 · internal anchor
PiD is a pixel diffusion decoder that performs latent-to-pixel conversion and 4-8x upsampling in one generative step, enabling early stopping of latent diffusion and achieving sub-second 2048x2048 decoding with claimed better fidelity than cascaded baselines.
RiT: Vanilla Diffusion Transformers Suffice in Representation Space cs.CV · 2026-05-21 · conditional · none · ref 18 · internal anchor
A vanilla Diffusion Transformer trained via x-prediction on frozen DINOv2 features reaches FID 1.14 on ImageNet 256x256 with fewer parameters and faster sampling than prior DiT variants.
Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis cs.CV · 2026-05-20 · unverdicted · none · ref 33 · internal anchor
Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.
Multi-Scale Generative Modeling with Heat Dissipation Flow Matching cs.CV · 2026-05-19 · unverdicted · none · ref 15 · internal anchor
HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.
WavFlow: Audio Generation in Waveform Space cs.SD · 2026-05-18 · conditional · none · ref 12 · internal anchor
WavFlow performs direct waveform audio generation via flow matching on 2D token grids from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without intermediate compression.
Improved Baselines with Representation Autoencoders cs.CV · 2026-05-18 · conditional · none · ref 35 · internal anchor
RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.
Registers Matter for Pixel-Space Diffusion Transformers cs.CV · 2026-05-15 · unverdicted · none · ref 24 · internal anchor
Register tokens enhance pixel-space DiT training and output quality via cleaner high-noise feature maps, and a dual-stream design adds further gains with little overhead.
L2P: Unlocking Latent Potential for Pixel Generation cs.CV · 2026-05-12 · unverdicted · none · ref 14 · internal anchor
L2P repurposes pre-trained LDMs for direct pixel generation via large-patch tokenization and shallow-layer training on synthetic data, matching source performance with 8-GPU training and enabling native 4K output.
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion cs.CL · 2026-05-12 · unverdicted · none · ref 14 · internal anchor
BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.
Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies physics.ao-ph · 2026-05-12 · unverdicted · none · ref 36 · internal anchor
A multivariate diffusion generative downscaling method preserves inter-variable correlations in climate data under large resolution increases, enabling more accurate compound risk assessment.
ELF: Embedded Language Flows cs.CL · 2026-05-11 · unverdicted · none · ref 32 · internal anchor
ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.
HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer cs.CV · 2026-05-11 · unverdicted · none · ref 28 · internal anchor
A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B to over 200B parameters.
Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction cs.CV · 2026-05-11 · unverdicted · none · ref 16 · internal anchor
Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.
FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation cs.CV · 2026-05-07 · unverdicted · none · ref 21 · internal anchor
FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.
Taming Outlier Tokens in Diffusion Transformers cs.CV · 2026-05-06 · unverdicted · none · ref 16 · internal anchor
Outlier tokens in DiTs are addressed with Dual-Stage Registers, which reduce artifacts and improve image generation on ImageNet and text-to-image tasks.
A Few-Step Generative Model on Cumulative Flow Maps cs.LG · 2026-05-05 · unverdicted · none · ref 10 · internal anchor
Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
High-Dimensional Noise to Low-Dimensional Manifolds: A Manifold-Space Diffusion Framework for Degraded Hyperspectral Image Classification cs.CV · 2026-04-29 · unverdicted · none · ref 15 · internal anchor
MSDiff maps degraded hyperspectral data to a low-dimensional manifold and uses diffusion to regularize features for more robust classification under complex degradations.
CoreFlow: Low-Rank Matrix Generative Models cs.LG · 2026-04-27 · unverdicted · none · ref 22 · internal anchor
CoreFlow is a low-rank matrix generative model that trains normalizing flows on shared subspaces to improve efficiency and quality for high-dimensional limited-sample data, including incomplete matrices.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation cs.CV · 2026-04-27 · unverdicted · none · ref 22 · 2 links · internal anchor
Tuna-2 shows that direct pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive generation and stronger understanding at scale.
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think cs.LG · 2026-04-25 · unverdicted · none · ref 20 · internal anchor
V-GRPO makes ELBO surrogates stable and efficient for online RL alignment of denoising models, delivering SOTA text-to-image performance with 2-3x speedups over MixGRPO and DiffusionNFT.
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models cs.LG · 2026-04-20 · unverdicted · none · ref 226 · internal anchor
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
VOLT: Volumetric Wide-Field Microscopy via 3D-Native Probabilistic Transport eess.IV · 2026-04-20 · unverdicted · none · ref 45 · internal anchor
VOLT is a probabilistic transport method with a 3D anisotropic network that improves wide-field microscopy volume reconstruction in lateral and axial directions while supplying voxel-wise credibility estimates.
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing cs.LG · 2026-04-17 · unverdicted · none · ref 22 · internal anchor
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
Generative Refinement Networks for Visual Synthesis cs.CV · 2026-04-14 · unverdicted · none · ref 32 · internal anchor
GRN uses hierarchical binary quantization and entropy-guided refinement to set new ImageNet records of 0.56 rFID for reconstruction and 1.81 gFID for class-conditional generation while releasing code and models.
FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving cs.RO · 2026-04-14 · unverdicted · none · ref 9 · internal anchor
FeaXDrive improves end-to-end autonomous driving by shifting diffusion planning to a trajectory-centric formulation with curvature-constrained training, drivable-area guidance, and GRPO post-training, yielding stronger closed-loop performance and feasibility on NAVSIM.
CoD-Lite: Real-Time Diffusion-Based Generative Image Compression cs.CV · 2026-04-14 · unverdicted · none · ref 10 · internal anchor
CoD-Lite delivers real-time generative image compression via a lightweight convolution-based diffusion codec with compression-oriented pre-training and distillation, achieving substantial bitrate savings.
Continuous Adversarial Flow Models cs.LG · 2026-04-13 · unverdicted · none · ref 37 · internal anchor
Continuous adversarial flow models replace MSE in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT and similar gains for JiT and text-to-image benchmarks.
From Clues to Generation: Language-Guided Conditional Diffusion for Cross-Domain Recommendation cs.IR · 2026-04-07 · unverdicted · none · ref 28 · internal anchor
LGCD creates pseudo-overlapping user data via LLM reasoning and uses conditional diffusion to generate target-domain user representations for inter-domain sequential recommendation without real overlapping users.
FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation cs.CL · 2026-04-06 · unverdicted · none · ref 8 · internal anchor
FlowLM converts diffusion LMs to flow matching via fine-tuning, achieving few-step generation that rivals or beats 2000-step diffusion and saturates faster than training flow models from scratch.
ML-based approach to classification and generation of structured light propagation in turbulent media physics.optics · 2026-04-04 · unverdicted · none · ref 31 · internal anchor
ML models classify and generate structured light in turbulence using CNNs and diffusion models enhanced by Bregman distance minimization.
What Does Flow Matching Bring To TD Learning? cs.LG · 2026-03-04 · conditional · none · ref 33 · internal anchor
Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving cs.RO · 2026-02-26 · unverdicted · none · ref 27 · internal anchor
The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 42 · 2 links · internal anchor
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning cs.CV · 2026-02-11 · unverdicted · none · ref 26 · internal anchor
ABot-M0 unifies heterogeneous robot data into a 6-million-trajectory dataset and introduces Action Manifold Learning to predict stable actions on a low-dimensional manifold using a DiT backbone.
Protein Autoregressive Modeling via Multiscale Structure Generation cs.LG · 2026-02-04 · unverdicted · none · ref 29 · internal anchor
PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.
Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design cs.LG · 2026-02-04 · conditional · none · ref 12 · internal anchor
An ELBO-based likelihood estimator from the final generated sample dominates other RL design factors for diffusion models, raising GenEval from 0.24 to 0.95 in 90 GPU hours with better efficiency than prior methods.
PixelGen: Improving Pixel Diffusion with Perceptual Supervision cs.CV · 2026-02-02 · accept · none · ref 11 · internal anchor
PixelGen augments pixel diffusion with gated perceptual supervision to reach FID 5.11 on ImageNet-256 and GenEval 0.79 in text-to-image, narrowing the gap to latent methods without VAEs.
Sampling-Free Diffusion Transformers for Low-Complexity MIMO Channel Estimation eess.SP · 2026-02-02 · unverdicted · none · ref 14 · internal anchor
A diffusion transformer directly maps noisy MIMO channel observations to clean estimates in a single pass by exploiting angular sparsity, achieving better accuracy and much lower complexity than iterative diffusion baselines.
PixelDiT: Pixel Diffusion Transformers for Image Generation cs.CV · 2025-11-25 · conditional · none · ref 16 · internal anchor
PixelDiT generates images in pixel space with a dual-level transformer and reaches 1.61 FID on ImageNet 256, outperforming prior pixel-space models.
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent cs.LG · 2026-05-20 · unverdicted · none · ref 29 · 2 links · internal anchor
Stochastic MeanFlow Policies enable one-step generative control in off-policy mirror descent by mapping noise through a MeanFlow transform, yielding tractable entropy and improved MuJoCo performance over Gaussian and generative baselines.
