SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, Saining Xie · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.

Vision Foundation Models as Generalist Tokenizers for Image Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.

Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.

FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.

Protein Autoregressive Modeling via Multiscale Structure Generation

cs.LG · 2026-02-04 · unverdicted · novelty 6.0

PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.

Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value

cs.LG · 2025-06-16 · conditional · novelty 6.0

Derives closed-form optimal loss for unified diffusion models, provides variance-controlled estimators, and shows improved diagnosis, training schedules, and power-law scaling after subtracting the optimal value.

citing papers explorer

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis · cs.CV · 2026-05-20 · unverdicted · none · ref 32
Vision Foundation Models as Generalist Tokenizers for Image Generation · cs.CV · 2026-05-18 · unverdicted · none · ref 53
Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers · cs.CV · 2026-05-16 · unverdicted · none · ref 22
FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation · cs.CV · 2026-05-07 · unverdicted · none · ref 4
Protein Autoregressive Modeling via Multiscale Structure Generation · cs.LG · 2026-02-04 · unverdicted · none · ref 35
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value · cs.LG · 2025-06-16 · conditional · none · ref 36
