Title resolution pending

Scalable Diffusion Models with Transformers · 2023

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

cs.CV · 2026-05-17 · unverdicted · novelty 7.0

HL-OutPaint enables high-resolution outpainting of long video sequences via a coarse-to-fine pipeline that first builds Global Coarse Guidance through global-local frame swapping, then synthesizes details.

Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

DDE introduces a compact coordinator network that combines denoised outputs from pre-trained diffusion models to enable generation in larger domains and complex conditioning settings.

Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Text embeddings in MM-DiTs encode a detectable omission signal for missing concepts; amplifying it via OSI reduces concept omission in text-to-image outputs on FLUX.1-Dev and SD3.5-Medium.

Spherical Flows for Sampling Categorical Data

stat.ML · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Spherical flows on S^{d-1} with vMF noise reduce the continuity equation to a scalar ODE in cosine similarity, yielding posterior-weighted marginal velocity and score that enable ODE and predictor-corrector sampling for categorical sequences, with the posterior trained by cross-entropy and empirical

Scaling Categorical Flow Maps

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

Categorical flow matching models scale to 1.7B parameters on 2.1T tokens, enabling 4-step text generation with competitive quality and benchmark performance.

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

cs.CV · 2025-02-14 · unverdicted · novelty 4.0

Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.

citing papers explorer

HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

cs.CV · 2026-05-17 · unverdicted · none · ref 32

Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models

cs.LG · 2026-05-22 · unverdicted · none · ref 61

Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers

cs.CV · 2026-05-14 · unverdicted · none · ref 8

Spherical Flows for Sampling Categorical Data

stat.ML · 2026-05-07 · unverdicted · none · ref 62 · 2 links

Scaling Categorical Flow Maps

cs.LG · 2026-05-08 · unverdicted · none · ref 27

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

cs.CV · 2025-02-14 · unverdicted · none · ref 288
