Scalable diffusion models with transformers · 2023

18 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 3 · use method: 1

representative citing papers

MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling

cs.CV · 2026-05-19 · conditional · novelty 7.0

MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segmentation, flow prediction, and motion forecasting.

Latent Space Probing for Adult Content Detection in Video Generative Models

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6 ms overhead.

One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

DiffTSP applies discrete diffusion to knowledge graph triple set prediction, recovering all missing triples simultaneously via edge-masking noise reversal and a structure-aware transformer, achieving SOTA on three datasets.

TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics

cs.AI · 2026-04-20 · conditional · novelty 7.0

TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.

SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning

cs.LG · 2026-03-20 · unverdicted · novelty 7.0

SetFlow is a flow-matching generative model for permutation-invariant MIL bags in representation space that produces synthetic data improving classification performance and enabling training on synthetic data alone.

MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies

cs.RO · 2025-09-17 · unverdicted · novelty 7.0

MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.

ShardTensor: Domain Parallelism for Scientific Machine Learning

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.

A Cold Diffusion Approach for Percussive Dereverberation

cs.SD · 2026-05-11 · unverdicted · novelty 6.0

A cold diffusion model with direct and delta-normalized reverse processes, using UNet and transformer backbones, outperforms diffusion baselines for dereverberating acoustic and electronic drum stems on in-domain and out-of-domain tests.

From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

eess.AS · 2026-04-21 · unverdicted · novelty 6.0

Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.

Disentangled Point Diffusion for Precise Object Placement

cs.RO · 2026-04-13 · unverdicted · novelty 6.0

TAX-DPD combines a feed-forward dense GMM for global placement priors with disentangled point cloud diffusion for local geometry and pose to achieve precise robotic object placement.

A Noise Constrained Diffusion (NC-Diffusion) Framework for High Fidelity Image Compression

eess.IV · 2026-04-08 · unverdicted · novelty 6.0

NC-Diffusion matches quantization noise to the diffusion forward process, adds an adaptive frequency filter and zero-shot enhancement, and reports superior fidelity on benchmarks.

Primitive-based Truncated Diffusion for Efficient Trajectory Generation of Differential Drive Mobile Manipulators

cs.RO · 2026-04-05 · unverdicted · novelty 6.0

A primitive-based truncated diffusion model with keypoint attention encoding generates more efficient and diverse trajectories for mobile manipulators than vanilla diffusion in cluttered 3D simulations.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

Flow-Opt: Scalable Centralized Multi-Robot Trajectory Optimization with Flow Matching and Differentiable Optimization

cs.RO · 2025-10-10 · unverdicted · novelty 6.0

Flow-Opt combines a flow-matching DiT model with a custom differentiable safety filter and learned initialization to enable fast centralized trajectory optimization for tens of robots.

RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

RF-HiT uses rectified flow and a multi-scale hierarchical transformer to reach 91.27% Dice on ACDC and 87.40% on BraTS 2021 with only 10.14 GFLOPs, 13.6M parameters, and three inference steps.

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.

citing papers explorer

MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling
cs.CV · 2026-05-19 · conditional · none · ref 34

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
cs.RO · 2026-05-18 · unverdicted · none · ref 40

Latent Space Probing for Adult Content Detection in Video Generative Models
cs.CV · 2026-04-25 · unverdicted · none · ref 44

One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction
cs.AI · 2026-04-20 · unverdicted · none · ref 41

TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics
cs.AI · 2026-04-20 · conditional · none · ref 21

SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
cs.LG · 2026-03-20 · unverdicted · none · ref 8

MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies
cs.RO · 2025-09-17 · unverdicted · none · ref 30

ShardTensor: Domain Parallelism for Scientific Machine Learning
cs.DC · 2026-05-11 · unverdicted · none · ref 79

A Cold Diffusion Approach for Percussive Dereverberation
cs.SD · 2026-05-11 · unverdicted · none · ref 22

From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data
cs.CV · 2026-05-08 · unverdicted · none · ref 32

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation
eess.AS · 2026-04-21 · unverdicted · none · ref 40

Disentangled Point Diffusion for Precise Object Placement
cs.RO · 2026-04-13 · unverdicted · none · ref 43

A Noise Constrained Diffusion (NC-Diffusion) Framework for High Fidelity Image Compression
eess.IV · 2026-04-08 · unverdicted · none · ref 61

Primitive-based Truncated Diffusion for Efficient Trajectory Generation of Differential Drive Mobile Manipulators
cs.RO · 2026-04-05 · unverdicted · none · ref 31

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
eess.IV · 2026-03-30 · unverdicted · none · ref 15

Flow-Opt: Scalable Centralized Multi-Robot Trajectory Optimization with Flow Matching and Differentiable Optimization
cs.RO · 2025-10-10 · unverdicted · none · ref 15

RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation
cs.CV · 2026-04-21 · unverdicted · none · ref 15

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
cs.CV · 2026-04-29 · unverdicted · none · ref 73