GLIGEN: Open-Set Grounded Text-to-Image Generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control

cs.CV · 2026-03-15 · unverdicted · novelty 7.0

ChArtist generates pictorial charts via a Diffusion Transformer using skeleton-based spatial control and reference-image subject control, supported by a new 30,000-triplet dataset and data accuracy metric.

Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning

cs.GR · 2026-03-25 · conditional · novelty 6.0

Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

cs.CV · 2025-11-21 · unverdicted · novelty 6.0

Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.

MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation

cs.CV · 2025-08-29 · unverdicted · novelty 5.0

MedShift applies flow matching and Schrödinger bridges for class-conditional unpaired translation between synthetic and real skull X-rays, benchmarked on the new X-DigiSkull dataset.

citing papers explorer

Showing 4 of 4 citing papers.

ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control
cs.CV · 2026-03-15 · unverdicted · none · ref 26
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
cs.GR · 2026-03-25 · conditional · none · ref 17
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation
cs.CV · 2025-11-21 · unverdicted · none · ref 18
MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation
cs.CV · 2025-08-29 · unverdicted · none · ref 13
