GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
Advances in neural information processing systems35, 36479–36494 (2022)
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.
MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
The paper presents the first generative photomosaic framework that synthesizes tiles via structure-aligned diffusion models and few-shot personalization instead of color-based matching from large tile collections.
citing papers explorer
-
Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
-
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation
DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.
-
MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.
-
FluSplat: Sparse-View 3D Editing without Test-Time Optimization
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
-
Generative Phomosaic with Structure-Aligned and Personalized Diffusion
The paper presents the first generative photomosaic framework that synthesizes tiles via structure-aligned diffusion models and few-shot personalization instead of color-based matching from large tile collections.