hub

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter · 2017

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

browse 12 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

method 2 background 1 baseline 1

citation-polarity summary

use method 2 background 1 baseline 1

representative citing papers

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

cs.CV · 2025-09-29 · unverdicted · novelty 6.0 · 2 refs

Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.

Guidance Watermarking for Diffusion Models

cs.CR · 2025-09-26 · unverdicted · novelty 6.0

Guidance watermarking steers diffusion denoising steps via gradients from an off-the-shelf watermark decoder to embed marks during generation, converting post-hoc schemes into in-generation ones while remaining complementary to VAE modifications.

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

cs.CV · 2025-07-10 · unverdicted · novelty 6.0

Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.

Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value

cs.LG · 2025-06-16 · conditional · novelty 6.0

Derives closed-form optimal loss for unified diffusion models, provides variance-controlled estimators, and shows improved diagnosis, training schedules, and power-law scaling after subtracting the optimal value.

2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching

cs.GR · 2025-06-03 · unverdicted · novelty 6.0

2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

cs.CV · 2024-05-14 · conditional · novelty 6.0

Hunyuan-DiT is a new multi-resolution diffusion transformer that achieves state-of-the-art Chinese text-to-image generation through custom architecture, data pipelines, and multimodal caption refinement.

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

cs.CV · 2022-06-22 · unverdicted · novelty 6.0

Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

Lossless Anti-Distillation Sampling

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

cs.CV · 2025-05-14 · conditional · novelty 5.0

BLIP3-o uses a diffusion transformer to generate CLIP image features and a sequential pretraining strategy to build open models that perform strongly on both image understanding and generation benchmarks.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

citing papers explorer

Showing 12 of 12 citing papers.

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation cs.CV · 2025-09-29 · unverdicted · none · ref 9 · 2 links
Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.
Guidance Watermarking for Diffusion Models cs.CR · 2025-09-26 · unverdicted · none · ref 24
Guidance watermarking steers diffusion denoising steps via gradients from an off-the-shelf watermark decoder to embed marks during generation, converting post-hoc schemes into in-generation ones while remaining complementary to VAE modifications.
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling cs.CV · 2025-07-10 · unverdicted · none · ref 31
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value cs.LG · 2025-06-16 · conditional · none · ref 16
Derives closed-form optimal loss for unified diffusion models, provides variance-controlled estimators, and shows improved diagnosis, training schedules, and power-law scaling after subtracting the optimal value.
2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching cs.GR · 2025-06-03 · unverdicted · none · ref 16
2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.
Emu3: Next-Token Prediction is All You Need cs.CV · 2024-09-27 · unverdicted · none · ref 29
Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding cs.CV · 2024-05-14 · conditional · none · ref 14
Hunyuan-DiT is a new multi-resolution diffusion transformer that achieves state-of-the-art Chinese text-to-image generation through custom architecture, data pipelines, and multimodal caption refinement.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation cs.CV · 2022-06-22 · unverdicted · none · ref 52
Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.
Lossless Anti-Distillation Sampling cs.LG · 2026-05-12 · unverdicted · none · ref 85
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models cs.CV · 2026-05-07 · unverdicted · none · ref 24
Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset cs.CV · 2025-05-14 · conditional · none · ref 10
BLIP3-o uses a diffusion transformer to generate CLIP image features and a sequential pretraining strategy to build open models that perform strongly on both image understanding and generation benchmarks.
ModelScope Text-to-Video Technical Report cs.CV · 2023-08-12 · unverdicted · none · ref 15
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

Gans trained by a two time-scale update rule converge to a local nash equilibrium

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer