Photorealistic text-to- image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al · 2022

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

cs.CR · 2026-05-19 · conditional · novelty 7.0

ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.

DCR: Counterfactual Attractor Guidance for Rare Compositional Generation

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

DCR uses a counterfactual attractor and projection-based repulsion to suppress default completion bias in diffusion models, improving fidelity for rare compositional prompts while preserving quality.

Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

SafeMark integrates a thresholded watermark-decoding loss into diffusion editors to enable text-guided edits that preserve embedded watermarks with high bit accuracy.

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.

MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

cs.CV · 2025-05-26 · unverdicted · novelty 5.0

Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.

citing papers explorer

Showing 7 of 7 citing papers.

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models cs.CR · 2026-05-19 · conditional · none · ref 56
ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.
DCR: Counterfactual Attractor Guidance for Rare Compositional Generation cs.CV · 2026-05-07 · unverdicted · none · ref 31
DCR uses a counterfactual attractor and projection-based repulsion to suppress default completion bias in diffusion models, improving fidelity for rare compositional prompts while preserving quality.
Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing cs.CV · 2026-05-19 · unverdicted · none · ref 2
SafeMark integrates a thresholded watermark-decoding loss into diffusion editors to enable text-guided edits that preserve embedded watermarks with high bit accuracy.
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration cs.CV · 2026-05-20 · unverdicted · none · ref 40
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset cs.CV · 2026-05-20 · unverdicted · none · ref 79
MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 102
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift cs.CV · 2025-05-26 · unverdicted · none · ref 9
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.

Photorealistic text-to- image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer