Pseudo numerical methods for diffusion models on manifolds
13 Pith papers cite this work.
citing papers explorer
- LAION-5B: An open large-scale dataset for training next generation image-text models
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
- The two clocks and the innovation window: When and how generative models learn rules
Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
- A Few-Step Generative Model on Cumulative Flow Maps
Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
- Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving
A platform using flow matching for real-world image generation and an adversarial policy creates challenging corner cases to evaluate end-to-end autonomous driving models like UniAD and VAD, showing performance degradation.
- IntrinsicWeather: Controllable Weather Editing in Intrinsic Space
A diffusion framework decomposes images into intrinsic maps via an inverse renderer and renders controllable weather changes via a forward renderer with CLIP prompt interpolation and map-aware attention, outperforming pixel-space baselines on new 38k synthetic and 18k real datasets.
- Sampling-Aware Quantization for Diffusion Models
A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.
- OmniPrism: Learning Disentangled Visual Concept for Image Generation
OmniPrism proposes a disentanglement method using a new paired dataset (PCD-200K), COD contrastive training, and block embeddings to inject separated concepts into diffusion models for multi-aspect image generation.
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.
- Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation
A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with competitive quality.
- AnimeAdapter: Fine-grained and Consistent Zero-shot Anime Character Generation
AnimeAdapter is a pretrained lightweight adapter for Stable Diffusion that uses semantic-selective local attention from CLIP and pose-aware conditioning to enable zero-shot fine-grained consistent anime character generation from a single reference image.
- Adaptive Forensic Feature Refinement via Intrinsic Importance Perception
I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harming generalization.
- Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations