arXiv preprint arXiv:2210.04885
9 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 9
representative citing papers
citing papers explorer
- Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
  Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
- AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
  AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.
- Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
  DiTs use either a two-stage cross-attention circuit or text-token fusion circuit for spatial relations depending on the text encoder, achieving near-perfect in-domain accuracy but differing out-of-domain robustness.
- Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning
  Introduces a fairness layer for deep learning models that guarantees output parity and an online primal-dual algorithm for aggregate fairness guarantees in streaming predictions with small batch sizes.
- The two clocks and the innovation window: When and how generative models learn rules
  Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
- TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering
  TaleDiffusion introduces an iterative framework using LLM-generated per-frame descriptions, bounded attention-based per-box masks, identity-consistent self-attention, region-aware cross-attention, and CLIPSeg-based dialogue rendering to produce consistent multi-character story visualizations.
- Spatial Balancing: Harnessing Spatial Reasoning to Balance Scientific Exposition and Narrative Engagement in LLM-assisted Science Communication Writing
  SpatialBalancing is a system that turns revision trade-offs into spatial navigation so writers can iteratively balance scientific exposition and narrative engagement with LLM assistance.
- Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
  DiT-ST converts complete-text captions into split-text primitives via LLMs and injects them hierarchically across denoising stages to reduce semantic confusion in DiT-based text-to-image generation.
- Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation
  Selective aggregation of cross-attention maps from the most relevant heads in diffusion-based T2I models yields higher mean IoU for visual interpretation than standard aggregation methods like DAAM.