A learned representation for artistic style
6 Pith papers cite this work.
abstract
The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window onto the structure of the learned representation of artistic style.
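The abstract does not say how a painting becomes "a point in an embedding space" or how styles are "arbitrarily combined." A minimal sketch, assuming the embedding takes the form of conditional instance normalization (each style owns a per-channel scale γ and shift β applied after instance normalization, the mechanism associated with this paper), and that styles are mixed by a convex combination of those parameters. The function names, the C×H×W layout and the random parameters below are illustrative assumptions, not the authors' code.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (shape C x H x W) to zero mean, unit variance."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conditional_instance_norm(x, gammas, betas, weights):
    """Apply a style's scale/shift to normalized features, blending styles by weight.

    gammas, betas: arrays of shape (num_styles, C); each row is one style's embedding.
    weights: shape (num_styles,), non-negative and summing to 1 (convex combination).
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("style weights must be non-negative and sum to 1")
    gamma = weights @ gammas  # (C,) blended per-channel scale
    beta = weights @ betas    # (C,) blended per-channel shift
    return gamma[:, None, None] * instance_norm(x) + beta[:, None, None]


# Usage: three hypothetical styles, four feature channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))
style_gammas = rng.normal(size=(3, 4))
style_betas = rng.normal(size=(3, 4))
half_and_half = conditional_instance_norm(
    features, style_gammas, style_betas, [0.5, 0.5, 0.0]
)
```

Because this layer is affine in (γ, β), blending the style parameters here equals blending the layer's per-style outputs. The full network is nonlinear, so the final stylized image is generally not a pixel-wise average of the individual stylizations.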
citing papers explorer
- MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer
  MAST is a mask-guided attention allocation method that enables artifact-free multi-style transfer in diffusion models by anchoring layout, distributing attention mass, scaling sharpness, and injecting details.
- Diffusion Models Beat GANs on Image Synthesis
  Diffusion models with architecture improvements and classifier guidance achieve better FID scores than GANs on unconditional and conditional ImageNet image synthesis.
- CanonCGT: Reference-Based Color Grading via Canonical Pivot Representation
  CanonCGT introduces a canonical pivot representation and dual-phase training (DP-CGT) for stable, photorealistic reference-based color grading that outperforms prior methods in consistency.
- The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
  Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
- Head-Pose-Aware Visual Speech Recognition with FiLM Modulation
  HP-VSR-ResFiLM adds a single residual FiLM modulation block, conditioned on head pose, to a CNN visual encoder, yielding a WER of 25.0% on LRS2 and 33.2% on LRS3 under standard training conditions.
- Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory
  The book presents principles from optimization and information theory to explain deep network architectures and enable new interpretable models.