Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.