The Decomposed Vision-Language Alignment framework factorizes prompts into concept and attribute tokens and applies Feature-Gated Cross-Attention to improve compositional generalization in fine-grained open-vocabulary segmentation.
PAMI 41(9), 2251–2265 (2019). https://doi
2 Pith papers cite this work. Polarity classification is still indexing.