FashionVLP: Vision language transformer for fashion retrieval with feedback
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
A LoRA-fine-tuned Florence-2 model extracts structured fashion attributes from images, achieving 94.6% category accuracy and 63.0% material accuracy on a held-out test set, outperforming GPT-4o-mini and Gemini 2.5 Flash while producing valid JSON 99.8% of the time.