Title resolution pending
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2023 (1)
Verdicts: CONDITIONAL (1)

Representative citing papers:
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Ferret introduces a hybrid region representation and the GRIT dataset to let MLLMs refer to and ground arbitrary image regions, outperforming prior models on referring, grounding, and localization-aware chatting while reducing object hallucination.