Oscar: Object-semantics aligned pre-training for vision-language tasks
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
DBAC is a new directional metric for bias amplification in image captions that is less sensitive to sentence encoders and more accurate than LIC, validated on COCO gender and race attributes.
citing papers explorer
- InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
InstAP introduces instance-aware pre-training with a new dual-granularity dataset InstVL that improves both fine-grained instance retrieval and global video understanding over standard VLP baselines.
- A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
DBAC is a new directional metric for bias amplification in image captions that is less sensitive to sentence encoders and more accurate than LIC, validated on COCO gender and race attributes.