Learning Transferable Visual Models from Natural Language Supervision
Three papers indexed on Pith cite this work.
Field: cs.CV (3 papers)
Citing papers
- MOSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation
  MoSA improves dynamic scene graph generation by fusing motion attributes with spatial features, aligning the fused features cross-modally with text embeddings of relationships, and adding a weighted loss for rare classes. It reports top results on Action Genome.
- AnimalBooth: Multimodal Feature Enhancement for Animal Subject Personalization
  AnimalBooth introduces an Animal Net, an adaptive attention module, and frequency-controlled DCT feature integration to improve identity preservation and perceptual quality in personalized animal image generation. It also introduces AnimalBench, a new high-resolution dataset.
- Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
  Motion separation modules plus negative prompts improve CLIP-based zero-shot video action recognition on standard benchmarks.