Learning transferable visual models from natural language supervision
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2
years
2026 2
representative citing papers
A text-semantics-guided multimodal framework with geometry-aware mapping and object-conditioned text adaptation achieves state-of-the-art unsupervised anomaly detection and localization on RGB-3D industrial datasets while enabling a single model for multiple classes.
citing papers explorer
- STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition
  STAR improves 1-shot action recognition by up to 8.1% on SSv2-Full through semantic-temporal alignment and Mamba-based prototype refinement.
- Text-Guided Multimodal Unified Industrial Anomaly Detection
  A text-semantics-guided multimodal framework with geometry-aware mapping and object-conditioned text adaptation achieves state-of-the-art unsupervised anomaly detection and localization on RGB-3D industrial datasets while enabling a single model for multiple classes.