The iNaturalist Sounds Dataset,
2 Pith papers cite this work.
Pith papers citing it, by year: 2026 (2)
Citing papers
- Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation
  Text distillation from BioCLIP-2 into BioLingual creates audio-image alignment for bird species retrieval without any audio-image training pairs.
- Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval
  Compact binary hypercube embeddings enable efficient text-to-image and text-to-audio retrieval in wildlife databases, with performance competitive with continuous embeddings at far lower memory and search costs.