Advancing audio-visual navigation through multi-agent collaboration in 3D environments
4 Pith papers cite this work.
fields: cs.SD (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction
  BDATP enhances generalization in audio-visual navigation by explicitly modeling interaural differences and using auxiliary action prediction, achieving gains of up to 21.6 percentage points in success rate on unheard sounds on the Replica dataset.
- Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation
  RAVN improves audio-visual navigation by learning audio-derived reliability cues via an Acoustic Geometry Reasoner and using them to modulate visual features through Reliability-Aware Geometric Modulation.
- Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
  SACF discretizes target direction and distance from audio-visual cues, then applies conditioned fusion to improve navigation efficiency and generalization to unheard sounds.
- Audio Spatially-Guided Fusion for Audio-Visual Navigation
  Audio Spatially-Guided Fusion improves generalization in audio-visual navigation on unheard sound sources by extracting spatial audio features and adaptively fusing them with visual data.