arxiv: 2605.11572 · 2 revisions
TB-AVA: Text as a Semantic Bridge for Audio-Visual Parameter Efficient Finetuning