LightAVSeg decouples semantic filtering and spatial grounding to achieve linear-cost cross-modal interaction in audio-visual segmentation, reaching 50.4 mIoU on MS3 with 20.5M parameters as a new lightweight state-of-the-art.
Artificial Intelligence and Statistics
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
citing papers explorer
- LightAVSeg: Lightweight Audio-Visual Segmentation
  LightAVSeg decouples semantic filtering and spatial grounding to achieve linear-cost cross-modal interaction in audio-visual segmentation, reaching 50.4 mIoU on MS3 with 20.5M parameters as a new lightweight state-of-the-art.
- Temporal Aware Pruning for Efficient Diffusion-based Video Generation
  TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.