LAT-Audio introduces a global-to-local reasoning approach with TWA-CoT that outperforms prior models on temporal tasks for audio up to 30 minutes.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
DFAlign uses diffusion-based denoising to generate foreground knowledge prompts that improve cross-modal alignment for detecting unseen actions in untrimmed videos, reporting state-of-the-art results on OV-TAD benchmarks.
citing papers explorer
-
Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding
LAT-Audio introduces a global-to-local reasoning approach with TWA-CoT that outperforms prior models on temporal tasks for audio up to 30 minutes.
-
Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
DFAlign uses diffusion-based denoising to generate foreground knowledge prompts that improve cross-modal alignment for detecting unseen actions in untrimmed videos, reporting state-of-the-art results on OV-TAD benchmarks.