Tiger: Time-frequency interleaved gain extraction and reconstruction for efficient speech separation
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SD (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
TF-MoE uses dynamic per-frame and per-mel-band expert selection in time and frequency dimensions to improve speech separation performance at comparable compute cost to prior models.
citing papers explorer
- Position-Aware Target Speaker Extraction for Long-Form Multi-Party Conversations: A Diarization-Free Framework for ASR
PATSE is a DOA-guided target speaker extraction system that produces speaker-attributed streams for diarization-free ASR in multi-party conversations.
- TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation
TF-MoE uses dynamic per-frame and per-mel-band expert selection in time and frequency dimensions to improve speech separation performance at comparable compute cost to prior models.