MiniCPM-o 4.5 uses the Omni-Flow streaming framework to deliver real-time full-duplex omni-modal interaction with proactive behavior in a 9B model that approaches Gemini 2.5 Flash performance.
Silero vad: pre-trained enterprise-grade voice activity detector (vad), number detector and language classifier.https://github.com/snakers4/silero-vad
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
UAF is the first unified audio front-end LLM that turns multiple front-end tasks into one sequence prediction model processing streaming audio chunks and reference prompts to output semantic and control tokens for full-duplex interaction.
citing papers explorer
-
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
MiniCPM-o 4.5 uses the Omni-Flow streaming framework to deliver real-time full-duplex omni-modal interaction with proactive behavior in a 9B model that approaches Gemini 2.5 Flash performance.
-
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
UAF is the first unified audio front-end LLM that turns multiple front-end tasks into one sequence prediction model processing streaming audio chunks and reference prompts to output semantic and control tokens for full-duplex interaction.