Vox-Profile: A speech foundation model benchmark for characterizing diverse speaker and speech traits

2025 · arXiv 2505.14648

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation

cs.SD · 2026-04-09 · unverdicted · novelty 7.0

CapTalk unifies single-utterance and dialogue voice design via utterance- and speaker-level captions plus a hierarchical variational module for stable timbre with adaptive expression.

Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models

cs.CL · 2025-12-29 · accept · novelty 7.0

Spoken language models exhibit style amnesia and fail to maintain instructed paralinguistic styles across multi-turn conversations, with explicit recall offering partial mitigation.

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

eess.AS · 2026-05-12 · unverdicted · novelty 6.0

A data pipeline, 14-dimension benchmark, and decoupled fine-tuning model are presented to advance fine-grained multi-dimensional speech understanding in LLMs.

Smiling Regulates Emotion During Traumatic Recollection

cs.MM · 2026-04-21 · unverdicted · novelty 6.0

Smiles during intense negative affect in Holocaust survivor testimonies improve emotional valence trajectories across audio, eye gaze, and text modalities while reducing eye dynamics.

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

eess.AS · 2026-04-06 · unverdicted · novelty 4.0 · 2 refs

Cross-lifespan evaluation shows that adult-trained speech foundation models degrade on child and older-adult data; joint multi-age training and targeted adaptation improve robustness, especially with the Whisper encoder.

citing papers explorer

Showing 5 of 5 citing papers.

CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation
cs.SD · 2026-04-09 · unverdicted · none · ref 10
Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models
cs.CL · 2025-12-29 · accept · none · ref 13
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
eess.AS · 2026-05-12 · unverdicted · none · ref 22
Smiling Regulates Emotion During Traumatic Recollection
cs.MM · 2026-04-21 · unverdicted · none · ref 22
Exploring Speech Foundation Models for Speaker Diarization Across Lifespan
eess.AS · 2026-04-06 · unverdicted · none · ref 24 · 2 links
