Speech-based age and gender prediction with transformers

· 2023 · arXiv 2306.16962

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification

eess.AS · 2026-06-02 · unverdicted · novelty 6.0

SpeakerCard-1M supplies 56.7k evidence-grounded speaker cards, 1.78M captions, and new cross-modal protocols showing audio LMs lag a dual-encoder baseline on attribute-conditioned verification while joint training barely hurts standard EER.

Privacy-preserving Prosody Representation Learning

eess.AS · 2026-05-29 · unverdicted · novelty 5.0

A self-supervised prosody encoder with speaker disentanglement strategies outperforms raw prosody and HuBERT baselines on pitch reconstruction and prosodic event detection while achieving strong speaker separation.

citing papers explorer


SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification · eess.AS · 2026-06-02 · unverdicted · none · ref 26
Privacy-preserving Prosody Representation Learning · eess.AS · 2026-05-29 · unverdicted · none · ref 54
