Wavjepa: Semantic learning unlocks robust audio foundation models for raw waveforms. arXiv preprint arXiv:2509.23238, 2025

Goksenin Yuksel, Pierre Guetschel, Michael Tangermann, Marcel van Gerven, Kiki van der Heijden · 2025 · arXiv 2509.23238

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

SpeechDx is a multi-task benchmark with 12 datasets and 27 tasks across health conditions, structured by conceptualization, formulation, and articulation stages, showing that no current audio encoder generalizes reliably.

Probing Spatial Structure in Pretrained Audio Representations

cs.SD · 2026-06-04 · unverdicted · novelty 7.0

Introduces the SARL benchmark, showing that pretrained audio encoders encode source-level spatial factors more readily than room-level factors, with patterns shaped by input configuration and training paradigm.

OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

cs.CL · 2026-06-29 · unverdicted · novelty 4.0

OLIVE is a new self-supervised speech representation framework that unifies view-augmented masked latent prediction with waveform reconstruction under one objective.

citing papers explorer

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI · cs.AI · 2026-06-15 · unverdicted · none · ref 84
Probing Spatial Structure in Pretrained Audio Representations · cs.SD · 2026-06-04 · unverdicted · none · ref 25
OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL · cs.CL · 2026-06-29 · unverdicted · none · ref 44
