Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning

2025 · arXiv 2507.02915

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Frequency-Aware Self-Supervised Music Representation Learning

cs.SD · 2026-06-24 · unverdicted · novelty 6.0

PupuJEPA applies a visual JEPA framework to 2D spectrograms with music-specific adaptations and outperforms 1D SSL models on the MARBLE benchmark for multiple MIR tasks.

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

cs.SD · 2026-06-01 · unverdicted · novelty 5.0

A single ViT encoder with JEPA pretraining and staged specialization performs speaker diarization, phonetic encoding, and dynamic source separation in a shared latent space, reporting 15% DER and high separation accuracy on synthetic VoxCeleb2 mixtures.

citing papers explorer


Frequency-Aware Self-Supervised Music Representation Learning

cs.SD · 2026-06-24 · unverdicted · none · ref 32

PupuJEPA applies a visual JEPA framework to 2D spectrograms with music-specific adaptations and outperforms 1D SSL models on the MARBLE benchmark for multiple MIR tasks.

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

cs.SD · 2026-06-01 · unverdicted · none · ref 9

A single ViT encoder with JEPA pretraining and staged specialization performs speaker diarization, phonetic encoding, and dynamic source separation in a shared latent space, reporting 15% DER and high separation accuracy on synthetic VoxCeleb2 mixtures.
