Probing Spatial Structure in Pretrained Audio Representations

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Pretrained spatial audio encoders are increasingly used as general-purpose representations for perceptual tasks, yet their spatial encoding capabilities remain poorly understood. We introduce the Spatial Audio Representation Learning (SARL) benchmark, a controlled framework for evaluating spatial information in pretrained audio models. SARL probes source-level factors (azimuth, elevation, distance, class) and room-level factors (RT60, volume, shape). Experiments across diverse encoders reveal three patterns: input configuration and training paradigm shape spatial encoding; source factors are consistently easier to decode than room factors; and sensitivity analysis under controlled perturbations shows heterogeneous responses to source and room variation. These results reveal systematic biases in current pretrained audio representations. SARL is released as an open-source benchmark for reproducible evaluation of spatial audio representations.

fields

cs.SD 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Probing Spatial Structure in Pretrained Audio Representations

cs.SD · 2026-06-04 · unverdicted · novelty 7.0

Introduces SARL benchmark showing pretrained audio encoders encode source-level spatial factors more readily than room-level factors, with patterns shaped by input configuration and training paradigm.

citing papers explorer

Showing 1 of 1 citing paper.

  • Probing Spatial Structure in Pretrained Audio Representations cs.SD · 2026-06-04 · unverdicted · none · ref 2 · internal anchor
