arXiv preprint arXiv:2105.01051

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1

citation-polarity summary

years: 2026 (11), 2025 (3)

roles: background (1)

polarities: background (1)

representative citing papers

AudioMosaic: Contrastive Masked Audio Representation Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.

Alethia: A Foundational Encoder for Voice Deepfakes

cs.SD · 2026-04-30 · unverdicted · novelty 6.0

Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.

StressTest: Can YOUR Speech LM Handle the Stress?

cs.CL · 2025-05-28 · conditional · novelty 6.0

Speech language models fail at reasoning about sentence stress but improve after fine-tuning on a new 17k-example synthetic dataset that varies stress to alter meaning.

SIGMA: Saliency-Guided Sparse Mask Attacks for Speech Emotion Recognition

cs.SD · 2026-06-29 · unverdicted · novelty 5.0

SIGMA uses post-hoc XAI saliency maps to define reusable sparse masks for magnitude-bounded perturbations of self-supervised speech features. It is evaluated on IEMOCAP and TESS, where it shows competitive attack success with trade-offs in explanation consistency.

AVEX: What Matters for Animal Vocalization Encoding

cs.SD · 2025-08-15 · unverdicted · novelty 5.0

A large empirical study finds that self-supervised pre-training followed by supervised post-training on mixed bioacoustic and general audio data produces the strongest encoders across 26 datasets for species classification, detection, individual ID, and repertoire discovery.

MOSS-Audio Technical Report

cs.SD · 2026-06-01 · unverdicted · novelty 4.0

MOSS-Audio is an audio-language model using a 12.5 Hz encoder, DeepStack cross-layer injection, time markers, and an event-preserving annotation pipeline for unified audio understanding.
