archive

Every paper Pith has read.

623 papers in eess.AS · page 12

cs.CL 2019-07-09 reviewed

Grounding models yield phonetic features for speech recognition
Transfer Learning from Audio-Visual Grounding to Speech Recognition

Wei-Ning Hsu +2
cs.IR 2019-07-09 reviewed

Attention improves accuracy for all 20 instruments on OpenMIC
An Attention Mechanism for Musical Instrument Recognition

Siddharth Gururani +2
eess.AS 2019-07-09 reviewed

Domain teachers train one student model to cut ASR errors by 10.4%
Teach an all-rounder with experts in different domains

Zhao You +2
cs.CL 2019-07-09 reviewed

Joint model cuts speaker diarization error to 2.2%
Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Laurent El Shafey +2
eess.AS 2019-07-08 reviewed

Seq2seq ASR cuts WER 25% with speaker adaptation
Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Felix Weninger +3
eess.AS 2019-07-08 reviewed

Cohort pruning enables private score normalisation in speaker recognition
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation

Andreas Nautsch +7
cs.CL 2019-07-07 reviewed

Adversarial method cuts speech recognition errors 5-14 percent
NIESR: Nuisance Invariant End-to-end Speech Recognition

I-Hung Hsu +2
cs.CV 2019-07-06 reviewed

Shared-layer DNN beats early and late fusion on emotion CCC
Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

Juan D. S. Ortega +5
eess.AS 2019-07-06 reviewed

Autoencoder codebook lifts audio emotion prediction scores
Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction

Mohammed Senoussaoui +2
cs.LG 2019-07-06 reviewed

Activation maximization yields class-specific speech from DNNs
Towards Debugging Deep Neural Networks by Generating Speech Utterances

Bilal Soomro +3
cs.CL 2019-07-06 reviewed

17.55 hours of unlabelled Somali audio cut ASR error by 7.74%
Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

Astik Biswas +3
cs.LG 2019-07-05 reviewed

CNN learns delays to align speech with emotion labels
Jointly Aligning and Predicting Continuous Emotion Annotations

Soheil Khorram +2
eess.AS 2019-07-05 reviewed

WaveNet upsamples 8 kHz GSM speech near AMR-WB quality
Speech bandwidth extension with WaveNet

Archit Gupta +3
eess.AS 2019-07-05 reviewed

Deep learning adds controllable emotion to synthetic speech
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach

Noé Tits
eess.AS 2019-07-05 reviewed

Compensation protocol fixes Unity timing issues for AV research
Synchronizing Audio-Visual Film Stimuli in Unity (version 5.5.1f1): Game Engines as a Tool for Research

Javier Sanz +4
cs.SD 2019-07-05 reviewed

Transformer spots chords via adaptive attention segments
A Bi-directional Transformer for Musical Chord Recognition

Jonggwon Park +4
eess.AS 2019-07-05 reviewed

ResNet detects replays at 1.08% EER using perturbed group delay grams
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

Weicheng Cai +3
eess.AS 2019-07-04 reviewed

Phoneme timestamps stabilize prosody transfer from unseen speakers
Fine-grained robust prosody transfer for single-speaker neural text-to-speech

Viacheslav Klimkov +3
cs.LG 2019-07-04 reviewed

Neural net turns any-length audio into full-pose lecture videos
LumièreNet: Lecture Video Synthesis from Audio

Byung-Hak Kim +1
cs.SD 2019-07-04 reviewed

Frame attention in convRNN sets ESC accuracy records
Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Zhichao Zhang +4
eess.AS 2019-07-04 reviewed

DKU pipeline reaches 4.96% EER on distant speaker task
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge

Danwei Cai +3
eess.AS 2019-07-04 reviewed

Multi-extractor speaker system reaches 0.392 and 0.494 detection costs
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation

Danwei Cai +2
cs.SD 2019-07-03 reviewed

CNNs on cochlear features improve speech enhancement for implants
Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

Nursadul Mamun +2
eess.AS 2019-07-03 reviewed

High frame rates reduce ASR word error rates by up to 24%
End-to-End Speech Recognition with High-Frame-Rate Features Extraction

Cong-Thanh Do
cs.SD 2019-07-03 reviewed

CNN layers mirror classical audio features in instrument recognition
A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features

Olga Slizovskaia +2
cs.LG 2019-07-03 reviewed

Tuned receptive fields let ResNet beat VGG on audio scenes
The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

Khaled Koutini +3
cs.CL 2019-07-03 reviewed

Conditional net reaches 94.69% on Mandarin polyphone task
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features

Zexin Cai +4
cs.SD 2019-07-03 reviewed

CNNs outperform dense nets on noisy user labels for VoIP audio
Supervised Classifiers for Audio Impairments with Noisy Labels

Chandan K A Reddy +2
eess.AS 2019-07-02 reviewed

Hierarchical VAE-GAN generates 136-beat melodies with form
MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation

Xia Liang +2
eess.AS 2019-07-02 reviewed

Sub-band CNN cuts spoken term classification compute by up to 49%
Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification

Chieh-Chi Kao +4
eess.AS 2019-07-02 reviewed

Decoding trick trains attention models to detect speech features end-to-end
Attention model for articulatory features detection

Ievgen Karaulov +1
cs.SD 2019-07-02 reviewed

Image context lifts UAV voice command accuracy despite noisy pairings
Kite: Automatic speech recognition for unmanned aerial vehicles

Dan Oneata +1
cs.SD 2019-07-02 reviewed

Robot reveals full room geometry from random start using sound
Can a Robot Hear the Shape and Dimensions of a Room?

Linh Nguyen +2
cs.SD 2019-07-02 reviewed

Speech separation gains hold up under real ambient noise
WHAM!: Extending Speech Separation to Noisy Environments

Gordon Wichern +7
cs.MM 2019-07-02 reviewed

Cognitive models plus multi-agent rules raise game music immersion
Adaptive Music Composition for Games

Patrick Hutchings +1
eess.AS 2019-07-01 reviewed

Two-word recombination enables real-time LSTM LVCSR decoding
LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

Eugen Beck +3
eess.AS 2019-07-01 reviewed

Distillation plus quantization shrinks AED models to 2% size
Compression of Acoustic Event Detection Models With Quantized Distillation

Bowen Shi +5
cs.CL 2019-07-01 reviewed

UltraSuite releases ultrasound data from child speech therapy
UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Aciel Eshky +6
cs.LG 2019-07-01 reviewed

Disentangling flows organize synthesizer latent space
Universal audio synthesizer control with normalizing flows

Philippe Esling +4
eess.AS 2019-07-01 reviewed

TTS data and neural denorming cut numeric ASR WER by up to 8x
Improving Performance of End-to-End ASR on Numeric Sequences

Cal Peyser +3
eess.AS 2019-07-01 reviewed

GAN vocoder beats classical methods on perceptual scores
Analysis by Adversarial Synthesis -- A Novel Approach for Speech Vocoding

Ahmed Mustafa +4
eess.AS 2019-07-01 reviewed

Mean frame lifts speaker-independent ultrasound classification
Speaker-independent classification of phonetic segments from raw ultrasound in child speech

Manuel Sam Ribeiro +3
cs.LG 2019-07-01 reviewed

Cosine similarity degrades subsidiary models more efficiently than cross-entropy
Cosine similarity-based adversarial process

Hee-Soo Heo +4
cs.CL 2019-06-30 reviewed

ResNet yields better multilingual bottleneck features for spoken term detection
Multilingual Bottleneck Features for Query by Example Spoken Term Detection

Dhananjay Ram +2
cs.CL 2019-06-30 reviewed

Bi-directional network raises joint intent-slot accuracy
A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling

Haihong E +3
cs.CL 2019-06-28 reviewed

Voice embeddings cut expression detection error by 60%
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Vikramjit Mitra +8
eess.SP 2019-06-28 reviewed

Reflection paths extend image sources to curved boundaries
An Image Source Method Framework for Arbitrary Reflecting Boundaries

Pierre Quinton +2
eess.AS 2019-06-28 reviewed

Multi-view lip videos yield better speech from silence
Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Yaman Kumar +5
eess.AS 2019-06-27 reviewed

SVD-PHAT cuts multi-source localization error by up to 0.0395 radians
Multiple Sound Source Localization with SVD-PHAT

Francois Grondin +1
cs.IR 2019-06-27 reviewed

Artist, album, and track metadata trains music representations
Representation Learning of Music Using Artist, Album, and Track Information

Jongpil Lee +2