archive

Every paper Pith has read.

623 papers in eess.AS · page 13

cs.CL 2019-06-27 reviewed

Gated embeddings cut error in conversational speech recognition
Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

Suyoun Kim +2
cs.CL 2019-06-27 reviewed

Lattices enable acoustic model adaptation at over 50% error rate
Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

Ondrej Klejch +3
eess.AS 2019-06-27 reviewed

Re-annotation produces 1369 public cough events from AMI corpus
Re-annotation of cough events in the AMI corpus

Paul Leamy +3
eess.AS 2019-06-26 reviewed

Nonverbal speech features predict group performance on their own
Analyzing Verbal and Nonverbal Features for Predicting Group Performance

Uliyana Kubasova +2
cs.SD 2019-06-26 reviewed

One embedding space aligns monophonic vocals with full mixes
Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

Kyungyun Lee +1
cs.IR 2019-06-26 reviewed

Soft attention makes audio-to-sheet retrieval tempo-invariant
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

Stefan Balke +4
eess.AS 2019-06-26 reviewed

Russian corpus gives 31 hours of single-speaker speech for TTS
RUSLAN: Russian Spoken Language Corpus for Speech Synthesis

Lenar Gabdrakhmanov +2
cs.CL 2019-06-26 reviewed

Auxiliary loss cuts target-speaker error by 6.6 percent
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

Naoyuki Kanda +5
eess.AS 2019-06-26 reviewed

Style tokens map to emotions with only 5% of the data labeled
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training

Peng-fei Wu +5
cs.CL 2019-06-26 reviewed

Essence filtering lets single speech model beat its teacher ensemble
Essence Knowledge Distillation for Speech Recognition

Zhenchuan Yang +4
cs.LG 2019-06-25 reviewed

Fusion of audio and video features reaches 0.75 CCC for arousal
Emotion Recognition Using Fusion of Audio and Video Features

Juan D. S. Ortega +2
eess.AS 2019-06-25 reviewed

Teacher-student loop aligns 5358 tracks of audio with lyrics
DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm

Gabriel Meseguer-Brocal +2
cs.CV 2019-06-25 reviewed

Word CTC on 3D-2D-CNN-BLSTM hits 1.3% lipreading WER
LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

Dilip Kumar Margam +6
cs.SD 2019-06-25 reviewed

3D CNN ensemble beats baseline in AVA speaker detection
Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

Joon Son Chung
eess.AS 2019-06-25 reviewed

Adapted solo-singing models cut polyphonic lyrics alignment errors
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

Chitralekha Gupta +2
cs.CL 2019-06-24 reviewed

Contrastive loss transfers text knowledge to audio emotion models
Multimodal and Multi-view Models for Emotion Recognition

Gustavo Aguilar +3
cs.SD 2019-06-24 reviewed

Audio-visual enrollment improves speaker diarisation in meetings
Who said that?: Audio-visual speaker diarisation of real-world meetings

Joon Son Chung +2
cs.SD 2019-06-24 reviewed

Speaker embeddings raise single-channel separation to 4.79 dB SDR
Single-Channel Speech Separation with Auxiliary Speaker Embeddings

Shuo Liu +2
eess.AS 2019-06-22 reviewed

Balancing and MTL lift end-to-end ASR for Hindi-English code-switching
End-to-End ASR for Code-switched Hindi-English Speech

Brij Mohan Lal Srivastava +4
cs.SD 2019-06-22 reviewed

Multi-task network lifts KWS accuracy 32% for hearing aids
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Iván López-Espejo +2
cs.SD 2019-06-21 reviewed

Scattering coefficients re-synthesize audio textures and enable new effects
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering

Vincent Lostanlen +1
cs.CL 2019-06-21 reviewed

Phoneme biasing lifts foreign name accuracy 16% over grapheme baselines
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models

Ke Hu +4
cs.SD 2019-06-21 reviewed

VAE predicts future music values to compose new pieces
Classical Music Prediction and Composition by means of Variational Autoencoders

Daniel Rivero +2
cs.SD 2019-06-21 reviewed

ADSR HMM fusion yields SOTA piano transcription on MAPS
Deep Polyphonic ADSR Piano Note Transcription

Rainer Kelz +2
cs.SD 2019-06-21 reviewed

Querying style-trained VAE with different music yields structured blends
Query-based Deep Improvisation

Shlomo Dubnov
eess.AS 2019-06-21 reviewed

TensorFlow model matches Kaldi accuracy in WFST decoder
Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

Minkyu Lim +1
cs.SD 2019-06-21 reviewed

Autoregressive models improve singing voice F0 prediction over RNNs
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

Yuan-Hao Yi +3
eess.AS 2019-06-21 reviewed

Echoes enable 2D localization from two microphones
Mirage: 2D Source Localization Using Microphone Pair Augmentation with Echoes

Diego Di Carlo (PANAMA) +2
cs.SD 2019-06-21 reviewed

Melody features classify Hindustani, Carnatic and Turkish music
Understanding and Classifying Cultural Music Using Melodic Features Case Of Hindustani, Carnatic And Turkish Music

Amruta Vidwans +2
eess.AS 2019-06-20 reviewed

Subspace rotation normalizes narrowband stats for wideband DOA
A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound Sources

Kainan Chen +2
cs.LG 2019-06-20 reviewed

Updating UBM during i-vector training yields 1-2% gains
Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

Ville Vestman +3
q-bio.NC 2019-06-20 reviewed

Multi-brain fMRI embeddings outperform raw data in genre and topic tasks
Low-dimensional Embodied Semantics for Music and Language

Francisco Afonso Raposo +2
cs.SD 2019-06-20 reviewed

Adversarial training lowers error in music transcription
Adversarial Learning for Improved Onsets and Frames Music Transcription

Jong Wook Kim +1
cs.SD 2019-06-20 reviewed

Joint training boosts keyword spotting in noise
A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

Yue Gu +3
eess.AS 2019-06-20 reviewed

Deep learning enhances MELP codec parameters directly in noise
Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment

Min-Jae Hwang +1