archive

Every paper Pith has read.

623 papers in eess.AS · page 11

cs.SD 2019-07-25 reviewed

RL agent cuts lung exam time fourfold
Interactive Lungs Auscultation with Reinforcement Learning Agent

Tomasz Grzywalski +4
eess.AS 2019-07-24 reviewed

Cross-attention between speakers improves conversational ASR
Cross-Attention End-to-End ASR for Two-Party Conversations

Suyoun Kim +2
eess.SP 2019-07-24 reviewed

Neural network post-filter cleans synthetic head motions
A neural network based post-filter for speech-driven head motion synthesis

JinHong Lu +1
eess.AS 2019-07-24 reviewed

Cyclic VAE creates optimization targets for non-parallel voice conversion
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

Patrick Lumban Tobing +4
cs.SD 2019-07-23 reviewed

Color spectrogram encodes full sound wave for exact image recovery
Log Complex Color for Visual Pattern Recognition of Total Sound

Stephen Wedekind +1
cs.SD 2019-07-23 reviewed

DC embeddings fed to uPIT improve speaker-independent separation
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Cunhang Fan +4
cs.HC 2019-07-23 reviewed

Web interface lets any model drive music inpainting
NONOTO: A Model-agnostic Web Interface for Interactive Music Composition by Inpainting

Théis Bazin +1
cs.LG 2019-07-23 reviewed

Crossmodal training boosts monomodal emotion recognition
EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings

Jing Han +3
eess.AS 2019-07-22 reviewed

New net identifies speakers from 250 ms voice clips
A Deep Neural Network for Short-Segment Speaker Recognition

Amirhossein Hajavi +1
cs.SD 2019-07-22 reviewed

Three-step process reduces errors in crowdsourced audio captions
Crowdsourcing a Dataset of Audio Captions

Samuel Lipping +2
cs.HC 2019-07-19 reviewed

Vocal imitation search beats text for hard-to-describe sounds
Sound Search by Text Description or Vocal Imitation?

Yichi Zhang +2
cs.SD 2019-07-19 reviewed

Augmenting with audio effects raises instrument classification accuracy on processed one-s
Data Augmentation for Instrument Classification Robust to Audio Effects

António Ramires +1
eess.AS 2019-07-19 reviewed

Embeddings trained on human similarity scores improve open-speaker synthesis
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

Yuki Saito +2
eess.AS 2019-07-19 reviewed

Density weighting equalizes anomaly scores for normal sounds
Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

Yuma Koizumi +4
cs.SD 2019-07-18 reviewed

Hybrid method best translates music genres across tag systems
Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation

Elena V. Epure +2
eess.AS 2019-07-18 reviewed

Bidirectional decoding cuts exposure bias in TTS
Forward-Backward Decoding for Regularizing End-to-End TTS

Yibin Zheng +6
cs.SD 2019-07-17 reviewed

Semi-supervised ensemble lifts sound event F-measure to 42%
HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods

Ziqiang Shi +4
cs.HC 2019-07-16 reviewed

Hybrid neural classifier improves help responses in personal assistants
Conversational Help for Task Completion and Feature Discovery in Personal Assistants

Madan Gopal Jhawar +5
eess.AS 2019-07-16 reviewed

The paper proposes combining total variability modeling with non-negative matrix…
Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

Kunal Dhawan +3
eess.AS 2019-07-15 reviewed

GAN data augmentation plus CNN fusion exceeds 85% accuracy
Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

Hangting Chen +4
eess.AS 2019-07-15 reviewed

Seq2seq voice conversion adapts via single-speaker autoencoder pretraining
Hierarchical Sequence to Sequence Voice Conversion with Limited Data

Praveen Narayanan +3
eess.AS 2019-07-15 reviewed

Target reduction trains E2E ASR reliably on limited code-switched data
Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

Kunal Dhawan +3
cs.CL 2019-07-15 reviewed

Attention E2E network improves LID accuracy on code-switched speech
Joint Language Identification of Code-Switching Speech using Attention based E2E Network

Sreeram Ganji +3
cs.SD 2019-07-14 reviewed

Bach Doodle harmonizes 55 million user melodies
The Bach Doodle: Approachable music composition with machine learning at scale

Cheng-Zhi Anna Huang +6
q-bio.NC 2019-07-14 reviewed

Autoencoders shorten sensory substitution training to hours
Autoencoding sensory substitution

Viktor Tóth +1
cs.SD 2019-07-13 reviewed

Four databases merged for voice pathology detection at F1 0.733
Towards Robust Voice Pathology Detection

Pavol Harar +5
eess.AS 2019-07-13 reviewed

Fusion of three attentive CNNs raises DCASE scene accuracy
Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

Hossein Zeinali +2
eess.AS 2019-07-13 reviewed

X-vector fusion reaches 1.0% EER in VOiCES 2019 challenge
BUT VOiCES 2019 System Description

Hossein Zeinali +8
eess.AS 2019-07-13 reviewed

Digit-specific i-vectors hit 1.52% EER on random strings
Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

Nooshin Maghsoodi +3
eess.AS 2019-07-13 reviewed

Language model teaches speech recognizer via soft labels
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Ye Bai +4
cs.SD 2019-07-13 reviewed

Complex autoencoder yields invariant magnitude space for audio
Learning Complex Basis Functions for Invariant Representations of Audio

Stefan Lattner +2
eess.AS 2019-07-12 reviewed

CNN-LSTM detects voice pathology from raw audio at 68% accuracy
Voice Pathology Detection Using Deep Learning: a Preliminary Study

Pavol Harar +5
eess.AS 2019-07-12 reviewed

Relative modeling fixes inconsistent speaker labels in dialogs
Effective Incorporation of Speaker Information in Utterance Encoding in Dialog

Tianyu Zhao +1
cs.SD 2019-07-12 reviewed

Toeplitz MRF clustering reduces diarization error up to 43%
Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

Harishchandra Dubey +2
cs.LG 2019-07-12 reviewed

R-Transformer outperforms SOTA on most sequence tasks without position embeddings
R-Transformer: Recurrent Neural Network Enhanced Transformer

Zhiwei Wang +3
eess.AS 2019-07-11 reviewed

RNNs outperform prior methods on lung-sound disease detection
Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks

Diego Perna +1
eess.SP 2019-07-11 reviewed

Two-stage grouping cuts filter bank resources by half
Optimized Sharing of Coefficients in Parallel Filter Banks

M. Tunç Arslan +2
cs.MM 2019-07-11 reviewed

GAN embeds secret audio inside carrier audio at high fidelity
Heard More Than Heard: An Audio Steganography Method Based on GAN

Dengpan Ye +2
cs.SD 2019-07-11 reviewed

Multichannel divergence loss trains DNNs for beamforming
Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Yoshiki Masuyama +2
cs.CV 2019-07-11 reviewed

Network separates voice despite hidden lips
My lips are concealed: Audio-visual speech enhancement through obstructions

Triantafyllos Afouras +2
eess.AS 2019-07-10 reviewed

One DNN acoustic model handles both wideband and narrowband ASR
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition

Khoi-Nguyen C. Mac +3
cs.CL 2019-07-10 reviewed

Anchored evolution improves speech recognition models
Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

Xiaodong Cui +1
cs.SD 2019-07-10 reviewed

Pre-training on MIDI improves NES music generation
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training

Chris Donahue +4
eess.AS 2019-07-10 reviewed

ADPSGD trains ASR models with 3x larger batches
A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

Wei Zhang +8
eess.AS 2019-07-10 reviewed

Latent space model detects dysarthria more accurately and reconstructs fluent speech
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Daniel Korzekwa +4
cs.LG 2019-07-10 reviewed

Multi-layer attention improves keyword spotting accuracy
Multi-layer Attention Mechanism for Speech Keyword Recognition

Ruisen Luo +7
cs.SD 2019-07-10 reviewed

Musical conditioning improves RNN melody generation
Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

Benjamin Genchel +2
cs.CL 2019-07-09 reviewed

Multi-speaker ClariNet beats prior systems on naturalness
Multi-Speaker End-to-End Speech Synthesis

Jihyun Park +3
cs.CL 2019-07-09 reviewed

Model clones English voices into fluent Spanish and Mandarin
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Yu Zhang +8
cs.CV 2019-07-09 reviewed

One model translates between text
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

Shuang Ma +2