archive
Every paper Pith has read.
623 papers in eess.AS · page 11
-
RL agent cuts lung exam time fourfold
Interactive Lungs Auscultation with Reinforcement Learning Agent
-
Cross-attention between speakers improves conversational ASR
Cross-Attention End-to-End ASR for Two-Party Conversations
-
Neural network post-filter cleans synthetic head motions
A neural network based post-filter for speech-driven head motion synthesis
-
Cyclic VAE creates optimization targets for non-parallel voice conversion
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
-
Color spectrogram encodes full sound wave for exact image recovery
Log Complex Color for Visual Pattern Recognition of Total Sound
-
DC embeddings fed to uPIT improve speaker-independent separation
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
-
Web interface lets any model drive music inpainting
NONOTO: A Model-agnostic Web Interface for Interactive Music Composition by Inpainting
-
Crossmodal training boosts monomodal emotion recognition
EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings
-
New net identifies speakers from 250 ms voice clips
A Deep Neural Network for Short-Segment Speaker Recognition
-
Three-step process reduces errors in crowdsourced audio captions
Crowdsourcing a Dataset of Audio Captions
-
Vocal imitation search beats text for hard-to-describe sounds
Sound Search by Text Description or Vocal Imitation?
-
Augmenting with audio effects raises instrument classification accuracy on processed one-s
Data Augmentation for Instrument Classification Robust to Audio Effects
-
Embeddings trained on human similarity scores improve open-speaker synthesis
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis
-
Density weighting equalizes anomaly scores for normal sounds
Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds
-
Hybrid method best translates music genres across tag systems
Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation
-
Bidirectional decoding cuts exposure bias in TTS
Forward-Backward Decoding for Regularizing End-to-End TTS
-
Semi-supervised ensemble lifts sound event F-measure to 42%
HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods
-
Hybrid neural classifier improves help responses in personal assistants
Conversational Help for Task Completion and Feature Discovery in Personal Assistants
-
The paper proposes combining total variability modeling with non-negative matrix…
Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features
-
GAN data augmentation plus CNN fusion exceeds 85% accuracy
Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling
-
Seq2seq voice conversion adapts via single-speaker autoencoder pretraining
Hierarchical Sequence to Sequence Voice Conversion with Limited Data
-
Target reduction trains E2E ASR reliably on limited code-switched data
Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data
-
Attention E2E network improves LID accuracy on code-switched speech
Joint Language Identification of Code-Switching Speech using Attention based E2E Network
-
Bach Doodle harmonizes 55 million user melodies
The Bach Doodle: Approachable music composition with machine learning at scale
-
Autoencoders shorten sensory substitution training to hours
Autoencoding sensory substitution
-
Four databases merged for voice pathology detection at F1 0.733
Towards Robust Voice Pathology Detection
-
Fusion of three attentive CNNs raises DCASE scene accuracy
Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge
-
X-vector fusion reaches 1.0% EER in VOiCES 2019 challenge
BUT VOiCES 2019 System Description
-
Digit-specific i-vectors hit 1.52% EER on random strings
Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors
-
Language model teaches speech recognizer via soft labels
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition
-
Complex autoencoder yields invariant magnitude space for audio
Learning Complex Basis Functions for Invariant Representations of Audio
-
CNN-LSTM detects voice pathology from raw audio at 68% accuracy
Voice Pathology Detection Using Deep Learning: a Preliminary Study
-
Relative modeling fixes inconsistent speaker labels in dialogs
Effective Incorporation of Speaker Information in Utterance Encoding in Dialog
-
Toeplitz MRF clustering reduces diarization error up to 43%
Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams
-
R-Transformer outperforms SOTA on most sequence tasks without position embeddings
R-Transformer: Recurrent Neural Network Enhanced Transformer
-
RNNs outperform prior methods on lung-sound disease detection
Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks
-
Two-stage grouping cuts filter bank resources by half
Optimized Sharing of Coefficients in Parallel Filter Banks
-
GAN embeds secret audio inside carrier audio at high fidelity
Heard More Than Heard: An Audio Steganography Method Based on GAN
-
Multichannel divergence loss trains DNNs for beamforming
Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
-
Network separates voice despite hidden lips
My lips are concealed: Audio-visual speech enhancement through obstructions
-
One DNN acoustic model handles both wideband and narrowband ASR
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
-
Anchored evolution improves speech recognition models
Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
-
Pre-training on MIDI improves NES music generation
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
-
ADPSGD trains ASR models with 3x larger batches
A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
-
Latent space model detects dysarthria more accurately and reconstructs fluent speech
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
-
Multi-layer attention improves keyword spotting accuracy
Multi-layer Attention Mechanism for Speech Keyword Recognition
-
Musical conditioning improves RNN melody generation
Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs
-
Multi-speaker ClariNet beats prior systems on naturalness
Multi-Speaker End-to-End Speech Synthesis
-
Model clones English voices into fluent Spanish and Mandarin
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
-
One model translates between text
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention