archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 11

  1. cs.SD 2019-07-25 reviewed
    RL agent cuts lung exam time fourfold

    Interactive Lungs Auscultation with Reinforcement Learning Agent

    Tomasz Grzywalski +4

  2. eess.AS 2019-07-24 reviewed
    Cross-attention between speakers improves conversational ASR

    Cross-Attention End-to-End ASR for Two-Party Conversations

    Suyoun Kim +2

  3. eess.SP 2019-07-24 reviewed
    Neural network post-filter cleans synthetic head motions

    A neural network based post-filter for speech-driven head motion synthesis

    JinHong Lu +1

  4. eess.AS 2019-07-24 reviewed
    Cyclic VAE creates optimization targets for non-parallel voice conversion

    Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

    Patrick Lumban Tobing +4

  5. cs.SD 2019-07-23 reviewed
    Color spectrogram encodes full sound wave for exact image recovery

    Log Complex Color for Visual Pattern Recognition of Total Sound

    Stephen Wedekind +1

  6. cs.SD 2019-07-23 reviewed
    DC embeddings fed to uPIT improve speaker-independent separation

    Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

    Cunhang Fan +4

  7. cs.HC 2019-07-23 reviewed
    Web interface lets any model drive music inpainting

    NONOTO: A Model-agnostic Web Interface for Interactive Music Composition by Inpainting

    Théis Bazin +1

  8. cs.LG 2019-07-23 reviewed
    Crossmodal training boosts monomodal emotion recognition

    EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings

    Jing Han +3

  9. eess.AS 2019-07-22 reviewed
    New net identifies speakers from 250 ms voice clips

    A Deep Neural Network for Short-Segment Speaker Recognition

    Amirhossein Hajavi +1

  10. cs.SD 2019-07-22 reviewed
    Three-step process reduces errors in crowdsourced audio captions

    Crowdsourcing a Dataset of Audio Captions

    Samuel Lipping +2

  11. cs.HC 2019-07-19 reviewed
    Vocal imitation search beats text for hard-to-describe sounds

    Sound Search by Text Description or Vocal Imitation?

    Yichi Zhang +2

  12. cs.SD 2019-07-19 reviewed
    Augmenting with audio effects raises instrument classification accuracy on processed one-s

    Data Augmentation for Instrument Classification Robust to Audio Effects

    António Ramires +1

  13. eess.AS 2019-07-19 reviewed
    Embeddings trained on human similarity scores improve open-speaker synthesis

    DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

    Yuki Saito +2

  14. eess.AS 2019-07-19 reviewed
    Density weighting equalizes anomaly scores for normal sounds

    Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

    Yuma Koizumi +4

  15. cs.SD 2019-07-18 reviewed
    Hybrid method best translates music genres across tag systems

    Leveraging Knowledge Bases And Parallel Annotations For Music Genre Translation

    Elena V. Epure +2

  16. eess.AS 2019-07-18 reviewed
    Bidirectional decoding cuts exposure bias in TTS

    Forward-Backward Decoding for Regularizing End-to-End TTS

    Yibin Zheng +6

  17. cs.SD 2019-07-17 reviewed
    Semi-supervised ensemble lifts sound event F-measure to 42%

    HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods

    Ziqiang Shi +4

  18. cs.HC 2019-07-16 reviewed
    Hybrid neural classifier improves help responses in personal assistants

    Conversational Help for Task Completion and Feature Discovery in Personal Assistants

    Madan Gopal Jhawar +5

  19. eess.AS 2019-07-16 reviewed
    The paper proposes combining total variability modeling with non-negative matrix…

    Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

    Kunal Dhawan +3

  20. eess.AS 2019-07-15 reviewed
    GAN data augmentation plus CNN fusion exceeds 85% accuracy

    Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

    Hangting Chen +4

  21. eess.AS 2019-07-15 reviewed
    Seq2seq voice conversion adapts via single-speaker autoencoder pretraining

    Hierarchical Sequence to Sequence Voice Conversion with Limited Data

    Praveen Narayanan +3

  22. eess.AS 2019-07-15 reviewed
    Target reduction trains E2E ASR reliably on limited code-switched data

    Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

    Kunal Dhawan +3

  23. cs.CL 2019-07-15 reviewed
    Attention E2E network improves LID accuracy on code-switched speech

    Joint Language Identification of Code-Switching Speech using Attention based E2E Network

    Sreeram Ganji +3

  24. cs.SD 2019-07-14 reviewed
    Bach Doodle harmonizes 55 million user melodies

    The Bach Doodle: Approachable music composition with machine learning at scale

    Cheng-Zhi Anna Huang +6

  25. q-bio.NC 2019-07-14 reviewed
    Autoencoders shorten sensory substitution training to hours

    Autoencoding sensory substitution

    Viktor Tóth +1

  26. cs.SD 2019-07-13 reviewed
    Four databases merged for voice pathology detection at F1 0.733

    Towards Robust Voice Pathology Detection

    Pavol Harar +5

  27. eess.AS 2019-07-13 reviewed
    Fusion of three attentive CNNs raises DCASE scene accuracy

    Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

    Hossein Zeinali +2

  28. eess.AS 2019-07-13 reviewed
    X-vector fusion reaches 1.0% EER in VOiCES 2019 challenge

    BUT VOiCES 2019 System Description

    Hossein Zeinali +8

  29. eess.AS 2019-07-13 reviewed
    Digit-specific i-vectors hit 1.52% EER on random strings

    Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

    Nooshin Maghsoodi +3

  30. eess.AS 2019-07-13 reviewed
    Language model teaches speech recognizer via soft labels

    Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

    Ye Bai +4

  31. cs.SD 2019-07-13 reviewed
    Complex autoencoder yields invariant magnitude space for audio

    Learning Complex Basis Functions for Invariant Representations of Audio

    Stefan Lattner +2

  32. eess.AS 2019-07-12 reviewed
    CNN-LSTM detects voice pathology from raw audio at 68% accuracy

    Voice Pathology Detection Using Deep Learning: a Preliminary Study

    Pavol Harar +5

  33. eess.AS 2019-07-12 reviewed
    Relative modeling fixes inconsistent speaker labels in dialogs

    Effective Incorporation of Speaker Information in Utterance Encoding in Dialog

    Tianyu Zhao +1

  34. cs.SD 2019-07-12 reviewed
    Toeplitz MRF clustering reduces diarization error up to 43%

    Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

    Harishchandra Dubey +2

  35. cs.LG 2019-07-12 reviewed
    R-Transformer outperforms SOTA on most sequence tasks without position embeddings

    R-Transformer: Recurrent Neural Network Enhanced Transformer

    Zhiwei Wang +3

  36. eess.AS 2019-07-11 reviewed
    RNNs outperform prior methods on lung-sound disease detection

    Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks

    Diego Perna +1

  37. eess.SP 2019-07-11 reviewed
    Two-stage grouping cuts filter bank resources by half

    Optimized Sharing of Coefficients in Parallel Filter Banks

    M. Tunç Arslan +2

  38. cs.MM 2019-07-11 reviewed
    GAN embeds secret audio inside carrier audio at high fidelity

    Heard More Than Heard: An Audio Steganography Method Based on GAN

    Dengpan Ye +2

  39. cs.SD 2019-07-11 reviewed
    Multichannel divergence loss trains DNNs for beamforming

    Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

    Yoshiki Masuyama +2

  40. cs.CV 2019-07-11 reviewed
    Network separates voice despite hidden lips

    My lips are concealed: Audio-visual speech enhancement through obstructions

    Triantafyllos Afouras +2

  41. eess.AS 2019-07-10 reviewed
    One DNN acoustic model handles both wideband and narrowband ASR

    Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition

    Khoi-Nguyen C. Mac +3

  42. cs.CL 2019-07-10 reviewed
    Anchored evolution improves speech recognition models

    Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

    Xiaodong Cui +1

  43. cs.SD 2019-07-10 reviewed
    Pre-training on MIDI improves NES music generation

    LakhNES: Improving multi-instrumental music generation with cross-domain pre-training

    Chris Donahue +4

  44. eess.AS 2019-07-10 reviewed
    ADPSGD trains ASR models with 3x larger batches

    A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

    Wei Zhang +8

  45. eess.AS 2019-07-10 reviewed
    Latent space model detects dysarthria more accurately and reconstructs fluent speech

    Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Daniel Korzekwa +4

  46. cs.LG 2019-07-10 reviewed
    Multi-layer attention improves keyword spotting accuracy

    Multi-layer Attention Mechanism for Speech Keyword Recognition

    Ruisen Luo +7

  47. cs.SD 2019-07-10 reviewed
    Musical conditioning improves RNN melody generation

    Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

    Benjamin Genchel +2

  48. cs.CL 2019-07-09 reviewed
    Multi-speaker ClariNet beats prior systems on naturalness

    Multi-Speaker End-to-End Speech Synthesis

    Jihyun Park +3

  49. cs.CL 2019-07-09 reviewed
    Model clones English voices into fluent Spanish and Mandarin

    Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

    Yu Zhang +8

  50. cs.CV 2019-07-09 reviewed
    One model translates between text

    M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

    Shuang Ma +2