archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 13

  1. cs.CL 2019-06-27 reviewed
    Gated embeddings cut error in conversational speech recognition

    Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

    Suyoun Kim +2

  2. cs.CL 2019-06-27 reviewed
    Lattices enable acoustic model adaptation at over 50% error rate

    Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

    Ondrej Klejch +3

  3. eess.AS 2019-06-27 reviewed
    Re-annotation produces 1369 public cough events from AMI corpus

    Re-annotation of cough events in the AMI corpus

    Paul Leamy +3

  4. eess.AS 2019-06-26 reviewed
    Nonverbal speech features predict group performance on their own

    Analyzing Verbal and Nonverbal Features for Predicting Group Performance

    Uliyana Kubasova +2

  5. cs.SD 2019-06-26 reviewed
    One embedding space aligns monophonic vocals with full mixes

    Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

    Kyungyun Lee +1

  6. cs.IR 2019-06-26 reviewed
    Soft attention makes audio-to-sheet retrieval tempo-invariant

    Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

    Stefan Balke +4

  7. eess.AS 2019-06-26 reviewed
    Russian corpus gives 31 hours of one-speaker speech for TTS

    RUSLAN: Russian Spoken Language Corpus for Speech Synthesis

    Lenar Gabdrakhmanov +2

  8. cs.CL 2019-06-26 reviewed
    Auxiliary loss cuts target-speaker error by 6.6 percent

    Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

    Naoyuki Kanda +5

  9. eess.AS 2019-06-26 reviewed
    Style tokens map to emotions with 5% labels

    End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training

    Peng-fei Wu +5

  10. cs.CL 2019-06-26 reviewed
    Essence filtering lets single speech model beat its teacher ensemble

    Essence Knowledge Distillation for Speech Recognition

    Zhenchuan Yang +4

  11. cs.LG 2019-06-25 reviewed
    Fusion of audio and video features reaches 0.75 CCC for arousal

    Emotion Recognition Using Fusion of Audio and Video Features

    Juan D. S. Ortega +2

  12. eess.AS 2019-06-25 reviewed
    Teacher-student loop aligns 5358 tracks of audio with lyrics

    DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm

    Gabriel Meseguer-Brocal +2

  13. cs.CV 2019-06-25 reviewed
    Word CTC on 3D-2D-CNN-BLSTM hits 1.3% lipreading WER

    LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

    Dilip Kumar Margam +6

  14. cs.SD 2019-06-25 reviewed
    3D CNN ensemble beats baseline in AVA speaker detection

    Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

    Joon Son Chung

  15. eess.AS 2019-06-25 reviewed
    Adapted solo-singing models cut polyphonic lyrics alignment errors

    Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

    Chitralekha Gupta +2

  16. cs.CL 2019-06-24 reviewed
    Contrastive loss transfers text knowledge to audio emotion models

    Multimodal and Multi-view Models for Emotion Recognition

    Gustavo Aguilar +3

  17. cs.SD 2019-06-24 reviewed
    Audio-visual enrollment improves speaker diarisation in meetings

    Who said that?: Audio-visual speaker diarisation of real-world meetings

    Joon Son Chung +2

  18. cs.SD 2019-06-24 reviewed
    Speaker embeddings raise single-channel separation to 4.79 dB SDR

    Single-Channel Speech Separation with Auxiliary Speaker Embeddings

    Shuo Liu +2

  19. eess.AS 2019-06-22 reviewed
    Balancing and MTL lift end-to-end ASR for Hindi-English code-switching

    End-to-End ASR for Code-switched Hindi-English Speech

    Brij Mohan Lal Srivastava +4

  20. cs.SD 2019-06-22 reviewed
    Multi-task network lifts KWS accuracy 32% for hearing aids

    Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

    Iván López-Espejo +2

  21. cs.SD 2019-06-21 reviewed
    Scattering coefficients re-synthesize audio textures and enable new effects

    The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering

    Vincent Lostanlen +1

  22. cs.CL 2019-06-21 reviewed
    Phoneme biasing lifts foreign name accuracy 16% over grapheme baselines

    Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models

    Ke Hu +4

  23. cs.SD 2019-06-21 reviewed
    VAE predicts future music values to compose new pieces

    Classical Music Prediction and Composition by means of Variational Autoencoders

    Daniel Rivero +2

  24. cs.SD 2019-06-21 reviewed
    ADSR HMM fusion yields SOTA piano transcription on MAPS

    Deep Polyphonic ADSR Piano Note Transcription

    Rainer Kelz +2

  25. cs.SD 2019-06-21 reviewed
  26. eess.AS 2019-06-21 reviewed
    TensorFlow model matches Kaldi accuracy in WFST decoder

    Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

    Minkyu Lim +1

  27. cs.SD 2019-06-21 reviewed
    Autoregressive models improve singing voice F0 prediction over RNNs

    Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

    Yuan-Hao Yi +3

  28. eess.AS 2019-06-21 reviewed
    Echoes enable 2D localization from two microphones

    Mirage: 2D Source Localization Using Microphone Pair Augmentation with Echoes

    Diego Di Carlo (PANAMA) +2

  29. cs.SD 2019-06-21 reviewed
    Melody features classify Hindustani, Carnatic and Turkish music

    Understanding and Classifying Cultural Music Using Melodic Features Case Of Hindustani, Carnatic And Turkish Music

    Amruta Vidwans +2

  30. eess.AS 2019-06-20 reviewed
    Subspace rotation normalizes narrowband stats for wideband DOA

    A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound Sources

    Kainan Chen +2

  31. cs.LG 2019-06-20 reviewed
    Updating UBM during i-vector training yields 1-2% gains

    Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

    Ville Vestman +3

  32. q-bio.NC 2019-06-20 reviewed
    Multi-brain fMRI embeddings outperform raw data in genre and topic tasks

    Low-dimensional Embodied Semantics for Music and Language

    Francisco Afonso Raposo +2

  33. cs.SD 2019-06-20 reviewed
    Adversarial training lowers error in music transcription

    Adversarial Learning for Improved Onsets and Frames Music Transcription

    Jong Wook Kim +1

  34. cs.SD 2019-06-20 reviewed
    Joint training boosts keyword spotting in noise

    A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

    Yue Gu +3

  35. eess.AS 2019-06-20 reviewed
    DL enhances MELP codec parameters directly in noise

    Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment

    Min-Jae Hwang +1