pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

623 papers in eess.AS · page 12

  1. cs.CL 2019-07-09 reviewed
    Grounding models yield phonetic features for speech recognition

    Transfer Learning from Audio-Visual Grounding to Speech Recognition

    Wei-Ning Hsu +2

  2. cs.IR 2019-07-09 reviewed
    Attention improves accuracy for all 20 instruments on OpenMIC

    An Attention Mechanism for Musical Instrument Recognition

    Siddharth Gururani +2

  3. eess.AS 2019-07-09 reviewed
    Domain teachers train one student model to cut ASR errors by 10.4%

    Teach an all-rounder with experts in different domains

    Zhao You +2

  4. cs.CL 2019-07-09 reviewed
    Joint model cuts speaker diarization error to 2.2%

    Joint Speech Recognition and Speaker Diarization via Sequence Transduction

    Laurent El Shafey +2

  5. eess.AS 2019-07-08 reviewed
    Seq2seq ASR cuts WER 25% with speaker adaptation

    Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

    Felix Weninger +3

  6. eess.AS 2019-07-08 reviewed
    Cohort pruning enables private score normalisation in speaker recognition

    Privacy-Preserving Speaker Recognition with Cohort Score Normalisation

    Andreas Nautsch +7

  7. cs.CL 2019-07-07 reviewed
    Adversarial method cuts speech recognition errors 5-14 percent

    NIESR: Nuisance Invariant End-to-end Speech Recognition

    I-Hung Hsu +2

  8. cs.CV 2019-07-06 reviewed
    Shared-layer DNN beats early and late fusion on emotion CCC

    Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

    Juan D. S. Ortega +5

  9. eess.AS 2019-07-06 reviewed
    Autoencoder codebook lifts audio emotion prediction scores

    Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction

    Mohammed Senoussaoui +2

  10. cs.LG 2019-07-06 reviewed
    Activation maximization yields class-specific speech from DNNs

    Towards Debugging Deep Neural Networks by Generating Speech Utterances

    Bilal Soomro +3

  11. cs.CL 2019-07-06 reviewed
    17.55 hours of unlabelled Somali audio cut ASR error by 7.74%

    Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

    Astik Biswas +3

  12. cs.LG 2019-07-05 reviewed
    CNN learns delays to align speech with emotion labels

    Jointly Aligning and Predicting Continuous Emotion Annotations

    Soheil Khorram +2

  13. eess.AS 2019-07-05 reviewed
    WaveNet upsamples 8 kHz GSM speech near AMR-WB quality

    Speech bandwidth extension with WaveNet

    Archit Gupta +3

  14. eess.AS 2019-07-05 reviewed
    Deep learning adds controllable emotion to synthetic speech

    A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach

    Noé Tits

  15. eess.AS 2019-07-05 reviewed
    Compensation protocol fixes Unity timing issues for AV research

    Synchronizing Audio-Visual Film Stimuli in Unity (version 5.5.1f1): Game Engines as a Tool for Research

    Javier Sanz +4

  16. cs.SD 2019-07-05 reviewed
    Transformer spots chords via adaptive attention segments

    A Bi-directional Transformer for Musical Chord Recognition

    Jonggwon Park +4

  17. eess.AS 2019-07-05 reviewed
    ResNet detects replays at 1.08% EER using perturbed group delay grams

    The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

    Weicheng Cai +3

  18. eess.AS 2019-07-04 reviewed
    Phoneme timestamps stabilize prosody transfer from unseen speakers

    Fine-grained robust prosody transfer for single-speaker neural text-to-speech

    Viacheslav Klimkov +3

  19. cs.LG 2019-07-04 reviewed
    Neural net turns any-length audio into full-pose lecture videos

    LumièreNet: Lecture Video Synthesis from Audio

    Byung-Hak Kim +1

  20. cs.SD 2019-07-04 reviewed
    Frame attention in convRNN sets ESC accuracy records

    Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

    Zhichao Zhang +4

  21. eess.AS 2019-07-04 reviewed
    DKU pipeline reaches 4.96% EER on distant speaker task

    The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge

    Danwei Cai +3

  22. eess.AS 2019-07-04 reviewed
    Multi-extractor speaker system reaches 0.392 and 0.494 detection costs

    The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation

    Danwei Cai +2

  23. cs.SD 2019-07-03 reviewed
    CNNs on cochlear features improve speech enhancement for implants

    Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

    Nursadul Mamun +2

  24. eess.AS 2019-07-03 reviewed
    High frame rates reduce ASR word error rates by up to 24%

    End-to-End Speech Recognition with High-Frame-Rate Features Extraction

    Cong-Thanh Do

  25. cs.SD 2019-07-03 reviewed
    CNN layers mirror classical audio features in instrument recognition

    A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features

    Olga Slizovskaia +2

  26. cs.LG 2019-07-03 reviewed
    Tuned receptive fields let ResNet beat VGG on audio scenes

    The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

    Khaled Koutini +3

  27. cs.CL 2019-07-03 reviewed
    Conditional net reaches 94.69% on Mandarin polyphone task

    Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features

    Zexin Cai +4

  28. cs.SD 2019-07-03 reviewed
    CNNs outperform dense nets on noisy user labels for VoIP audio

    Supervised Classifiers for Audio Impairments with Noisy Labels

    Chandan K A Reddy +2

  29. eess.AS 2019-07-02 reviewed
    Hierarchical VAE-GAN generates 136-beat melodies with form

    MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation

    Xia Liang +2

  30. eess.AS 2019-07-02 reviewed
    Sub-band CNN cuts spoken term classification compute by up to 49%

    Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification

    Chieh-Chi Kao +4

  31. eess.AS 2019-07-02 reviewed
    Decoding trick trains attention models to detect speech features end-to-end

    Attention model for articulatory features detection

    Ievgen Karaulov +1

  32. cs.SD 2019-07-02 reviewed
    Image context lifts UAV voice command accuracy despite noisy pairings

    Kite: Automatic speech recognition for unmanned aerial vehicles

    Dan Oneata +1

  33. cs.SD 2019-07-02 reviewed
    Robot reveals full room geometry from random start using sound

    Can a Robot Hear the Shape and Dimensions of a Room?

    Linh Nguyen +2

  34. cs.SD 2019-07-02 reviewed
    Speech separation gains hold up under real ambient noise

    WHAM!: Extending Speech Separation to Noisy Environments

    Gordon Wichern +7

  35. cs.MM 2019-07-02 reviewed
    Cognitive models plus multi-agent rules raise game music immersion

    Adaptive Music Composition for Games

    Patrick Hutchings +1

  36. eess.AS 2019-07-01 reviewed
    Two-word recombination enables real-time LSTM LVCSR decoding

    LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

    Eugen Beck +3

  37. eess.AS 2019-07-01 reviewed
    Distillation plus quantization shrinks AED models to 2% size

    Compression of Acoustic Event Detection Models With Quantized Distillation

    Bowen Shi +5

  38. cs.CL 2019-07-01 reviewed
    UltraSuite releases ultrasound data from child speech therapy

    UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

    Aciel Eshky +6

  39. cs.LG 2019-07-01 reviewed
    Disentangling flows organize synthesizer latent space

    Universal audio synthesizer control with normalizing flows

    Philippe Esling +4

  40. eess.AS 2019-07-01 reviewed
    TTS data and neural denorming cut numeric ASR WER by up to 8x

    Improving Performance of End-to-End ASR on Numeric Sequences

    Cal Peyser +3

  41. eess.AS 2019-07-01 reviewed
    GAN vocoder beats classical methods on perceptual scores

    Analysis by Adversarial Synthesis -- A Novel Approach for Speech Vocoding

    Ahmed Mustafa +4

  42. eess.AS 2019-07-01 reviewed
    Mean frame lifts speaker-independent ultrasound classification

    Speaker-independent classification of phonetic segments from raw ultrasound in child speech

    Manuel Sam Ribeiro +3

  43. cs.LG 2019-07-01 reviewed
    Cosine similarity degrades subsidiary models more efficiently than cross-entropy

    Cosine similarity-based adversarial process

    Hee-Soo Heo +4

  44. cs.CL 2019-06-30 reviewed
    ResNet yields better multilingual bottleneck features for spoken term detection

    Multilingual Bottleneck Features for Query by Example Spoken Term Detection

    Dhananjay Ram +2

  45. cs.CL 2019-06-30 reviewed
    Bi-directional network raises joint intent-slot accuracy

    A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling

    Haihong E +3

  46. cs.CL 2019-06-28 reviewed
    Voice embeddings cut expression detection error by 60%

    Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

    Vikramjit Mitra +8

  47. eess.SP 2019-06-28 reviewed
    Reflection paths extend image sources to curved boundaries

    An Image Source Method Framework for Arbitrary Reflecting Boundaries

    Pierre Quinton +2

  48. eess.AS 2019-06-28 reviewed
    Multi-view lip videos yield better speech from silence

    Lipper: Synthesizing Thy Speech using Multi-View Lipreading

    Yaman Kumar +5

  49. eess.AS 2019-06-27 reviewed
    SVD-PHAT cuts multi-source localization error by up to 0.0395 radians

    Multiple Sound Source Localization with SVD-PHAT

    Francois Grondin +1

  50. cs.IR 2019-06-27 reviewed
    Artist, album, and track metadata trains music representations

    Representation Learning of Music Using Artist, Album, and Track Information

    Jongpil Lee +2