arxiv: 1811.00403 · v2 · pith:VRLBLHJCnew · submitted 2018-11-01 · 💻 cs.CL · cs.LG · cs.SD · eess.AS

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

classification: cs.CL, cs.LG, cs.SD, eess.AS
keywords: word, unsupervised, encdec-cae, encoder-decoder, speech, acoustic, autoencoder, discovery

We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation. In settings where unlabelled speech is the only available resource, such acoustic word embeddings can form the basis for "zero-resource" speech search, discovery and indexing systems. Most existing unsupervised embedding methods still use some supervision, such as word or phoneme boundaries. Here we propose the encoder-decoder correspondence autoencoder (EncDec-CAE), which, instead of true word segments, uses automatically discovered segments: an unsupervised term discovery system finds pairs of words of the same unknown type, and the EncDec-CAE is trained to reconstruct one word given the other as input. We compare it to a standard encoder-decoder autoencoder (AE), a variational AE with a prior over its latent embedding, and downsampling. EncDec-CAE outperforms its closest competitor by 24% relative in average precision on two languages in a word discrimination task.
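
As a rough illustration of two ideas named in the abstract, the sketch below shows a downsampling baseline (mapping a variable-duration segment to a fixed-dimensional vector) and an average precision score for a word discrimination task. The function names, the choice of `k` equally spaced frames, the use of cosine distance, and the toy data are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
import math
from itertools import combinations


def downsample(frames, k=10):
    """Map a variable-length list of feature frames to a fixed-size vector.

    Assumed scheme: pick k roughly equally spaced frames and concatenate
    them. Frames repeat when the segment has fewer than k frames.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment has no frames")
    if k == 1:
        idx = [n // 2]
    else:
        idx = [round(i * (n - 1) / (k - 1)) for i in range(k)]
    return [x for i in idx for x in frames[i]]


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / norm


def average_precision(distances, same):
    """Average precision when ranking pairs by ascending distance.

    `same[i]` is True when pair i consists of two tokens of the same word.
    """
    ranked = sorted(zip(distances, same), key=lambda p: p[0])
    hits, total = 0, 0.0
    for rank, (_, is_same) in enumerate(ranked, start=1):
        if is_same:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def word_discrimination_ap(embeddings, labels):
    """Score all embedding pairs: same-word pairs should be closest."""
    dists, same = [], []
    for i, j in combinations(range(len(labels)), 2):
        dists.append(cosine_distance(embeddings[i], embeddings[j]))
        same.append(labels[i] == labels[j])
    return average_precision(dists, same)


# Toy example: three segments of different durations, two word types.
segments = [
    [[1.0, 0.0]] * 5,   # hypothetical token of word "x"
    [[1.0, 0.0]] * 8,   # another token of word "x", longer duration
    [[0.0, 1.0]] * 6,   # token of word "y"
]
labels = ["x", "x", "y"]
embeddings = [downsample(seg, k=3) for seg in segments]
ap = word_discrimination_ap(embeddings, labels)
```

Under this setup, any learned embedding function (such as an encoder trained on discovered word pairs) could be swapped in for `downsample` and scored with the same `word_discrimination_ap` call, which is what makes a fixed-dimensional representation convenient for comparison.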
