Extrapolation in NLP
Classification: cs.CL
Keywords: extrapolation, models, training, argue, attention, capture, data, decomposable
Abstract
We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.
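The contrast between "global structure" and "local fit" can be illustrated with a toy regression problem. This example is not taken from the paper and does not involve the Decomposable Attention Model or word2vec. The target function y = 2x + 1, the training range [0, 10], and the helper names (nearest_neighbour, fit_line, linear) are all hypothetical choices made for illustration. Both models fit every training point exactly. Only the one that captures the global (linear) structure extrapolates correctly to an input outside the training space.

```python
# Hypothetical toy illustration (not the paper's experiments):
# a purely local model vs. a model of global structure, evaluated
# on an input outside the training range.

def true_fn(x: float) -> float:
    # Assumed ground-truth function for this toy example.
    return 2.0 * x + 1.0

# Training inputs cover only [0, 10].
train_x = [float(i) for i in range(11)]
train_y = [true_fn(x) for x in train_x]

def nearest_neighbour(x: float) -> float:
    """Local fit: return the label of the closest training input."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def fit_line(xs, ys):
    """Global fit: ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b = my - a * mx
    return a, b

a, b = fit_line(train_x, train_y)

def linear(x: float) -> float:
    return a * x + b

x_test = 20.0  # outside the training space
print("true:", true_fn(x_test))                         # true: 41.0
print("nearest neighbour:", nearest_neighbour(x_test))  # nearest neighbour: 21.0
print("linear:", linear(x_test))                        # linear: 41.0
```

Both models have zero training error. The nearest-neighbour model can only repeat labels it has already seen, so its prediction at x = 20 stays at the edge of the training data. The least-squares line instead carries the learned trend past the boundary.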