arxiv: 1506.06726 · v1 · pith:LD2DOP5Pnew · submitted 2015-06-22 · 💻 cs.CL · cs.LG

Skip-Thought Vectors

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler

keywords encoder · generic · model · representations · semantic · sentence · sentences · training

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice. We will make our encoder publicly available.
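The training signal described in the abstract, reconstructing the sentences surrounding an encoded one, depends on grouping contiguous text into (previous, current, next) triples. The following is a minimal sketch of that data preparation step only, not the authors' implementation; the function name `skip_thought_triples` and the toy sentences are illustrative assumptions, and the encoder-decoder model itself is not shown.

```python
from typing import List, Tuple


def skip_thought_triples(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Build (previous, current, next) triples from an ordered list of sentences.

    Hypothetical helper: the middle sentence is the one an encoder would read,
    and the two neighbours are the targets a decoder would try to reconstruct.
    The first and last sentences have only one neighbour and are skipped.
    """
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]


# Toy example; real training data would be contiguous sentences from books.
book = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
    "It never sat there.",
]

for previous, current, following in skip_thought_triples(book):
    print(f"encode: {current!r} -> decode: {previous!r} and {following!r}")
```

Each triple relies only on sentence order, which is why the approach needs no labels: the continuity of the text supplies the supervision.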

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scaling Laws and Interpretability of Learning from Repeated Data
cs.LG · 2022-05 · accept · novelty 6.0

Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.