pith. machine review for the scientific record.

arxiv: 1907.04307 · v1 · submitted 2019-07-09 · 💻 cs.CL

Recognition: unknown

Multilingual Universal Sentence Encoder for Semantic Retrieval

classification 💻 cs.CL
keywords: models, retrieval, semantic, sentence, english, multilingual, performance, tasks
Original abstract

We introduce two pre-trained, retrieval-focused multilingual sentence encoding models, based respectively on the Transformer and CNN model architectures. The models embed text from 16 languages into a single semantic space using a multi-task trained dual-encoder that learns tied representations using translation-based bridge tasks (Chidambaram et al., 2018). The models provide performance that is competitive with the state of the art on semantic retrieval (SR), translation pair bitext retrieval (BR), and retrieval question answering (ReQA). On English transfer learning tasks, our sentence-level embeddings approach, and in some cases exceed, the performance of monolingual, English-only sentence embedding models. Our models are made available for download on TensorFlow Hub.
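In a dual-encoder setup, each sentence is embedded independently, so retrieval reduces to a nearest-neighbor search in the shared space. The sketch below illustrates only that search step under stated assumptions: it takes already-computed sentence vectors as input (it does not load the released TensorFlow Hub models), it scores candidates by cosine similarity (an assumption, not necessarily the paper's exact scoring), and the toy 3-dimensional vectors and the function names `retrieve` and `match_translation_pairs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Sequence[float],
             candidates: Sequence[Sequence[float]],
             k: int = 1) -> List[Tuple[int, float]]:
    """Return the top-k (candidate index, cosine score) pairs, best first."""
    q = l2_normalize(query)
    scored = [(i, dot(q, l2_normalize(c))) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def match_translation_pairs(sources: Sequence[Sequence[float]],
                            targets: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical bitext-retrieval helper: nearest target index for each source."""
    return [retrieve(s, targets, k=1)[0][0] for s in sources]


if __name__ == "__main__":
    # Hypothetical toy "embeddings"; real encoder outputs are far higher-dimensional.
    english = {"The cat sleeps.": [0.9, 0.1, 0.0],
               "The train is late.": [0.05, 0.95, 0.1]}
    german = {"Der Zug ist spät.": [0.1, 0.9, 0.2],
              "Die Katze schläft.": [0.88, 0.12, 0.05]}
    en_texts, en_vecs = list(english), list(english.values())
    de_texts, de_vecs = list(german), list(german.values())
    # Pair each English sentence with its nearest German sentence in the shared space.
    for en, j in zip(en_texts, match_translation_pairs(en_vecs, de_vecs)):
        print(f"{en!r} -> {de_texts[j]!r}")
```

In this sketch, the same nearest-neighbor search covers both monolingual semantic retrieval (`retrieve`) and cross-lingual pair matching (`match_translation_pairs`); only the inputs change.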

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

    cs.IR 2026-04 unverdicted novelty 7.0

    An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.

  2. OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

    cs.LG 2026-05 unverdicted novelty 6.0

    OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.