EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.
Language models are universal embedders
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
fields
cs.CL 3representative citing papers
Causal2Vec prepends a BERT-generated contextual token to decoder-only LLMs and pools its hidden state with the EOS token to reach new SOTA on MTEB among public-data-trained embedding models.
Data-CUBE applies a two-level curriculum (TSP-based task ordering via simulated annealing plus difficulty-sorted mini-batches) to multi-task instruction tuning and reports gains on MTEB sentence representation tasks.
citing papers explorer
-
Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token
Causal2Vec prepends a BERT-generated contextual token to decoder-only LLMs and pools its hidden state with the EOS token to reach new SOTA on MTEB among public-data-trained embedding models.