hub

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models , url =

· 2022 · DOI 10.18653/v1/2022.findings-acl.146

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

open at publisher browse 12 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

cs.LG · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Embeddings, Except In Heavy Truncation Scenarios

cs.LG · 2026-05-15 · conditional · novelty 6.0

Text embeddings are robust to truncation without MRL except when reducing size by at least 80%.

From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space

cs.IR · 2026-04-28 · unverdicted · novelty 6.0

GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.

Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer

cs.IR · 2026-02-27 · unverdicted · novelty 6.0

SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.

Nomic Embed: Training a Reproducible Long Context Text Embedder

cs.CL · 2024-02-02 · conditional · novelty 6.0

Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.

Testing the Assumptions of Active Learning for Translation Tasks with Few Samples

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.

Search-R3: Unifying Reasoning and Embedding in Large Language Models

cs.CL · 2025-10-08 · unverdicted · novelty 5.0

Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.

Multilingual E5 Text Embeddings: A Technical Report

cs.CL · 2024-02-08 · unverdicted · novelty 5.0

Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.

Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

cs.CL · 2024-01-07 · unverdicted · novelty 5.0

Data-CUBE applies a two-level curriculum (TSP-based task ordering via simulated annealing plus difficulty-sorted mini-batches) to multi-task instruction tuning and reports gains on MTEB sentence representation tasks.

Towards General Text Embeddings with Multi-stage Contrastive Learning

cs.CL · 2023-08-07 · unverdicted · novelty 5.0

GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.

Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data

cs.CL · 2026-04-20 · conditional · novelty 4.0

Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 0.5969 to 0.7047.

citing papers explorer

Showing 12 of 12 citing papers.

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees cs.LG · 2026-05-01 · unverdicted · none · ref 79 · 2 links
InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation cs.CL · 2024-02-05 · unverdicted · none · ref 81
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Embeddings, Except In Heavy Truncation Scenarios cs.LG · 2026-05-15 · conditional · none · ref 34
Text embeddings are robust to truncation without MRL except when reducing size by at least 80%.
From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space cs.IR · 2026-04-28 · unverdicted · none · ref 34
GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baselines on benchmarks and industrial data.
Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer cs.IR · 2026-02-27 · unverdicted · none · ref 19
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.
Nomic Embed: Training a Reproducible Long Context Text Embedder cs.CL · 2024-02-02 · conditional · none · ref 90
Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples cs.CL · 2026-04-10 · unverdicted · none · ref 18
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
Search-R3: Unifying Reasoning and Embedding in Large Language Models cs.CL · 2025-10-08 · unverdicted · none · ref 50
Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.
Multilingual E5 Text Embeddings: A Technical Report cs.CL · 2024-02-08 · unverdicted · none · ref 37
Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning cs.CL · 2024-01-07 · unverdicted · none · ref 35
Data-CUBE applies a two-level curriculum (TSP-based task ordering via simulated annealing plus difficulty-sorted mini-batches) to multi-task instruction tuning and reports gains on MTEB sentence representation tasks.
Towards General Text Embeddings with Multi-stage Contrastive Learning cs.CL · 2023-08-07 · unverdicted · none · ref 122
GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.
Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data cs.CL · 2026-04-20 · conditional · none · ref 16
Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 0.5969 to 0.7047.

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models , url =

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer