hub

S em E val-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

Lucia Specia · 2017 · DOI 10.18653/v1/s17-2001

20 Pith papers cite this work. Polarity classification is still indexing.

20 Pith papers citing it

open at publisher browse 20 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.

PRIMETIME : Limits of LLMs in Temporal Primitives

cs.NE · 2025-04-22 · unverdicted · novelty 7.0

PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

cs.CV · 2024-10-07 · conditional · novelty 7.0

VLM2Vec converts state-of-the-art vision-language models into universal multimodal embedders via contrastive training on the new MMEB benchmark, delivering 10-20% absolute gains over prior models on both in-distribution and out-of-distribution tasks.

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

cs.CL · 2024-08-02 · unverdicted · novelty 7.0

Task prompt vectors, formed by subtracting random initialization from tuned soft prompts, support low-resource initialization and arithmetic combination across tasks on 12 NLU datasets while remaining independent of initialization seed on two model architectures.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark

cs.CL · 2025-11-26 · unverdicted · novelty 6.0

PEFT-Bench is a standardized end-to-end benchmark for 7 PEFT methods across 27 NLP datasets on autoregressive LLMs, accompanied by the PSCP metric that penalizes based on trainable parameters, inference speed, and training memory.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation

cs.LG · 2024-12-26 · unverdicted · novelty 6.0

SyMerge merges models via single-layer adaptation and expert-guided self-labeling to achieve task synergy, reporting SOTA results on vision, dense prediction, and NLP tasks.

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

cs.CL · 2024-05-27 · accept · novelty 6.0

NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

cs.CL · 2023-05-23 · conditional · novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

cs.CL · 2019-05-02 · accept · novelty 6.0

SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.

PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling

cs.CL · 2026-05-19 · unverdicted · novelty 5.0 · 2 refs

PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.

A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Re-evaluating controlled text generation systems under standardized conditions reveals that many published performance claims do not hold, highlighting the need for consistent evaluation practices.

Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Augmenting commonsense knowledge corpora with negation produces over 2M new triples that benefit LLM negation understanding when used for pre-training.

PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models

cs.CL · 2025-12-02 · unverdicted · novelty 5.0

PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.

Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

cs.CL · 2024-01-07 · unverdicted · novelty 5.0

Data-CUBE applies a two-level curriculum (TSP-based task ordering via simulated annealing plus difficulty-sorted mini-batches) to multi-task instruction tuning and reports gains on MTEB sentence representation tasks.

citing papers explorer

Showing 20 of 20 citing papers.

SimCSE: Simple Contrastive Learning of Sentence Embeddings cs.CL · 2021-04-18 · conditional · none · ref 72
SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.
Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring cs.CL · 2026-04-20 · unverdicted · none · ref 42
LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.
PRIMETIME : Limits of LLMs in Temporal Primitives cs.NE · 2025-04-22 · unverdicted · none · ref 71
PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks cs.CV · 2024-10-07 · conditional · none · ref 6
VLM2Vec converts state-of-the-art vision-language models into universal multimodal embedders via contrastive training on the new MMEB benchmark, delivering 10-20% absolute gains over prior models on both in-distribution and out-of-distribution tasks.
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer cs.CL · 2024-08-02 · unverdicted · none · ref 6
Task prompt vectors, formed by subtracting random initialization from tuned soft prompts, support low-resource initialization and arithmetic combination across tasks on 12 NLU datasets while remaining independent of initialization seed on two model architectures.
LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 9
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation cs.CL · 2026-05-12 · unverdicted · none · ref 24
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 63
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 15
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark cs.CL · 2025-11-26 · unverdicted · none · ref 10
PEFT-Bench is a standardized end-to-end benchmark for 7 PEFT methods across 27 NLP datasets on autoregressive LLMs, accompanied by the PSCP metric that penalizes based on trainable parameters, inference speed, and training memory.
HyperAdapt: Simple High-Rank Adaptation cs.LG · 2025-09-23 · unverdicted · none · ref 6
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation cs.LG · 2024-12-26 · unverdicted · none · ref 2
SyMerge merges models via single-layer adaptation and expert-guided self-labeling to achieve task synergy, reporting SOTA results on vision, dense prediction, and NLP tasks.
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models cs.CL · 2024-05-27 · accept · none · ref 148
NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations cs.CL · 2023-05-23 · conditional · none · ref 169
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems cs.CL · 2019-05-02 · accept · none · ref 88
SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling cs.CL · 2026-05-19 · unverdicted · none · ref 58 · 2 links
PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles cs.CL · 2026-05-12 · unverdicted · none · ref 7
Re-evaluating controlled text generation systems under standardized conditions reveals that many published performance claims do not hold, highlighting the need for consistent evaluation practices.
Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding cs.CL · 2026-04-21 · unverdicted · none · ref 39
Augmenting commonsense knowledge corpora with negation produces over 2M new triples that benefit LLM negation understanding when used for pre-training.
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models cs.CL · 2025-12-02 · unverdicted · none · ref 15
PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning cs.CL · 2024-01-07 · unverdicted · none · ref 11
Data-CUBE applies a two-level curriculum (TSP-based task ordering via simulated annealing plus difficulty-sorted mini-batches) to multi-task instruction tuning and reports gains on MTEB sentence representation tasks.

S em E val-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer