Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Pith reviewed 2026-05-10 14:47 UTC · model grok-4.3
The pith
Sentence-BERT fine-tunes BERT with siamese and triplet network structures to produce fixed-size sentence embeddings that can be compared quickly by cosine similarity while matching the accuracy of the original BERT.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sentence-BERT modifies the pretrained BERT network by applying siamese and triplet network structures to derive semantically meaningful sentence embeddings. These embeddings can be compared using cosine similarity. The method reduces the cost of finding the most similar pair in a collection of 10,000 sentences from about 50 million inference computations (roughly 65 hours) with BERT to about 5 seconds with SBERT, while maintaining the accuracy achieved by the original BERT model on semantic textual similarity tasks.
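The "50 million" figure is the number of unordered sentence pairs among 10,000 sentences. The short check below reproduces it and derives a rough per-pair cost. The even split of the quoted 65 hours across pairs is an assumption made here for illustration, not a number from the paper.

```python
from math import comb

n_sentences = 10_000

# A pairwise (cross-encoder) model needs one joint forward pass per unordered pair.
n_pairs = comb(n_sentences, 2)

# An embedding model needs one forward pass per sentence instead.
n_encodes = n_sentences

# Assumption: the ~65 hours quoted in the abstract is spread evenly over all pairs.
total_seconds = 65 * 3600
ms_per_pair = 1000 * total_seconds / n_pairs

print(n_pairs)                # 49995000
print(n_encodes)              # 10000
print(round(ms_per_pair, 2))  # 4.68
```

The pair count grows quadratically with the collection size, while the number of encoder passes grows linearly, which is where the speed-up comes from.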
What carries the argument
Siamese and triplet network structures applied to BERT for producing standalone sentence embeddings.
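A minimal sketch of the siamese idea, assuming mean pooling as the pooling step: the same encoder weights process each sentence independently, and pooling collapses the token vectors into one fixed-size vector. The `embedding_table` lookup is a hypothetical stand-in for BERT's token outputs, used only so the example runs without a model download.

```python
import numpy as np

def mean_pool(token_vecs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) positions.

    token_vecs: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token.
    """
    m = mask[..., None].astype(token_vecs.dtype)
    summed = (token_vecs * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1e-9, None)  # guard against empty sentences
    return summed / counts

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical stand-in for a transformer: a fixed random lookup table.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(100, 8))  # 100 token ids, dimension 8

def encode(token_ids: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Shared encoder: both 'towers' of the siamese setup call this function."""
    return mean_pool(embedding_table[token_ids], mask)

# Two sentences that differ only in a padded position.
ids_a, mask_a = np.array([[5, 17, 42, 0]]), np.array([[1, 1, 1, 0]])
ids_b, mask_b = np.array([[5, 17, 42, 99]]), np.array([[1, 1, 1, 0]])

u = encode(ids_a, mask_a)[0]
v = encode(ids_b, mask_b)[0]
print(cosine(u, v))  # padding is masked out, so the two embeddings coincide
```

Because each sentence is encoded on its own, an embedding can be computed once and reused for every later comparison.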
If this is right
- Semantic similarity search over large sentence collections becomes feasible in seconds rather than hours.
- Unsupervised tasks such as clustering become practical with BERT-derived embeddings.
- SBERT and SRoBERTa outperform prior state-of-the-art sentence embedding methods on standard STS benchmarks and transfer learning tasks.
- The same accuracy as full BERT pairwise inference is retained on sentence-pair regression tasks.
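Once embeddings exist, the similarity search in the first point reduces to a matrix product. The sketch below uses random vectors with one planted near-duplicate in place of real SBERT embeddings; the sizes and the `most_similar_pair` helper are illustrative choices, not taken from the paper.

```python
import numpy as np

def most_similar_pair(embeddings: np.ndarray):
    """Return (i, j, cosine) for the most similar pair of distinct rows."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T          # all pairwise cosine similarities at once
    np.fill_diagonal(sims, -np.inf)   # ignore each row's similarity to itself
    i, j = np.unravel_index(np.argmax(sims), sims.shape)
    return int(i), int(j), float(sims[i, j])

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))                 # stand-in for sentence embeddings
emb[123] = emb[7] + 0.01 * rng.normal(size=64)    # plant a near-duplicate of row 7

i, j, score = most_similar_pair(emb)
print(sorted((i, j)), round(score, 3))
```

For much larger collections the full similarity matrix no longer fits in memory, and an approximate nearest-neighbour index is the usual substitute.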
Where Pith is reading between the lines
- The same siamese training approach could be applied to other pretrained transformer models to generate efficient embeddings.
- Independent sentence embeddings may serve as a practical approximation for many semantic comparison tasks that originally required joint inference.
- Combining SBERT-style embeddings with domain-specific fine-tuning could further improve performance on specialized corpora without reintroducing pairwise computation costs.
Load-bearing premise
Fine-tuning BERT with siamese and triplet networks produces sentence embeddings whose cosine similarities accurately reflect semantic similarity at the level of the original pairwise BERT inference.
What would settle it
A held-out semantic textual similarity dataset where the ranking of sentence pairs by SBERT cosine similarity differs substantially from the ranking obtained by direct BERT pairwise inference on the same pairs.
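One way to quantify how far two rankings differ is Spearman rank correlation, shown here as a sketch. The scores are made up for illustration, the variable names are hypothetical, and the rank helper assumes no tied scores.

```python
import numpy as np

def ranks(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 of the values in x, assuming no ties (a simplification)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

def spearman(a, b) -> float:
    """Pearson correlation of the two rank vectors."""
    ra, rb = ranks(np.asarray(a)), ranks(np.asarray(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Made-up scores for five sentence pairs, for illustration only.
pairwise_scores = [4.8, 3.1, 0.5, 2.2, 4.0]      # joint (cross-encoder) inference
cosine_scores = [0.93, 0.40, 0.05, 0.61, 0.88]   # independent embeddings

rho = spearman(pairwise_scores, cosine_scores)
print(round(rho, 3))  # 0.9
```

A value near 1 means the two methods order the pairs almost identically; a substantially lower value on held-out data would be the kind of evidence this section describes.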
Original abstract
BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) has set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Sentence-BERT (SBERT), a modification of pre-trained BERT (and RoBERTa) that employs siamese and triplet network structures to produce fixed-length sentence embeddings. These embeddings can be compared efficiently via cosine similarity, which reduces the cost of finding the most similar pair among 10,000 sentences from ~65 hours (pairwise BERT inference) to ~5 seconds. The paper claims that this retains BERT-level accuracy on semantic textual similarity (STS) tasks and that SBERT outperforms prior sentence embedding methods on transfer learning tasks.
Significance. If the empirical claims hold, the work is significant because it makes contextualized transformer representations practical for large-scale semantic search, clustering, and retrieval pipelines that were previously infeasible due to quadratic inference costs. The approach has influenced subsequent efficient embedding research and provides a reproducible recipe for adapting pre-trained models to standalone sentence encoding.
major comments (2)
- [§3] §3 (SBERT Architecture): The central claim that siamese/triplet fine-tuning on NLI data produces embeddings whose cosine similarities recover the semantic judgments of BERT's joint [CLS] encoding is load-bearing for the 'maintaining the accuracy' assertion, yet the manuscript provides no ablation or diagnostic test on phenomena that rely on cross-sentence attention (e.g., negation scope, coreference resolution, or subtle entailment). A controlled comparison on such cases would be required to substantiate that independent encoding plus learned pooling fully compensates for the removed token-level interactions.
- [§4] §4 (Evaluation): The STS and transfer-task results are presented without reporting run-to-run variance, statistical significance tests, or direct side-by-side numbers for the original BERT/RoBERTa pairwise baseline on the identical splits and metrics; this weakens the quantitative support for the efficiency-accuracy tradeoff claim.
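The variance and significance reporting requested in the §4 comment could look roughly like the sketch below. The scores are invented, and the exact sign-flip test is one possible paired test chosen here for illustration; nothing in the manuscript indicates which test the authors would use.

```python
from itertools import product

import numpy as np

# Made-up scores (Spearman x 100) over five random seeds; illustration only.
model_a = np.array([84.9, 85.1, 84.6, 85.3, 84.8])
model_b = np.array([84.1, 84.5, 84.0, 84.7, 84.2])

def mean_std(x: np.ndarray):
    """Mean and sample standard deviation (ddof=1) across seeds."""
    return float(np.mean(x)), float(np.std(x, ddof=1))

def sign_flip_p_value(a: np.ndarray, b: np.ndarray) -> float:
    """Exact two-sided paired sign-flip test on the mean difference."""
    d = a - b
    observed = abs(d.mean())
    hits = total = 0
    for signs in product([1, -1], repeat=len(d)):
        total += 1
        # Small tolerance so floating-point noise does not drop exact matches.
        if abs((d * np.array(signs)).mean()) >= observed - 1e-12:
            hits += 1
    return hits / total

mean_a, std_a = mean_std(model_a)
p_value = sign_flip_p_value(model_a, model_b)
print(round(mean_a, 2), round(std_a, 2), p_value)
```

With only five seeds, the smallest two-sided p-value this test can produce is 2/32 = 0.0625, so seed-level tests alone cannot reach the conventional 0.05 threshold; per-example tests on the evaluation set are the usual complement.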
minor comments (2)
- [Abstract] Abstract: subject-verb agreement error ('BERT and RoBERTa has set') and subject-verb mismatch ('that use siamese').
- [§3] Notation: the pooling operation (mean/max/[CLS]) and the exact form of the triplet loss are described but not given explicit equations; adding numbered equations would improve reproducibility.
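As a reader's sketch of what such numbered equations might look like, the block below writes out mean pooling and a standard triplet objective. The symbols ($h_t$ for token vectors, $m_t$ for the padding mask, $s_a, s_p, s_n$ for anchor, positive and negative embeddings, $\epsilon$ for the margin) and the use of the Euclidean norm are assumptions of this sketch; the manuscript's exact form may differ.

```latex
\begin{align}
  u &= \frac{\sum_{t=1}^{T} m_t \, h_t}{\sum_{t=1}^{T} m_t},
      \qquad m_t \in \{0, 1\} \\
  \mathcal{L}_{\text{triplet}} &= \max\bigl( \lVert s_a - s_p \rVert
      - \lVert s_a - s_n \rVert + \epsilon,\; 0 \bigr)
\end{align}
```

The first equation averages only over real tokens; the second is zero once the positive is closer to the anchor than the negative by at least the margin.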
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below, clarifying our position and indicating changes to the manuscript where appropriate.
Point-by-point responses
-
Referee: [§3] §3 (SBERT Architecture): The central claim that siamese/triplet fine-tuning on NLI data produces embeddings whose cosine similarities recover the semantic judgments of BERT's joint [CLS] encoding is load-bearing for the 'maintaining the accuracy' assertion, yet the manuscript provides no ablation or diagnostic test on phenomena that rely on cross-sentence attention (e.g., negation scope, coreference resolution, or subtle entailment). A controlled comparison on such cases would be required to substantiate that independent encoding plus learned pooling fully compensates for the removed token-level interactions.
Authors: We agree that phenomena relying on cross-sentence attention represent an important test case for the claim that SBERT embeddings recover BERT-level semantic judgments. The NLI training data used for fine-tuning explicitly requires modeling entailment and contradiction relations, which frequently involve negation, coreference, and subtle semantic distinctions. The strong results on STS benchmarks, which contain many such examples, provide supporting evidence that the learned pooling and siamese objective capture the necessary information in the fixed embeddings. Nevertheless, the original manuscript did not include targeted diagnostic ablations or controlled comparisons isolating these phenomena. In the revised version we have added a paragraph in §3 discussing this point, along with qualitative examples illustrating SBERT's handling of negation and coreference in similarity tasks. A full controlled study would require new experiments outside the scope of the current work focused on efficient sentence encoding. revision: partial
-
Referee: [§4] §4 (Evaluation): The STS and transfer-task results are presented without reporting run-to-run variance, statistical significance tests, or direct side-by-side numbers for the original BERT/RoBERTa pairwise baseline on the identical splits and metrics; this weakens the quantitative support for the efficiency-accuracy tradeoff claim.
Authors: The BERT and RoBERTa pairwise numbers reported in the paper are taken directly from the same standard STS and transfer-task benchmarks and splits used in the original BERT/RoBERTa publications and subsequent leaderboard evaluations, enabling direct comparison on identical metrics. To strengthen the presentation, we have updated the evaluation section and tables to report run-to-run standard deviations (computed over five random seeds) for SBERT and SRoBERTa, and we have added paired statistical significance tests against the strongest baselines. The side-by-side BERT/RoBERTa figures already appear in Tables 1 and 2 using the same evaluation protocol. revision: yes
- A dedicated controlled ablation isolating cross-sentence attention phenomena (negation scope, coreference, subtle entailment) was not performed in the original experiments.
Circularity Check
No circularity: SBERT is an empirical fine-tuning method with external validation
full rationale
The paper describes a practical modification of BERT using siamese and triplet networks to produce fixed sentence embeddings for cosine similarity, followed by direct evaluation on STS and transfer tasks. No derivation chain exists that reduces a claimed result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or imported uniqueness theorems appear. The core claim rests on standard transfer learning from an external pre-trained model (BERT) and is tested against independent benchmarks, making the procedure self-contained rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: BERT's pre-trained representations can be adapted via siamese and triplet training to produce standalone sentence embeddings that preserve semantic information.
Forward citations
Cited by 60 Pith papers
-
A Unified Geometric Framework for Weighted Contrastive Learning
Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preser...
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
-
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
Benign fine-tuning on audio data breaks safety alignment in Audio LLMs by raising jailbreak success rates up to 87%, with the dominant risk axis depending on model architecture and embedding proximity to harmful content.
-
PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users
Preference fine-tuning outperforms prompting for personalisation but amplifies sycophancy and relationship-seeking, while simulated users recover aggregate rankings yet show far lower self-consistency and different to...
-
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deploy...
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
-
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs
PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
-
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.
-
Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS
LaF-MCTS uses LLM-assisted flexible MCTS with a three-tier hierarchy, semantic pruning, and branch regrowth to automatically compose decomposition-enhanced CVRP solvers that outperform state-of-the-art methods on CVRP...
-
ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
ContextualJailbreak uses evolutionary search over simulated primed dialogues with novel mutations to reach 90-100% attack success on open LLMs and transfers to some closed frontier models at 15-90% rates.
-
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than p...
-
Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models
XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning...
-
Similar Users-Augmented Interest Network
SUIN improves CTR prediction by augmenting target user sequences with similar users' behaviors via embedding-based retrieval, user-specific position encoding, and user-aware target attention.
-
Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems
PUDA enables effective promotion of unpopular target items in black-box LLM sequential recommenders by using evolutionary LLM refinement to infer hidden prompts, training a surrogate model, and combining adversarial t...
-
R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability
R2Code improves requirement-to-code traceability with a bidirectional alignment network, self-reflective consistency verification, and dynamic context-adaptive retrieval, yielding 7.4% average F1 gain and up to 41.7% ...
-
Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.
-
Semantic Recall for Vector Search
Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.
-
HumanScore: Benchmarking Human Motions in Generated Videos
HumanScore defines six metrics for kinematic plausibility, temporal stability, and biomechanical consistency to benchmark human motions in videos from thirteen state-of-the-art generation models, revealing gaps betwee...
-
LLM-Viterbi: Semantic-Aware Decoding for Convolutional Codes
An LLM-enhanced Viterbi decoder achieves roughly 1.5 dB extra coding gain in block error rate and over 50% better semantic similarity than conventional Viterbi for constraint-length-3 convolutional codes on AWGN channels.
-
DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion
Adaptive trie-guided decoding with document context and tunable penalties improves in-document query auto-completion, outperforming baselines and larger models like LLaMA-3 on seen queries.
-
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
BAGEL is a Bayesian active learning framework that uses Gaussian Processes to propagate LLM relevance signals across embedding space and guide global exploration, outperforming standard LLM reranking under identical b...
-
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
-
Efficient Personalization of Generative User Interfaces
A dataset revealing high inter-designer disagreement on UI preferences motivates a sample-efficient method that personalizes generative interfaces by embedding new users in the space of prior designers, outperforming ...
-
Skill-Conditioned Visual Geolocation for Vision-Language Models
GeoSkill uses an evolving Skill-Graph initialized from expert trajectories and grown via autonomous analysis of successful and failed reasoning rollouts to boost geolocation accuracy, faithfulness, and generalization ...
-
Skill-Conditioned Visual Geolocation for Vision-Language Models
GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.
-
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
HyPE detects harmful prompts as outliers in hyperbolic space and HyPS sanitizes them using explainable attribution, outperforming prior defenses in accuracy and robustness across datasets and adversarial scenarios.
-
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis
LLM4Log is a systematic review of 145 papers on LLM-based log analysis that delivers a unified taxonomy, design patterns, and open challenges for reliable adoption in AIOps.
-
WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain
WorkRB is the first open community-driven benchmark for AI in the work domain, organizing 13 tasks from 7 groups with dynamic multilingual ontology loading and modular design for proprietary task integration.
-
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual,...
-
C-Pack: Packed Resources For General Chinese Embeddings
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
-
Steering Language Models With Activation Engineering
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
-
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
A dual hierarchical RL framework lets agents learn when and how to ask probing questions in U.S. Supreme Court arguments, outperforming baselines on a court dataset.
-
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...
-
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
ComplexMCP benchmark shows current LLM agents achieve at most 60% success on interdependent tool tasks versus 90% for humans, due to tool retrieval saturation, over-confidence, and strategic defeatism.
-
Sanity Checks for Long-Form Hallucination Detection
Hallucination detectors on LLM reasoning traces often rely on final-answer artifacts rather than reasoning validity; once controlled, lightweight lexical trajectory features suffice for robust detection.
-
WeatherSyn: An Instruction Tuning MLLM For Weather Forecasting Report Generation
WeatherSyn is the first instruction-tuned MLLM for weather forecasting report generation, outperforming closed-source models on a new dataset of 31 US cities across 8 weather aspects.
-
Structural Rationale Distillation via Reasoning Space Compression
D-RPC compresses reasoning into a dynamic bank of reusable paths to produce consistent teacher rationales, outperforming standard distillation baselines on five reasoning benchmarks while using fewer tokens.
-
RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation
RRCM trains an LLM to dynamically retrieve from collaborative and meta memories using group relative policy optimization driven by final top-k recommendation quality.
-
Query-efficient model evaluation using cached responses
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
-
On the Role of Language Representations in Auto-Bidding: Findings and Implications
SemBid injects LLM-encoded Task, History, and Strategy semantics as tokens into offline bidding trajectories and uses self-attention to outperform numerical-only baselines in performance, constraint satisfaction, and ...
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
PersonaTeaming Workflow improves automated red-teaming attack success rates over RainbowPlus using personas while maintaining diversity, and PersonaTeaming Playground supports human-AI collaboration in red-teaming as ...
-
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while r...
-
Anticipating Innovation Using Large Language Models
TechToken uses transformer embeddings of IPC codes to measure linguistic convergence in patents and predict future technological combinations.
-
Revisiting Graph-Tokenizing Large Language Models: A Systematic Evaluation of Graph Token Understanding
GTokenLLMs do not fully understand graph tokens, exhibiting over-sensitivity or insensitivity to instruction changes and relying heavily on text for reasoning even when graph information is preserved.
-
RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions
RECAP captures, replays, and analyzes AI-assisted programming sessions by linking prompts, edits, and developer actions in a single timeline.
-
A Replicability Study of XTR
XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
-
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
-
Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text
Methods for constructing Hypergraphs of Text are proposed with a new effort ratio metric where TF-IDF baselines match LLM methods in experiments.
-
LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images
LatentDiff scales semantic dataset comparison to millions of images using latent spaces of vision encoders combined with sparse autoencoders and density ratio estimation, showing better accuracy and robustness than ca...
-
MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining
MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme lo...
-
When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs
Hallucinations in LVLMs largely arise from textual priors in prompts, and can be reduced by fine-tuning with preference optimization on grounded vs. hallucinated response pairs.
-
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
-
Text Steganography with Dynamic Codebook and Multimodal Large Language Model
A black-box text steganography method using a dynamic codebook generated by multimodal LLMs and reject-sampling feedback achieves higher embedding capacity and text quality than prior white-box and fixed-codebook blac...
-
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
LLMs show mixed results on authorship verification, post generation, and attribute inference from Twitter data, with new frameworks and user studies establishing benchmarks for these analytics tasks.
-
Reasoning Structure Matters for Safety Alignment of Reasoning Models
Changing the internal reasoning structure of large reasoning models through simple supervised fine-tuning on 1K examples produces strong safety alignment that generalizes across tasks and languages.
-
HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents
HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoM...
-
Identifying Ethical Biases in Action Recognition Models
The authors create a synthetic video auditing framework that detects statistically significant skin color biases in popular human action recognition models even when actions are identical.
-
DuConTE: Dual-Granularity Text Encoder with Topology-Constrained Attention for Text-attributed Graphs
DuConTE is a dual-granularity text encoder that incorporates graph topology into language model attention for improved node representations in text-attributed graphs.
-
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning
REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
-
Lorentz Framework for Semantic Segmentation
A Lorentz-model hyperbolic framework for semantic segmentation that integrates with Euclidean networks, provides free uncertainty maps, and is validated on ADE20K, COCO-Stuff, Pascal-VOC and Cityscapes using DeepLabV3...
Reference graph
Works this paper leans on
-
[1]
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. http://www.aclweb.org/anthology/S15-2045 SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Procee...
-
[2]
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. https://doi.org/10.3115/v1/S14-2010 SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages ...
-
[3]
Eneko Agirre, Carmen Banea, Daniel M. Cer, Mona T. Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. http://aclweb.org/anthology/S/S16/S16-1081.pdf SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval@...
-
[4]
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. https://www.aclweb.org/anthology/S13-1004 *SEM 2013 shared task: Semantic Textual Similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 3...
-
[5]
Eneko Agirre, Mona Diab, Daniel Cer, and Aitor Gonzalez-Agirre. 2012. http://dl.acm.org/citation.cfm?id=2387636.2387697 SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedin...
-
[6]
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. https://doi.org/10.18653/v1/D15-1075 A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics
-
[7]
Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. http://arxiv.org/abs/1708.00055 SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 1–14, Vancouver, Canada
-
[8]
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. http://arxiv.org/abs/1803.11175 Universal Sentence Encoder. arXiv preprint arXiv:1803.11175
- [9]
-
[10]
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. https://www.aclweb.org/anthology/D17-1070 Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark. ...
-
[11]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. https://arxiv.org/abs/1810.04805 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805
-
[12]
Bill Dolan, Chris Quirk, and Chris Brockett. 2004. https://doi.org/10.3115/1220355.1220406 Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In Proceedings of the 20th International Conference on Computational Linguistics, COLING '04, Stroudsburg, PA, USA. Association for Computational Linguistics
-
[13]
Liat Ein Dor, Yosi Mass, Alon Halfon, Elad Venezian, Ilya Shnayderman, Ranit Aharonov, and Noam Slonim. 2018. https://doi.org/10.18653/v1/P18-2009 Learning Thematic Similarity Metric from Article Sections Using Triplet Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 49--...
-
[14]
Felix Hill, Kyunghyun Cho, and Anna Korhonen. 2016. https://doi.org/10.18653/v1/N16-1162 Learning Distributed Representations of Sentences from Unlabelled Data. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1367–1377, San Diego, California. Assoc...
-
[15]
Minqing Hu and Bing Liu. 2004. https://doi.org/10.1145/1014052.1014073 Mining and Summarizing Customer Reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 168–177, New York, NY, USA. ACM
- [16]
-
[17]
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. https://arxiv.org/abs/1702.08734 Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734
-
[18]
Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. http://papers.nips.cc/paper/5950-skip-thought-vectors.pdf Skip-Thought Vectors . In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3294--3302. Curra...
-
[19]
Xin Li and Dan Roth. 2002. https://doi.org/10.3115/1072228.1072378 Learning Question Classifiers . In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, COLING '02, pages 1--7, Stroudsburg, PA, USA. Association for Computational Linguistics
-
[20]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. http://arxiv.org/abs/1907.11692 RoBERTa: A Robustly Optimized BERT Pretraining Approach . arXiv preprint arXiv:1907.11692
-
[21]
Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf A SICK cure for the evaluation of compositional distributional semantic models . In Proceedings of the Ninth International Conference on Language Resources and Evaluation ( LREC ' ...
-
[22]
Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. 2019. http://arxiv.org/abs/1903.10561 On Measuring Social Biases in Sentence Encoders . arXiv preprint arXiv:1903.10561
-
[23]
Amita Misra, Brian Ecker, and Marilyn A. Walker. 2016. http://aclweb.org/anthology/W/W16/W16-3636.pdf Measuring the Similarity of Sentential Arguments in Dialogue. In Proceedings of the SIGDIAL 2016 Conference, The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 13-15 September 2016, Los Angeles, CA, USA, pages 276--287
-
[24]
Bo Pang and Lillian Lee. 2004. https://doi.org/10.3115/1218955.1218990 A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL '04), Main Volume, pages 271--278, Barcelona, Spain
-
[25]
Bo Pang and Lillian Lee. 2005. https://doi.org/10.3115/1219840.1219855 Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05), pages 115--124, Ann Arbor, Michigan. Association for Computational Linguistics
-
[26]
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. https://www.aclweb.org/anthology/D14-1162 GloVe: Global Vectors for Word Representation . In Empirical Methods in Natural Language Processing (EMNLP), pages 1532--1543
- [27]
-
[28]
Nils Reimers, Philip Beyer, and Iryna Gurevych. 2016. https://www.aclweb.org/anthology/C16-1009 Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity . In Proceedings of the 26th International Conference on Computational Linguistics (COLING), pages 87--96
- [29]
-
[30]
Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. https://www.aclweb.org/anthology/P19-1054 Classification and Clustering of Arguments with Contextualized Word Embeddings . In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 567--578, Florence, Italy....
- [31]
-
[32]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. https://www.aclweb.org/anthology/D13-1170 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank . In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631--1642, Seattle...
-
[33]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf Attention is All you Need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information P...
-
[34]
Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. https://doi.org/10.1007/s10579-005-7880-9 Annotating Expressions of Opinions and Emotions in Language . Language Resources and Evaluation, 39(2):165--210
-
[35]
Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. http://aclweb.org/anthology/N18-1101 A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference . In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112--...
-
[36]
Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-Yi Kong, Noah Constant, Petr Pilar, Heming Ge, Yun-hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. https://www.aclweb.org/anthology/W18-3022 Learning Semantic Textual Similarity from Conversations. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 164--174, Melbourne, Australia. A...
-
[37]
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. http://arxiv.org/abs/1906.08237 XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv preprint arXiv:1906.08237
-
[38]
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. http://arxiv.org/abs/1904.09675 BERTScore: Evaluating Text Generation with BERT . arXiv preprint arXiv:1904.09675