pith. machine review for the scientific record.

arxiv: 2604.17632 · v1 · submitted 2026-04-19 · 💻 cs.IR

Recognition: unknown

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:13 UTC · model grok-4.3

classification 💻 cs.IR
keywords code-switching · information retrieval · multilingual models · embedding divergence · retrieval benchmarks · CSR-L · CS-MTEB · performance degradation

The pith

Code-switching acts as a performance bottleneck for retrieval systems because mixed-language queries create large divergences in embedding spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that code-switching in queries harms retrieval effectiveness across statistical, dense, and late-interaction retrievers, even for strong multilingual models. The authors create a human-annotated dataset called CSR-L to test natural mixed-language queries and scale it to CS-MTEB, a benchmark covering eleven tasks. They trace the problem to measurable separation between pure-language and code-switched text in embedding space. Standard multilingual fixes such as vocabulary expansion do not close the gap. If correct, this means real-world search in bilingual settings underperforms what monolingual or clean multilingual benchmarks predict.

Core claim

Code-switching is a fundamental performance bottleneck in information retrieval. Evaluations on the new CSR-L benchmark and the broader CS-MTEB show effectiveness drops of up to 27 percent for current models. The root cause is substantial divergence between the embeddings of pure-language text and code-switched text. Common multilingual techniques such as vocabulary expansion fail to resolve these deficits completely.
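The divergence claim can be made concrete with a toy check, assuming paired embeddings of the same query in pure and code-switched form. The vectors below are synthetic stand-ins, not the paper's models or its actual divergence metric:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_pair_similarity(pure_embs, cs_embs):
    # Average cosine similarity between each pure-language query embedding
    # and the embedding of its code-switched counterpart. Lower values
    # indicate larger divergence in the embedding space.
    return float(np.mean([cosine(p, c) for p, c in zip(pure_embs, cs_embs)]))

# Toy stand-ins for encoder outputs: a real check would encode paired
# queries with the retriever under test.
rng = np.random.default_rng(0)
pure = rng.normal(size=(8, 16))
shift = rng.normal(size=16)   # systematic offset standing in for the switch
mixed = pure + 0.8 * shift    # code-switched versions drift in a shared direction

aligned = mean_pair_similarity(pure, pure)
diverged = mean_pair_similarity(pure, mixed)
assert diverged < aligned     # divergence lowers paired similarity
```

On real models the interesting quantity is how far `diverged` falls below `aligned` and whether that gap tracks the retrieval drop.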

What carries the argument

The CSR-L human-annotated benchmark and the measured divergence in embedding space between pure and code-switched queries.

If this is right

  • Retrieval effectiveness on real global queries is lower than monolingual benchmarks indicate.
  • Vocabulary expansion and similar multilingual adaptations leave residual deficits in code-switched settings.
  • New model designs must target alignment of pure and mixed-language representations in embedding space.
  • Future IR systems need dedicated benchmarks like CS-MTEB to measure progress on mixed-language inputs.
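The first consequence above can be illustrated with a minimal dense-retrieval sketch: a shared embedding offset standing in for code-switching degrades recall on an otherwise easy toy corpus. All data here is synthetic and makes no claim about the paper's actual setup:

```python
import numpy as np

def recall_at_k(query_embs, doc_embs, relevant, k=1):
    # Fraction of queries whose relevant document appears in the top-k
    # results of a simple dense (cosine-similarity) retriever.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = q @ d.T
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([rel in row for rel, row in zip(relevant, topk)]))

rng = np.random.default_rng(1)
docs = rng.normal(size=(50, 32))
pure_queries = docs[:10] + 0.1 * rng.normal(size=(10, 32))  # near their targets
offset = rng.normal(size=32)
mixed_queries = pure_queries + 1.5 * offset  # shared offset mimicking a switch
relevant = list(range(10))

r_pure = recall_at_k(pure_queries, docs, relevant, k=1)
r_mixed = recall_at_k(mixed_queries, docs, relevant, k=1)
assert r_mixed <= r_pure  # the offset can only hurt this easy benchmark
```

The gap between `r_pure` and `r_mixed` is the toy analogue of the effectiveness drop a monolingual benchmark would never surface.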

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Search engines serving bilingual populations would gain from query rewriting or hybrid indexes that detect and handle switches explicitly.
  • The embedding divergence finding suggests similar hidden weaknesses may exist in other multilingual tasks such as question answering or summarization.
  • Synthetic code-switched data generated during pre-training could be tested as a direct mitigation strategy.
  • Performance gaps may widen further when code-switching involves low-resource language pairs not well represented in current training corpora.
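The first extension above can be sketched with a crude script-based switch detector. A real system would use a trained language identifier; the heuristic and the tokenized query below are invented for illustration:

```python
def token_script(token):
    # Crude per-token language signal: script of the first alphabetic
    # character. A production system would use a trained language
    # identifier; this is only a stand-in.
    for ch in token:
        if ch.isalpha():
            return "cjk" if "\u4e00" <= ch <= "\u9fff" else "latin"
    return "other"

def switch_points(tokens):
    # Indices where the script changes between consecutive tokens,
    # i.e. candidate code-switch boundaries for rewriting or routing.
    scripts = [token_script(t) for t in tokens]
    return [i for i in range(1, len(tokens))
            if scripts[i] != scripts[i - 1]
            and "other" not in (scripts[i], scripts[i - 1])]

query = "推荐 一个 good restaurant 在 Berlin".split()
print(switch_points(query))  # → [2, 4, 5]
```

A query rewriter or hybrid index could use these boundaries to normalize each span into one language before retrieval.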

Load-bearing premise

That the human-annotated CSR-L queries reflect authentic natural code-switching, and that embedding divergence, rather than annotation artifacts or other confounds, is the primary driver of the observed performance drops.

What would settle it

A retrieval model trained to eliminate embedding divergence on mixed-language text, showing no effectiveness drop on CSR-L or CS-MTEB relative to pure-language queries, would falsify the claim that code-switching is a fundamental bottleneck.
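One way such a model might be trained, sketched here as an assumption rather than anything the paper proposes, is an InfoNCE-style contrastive objective that treats a query's pure and code-switched forms as a positive pair:

```python
import numpy as np

def infonce_alignment_loss(pure, mixed, tau=0.1):
    # InfoNCE-style loss over paired embeddings: each pure-language query
    # should be closest to its own code-switched version among all batch
    # candidates. Minimizing this pulls the two views of the same query
    # together, shrinking the divergence the paper measures.
    p = pure / np.linalg.norm(pure, axis=1, keepdims=True)
    m = mixed / np.linalg.norm(mixed, axis=1, keepdims=True)
    logits = p @ m.T / tau                       # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # matched pairs on the diagonal

rng = np.random.default_rng(2)
pure = rng.normal(size=(6, 16))
aligned_loss = infonce_alignment_loss(pure, pure.copy())
diverged_loss = infonce_alignment_loss(pure, pure + rng.normal(size=(6, 16)))
assert aligned_loss < diverged_loss  # better alignment gives lower loss
```

If a retriever fine-tuned with such an objective drove the loss toward its aligned value and still lost effectiveness on CSR-L, the divergence explanation itself would be in question.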

Figures

Figures reproduced from arXiv: 2604.17632 by Fuheng Zhao, Heli Qi, Hitomi Yanaka, Naoto Yokoya, Puxuan Yu, Qingcheng Zeng, Weihao Xuan, Yuheng Lu, Zeqi Zhou.

Figure 1. Overview of our comprehensive study on Code-Switching IR. Our framework proceeds in three stages: (1) [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2. The visualization of e5 and Qwen 0.6B embeddings on two IR datasets. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Original abstract

Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Switching Retrieval benchmark-Lite), constructing a dataset via human annotation to capture the authentic naturalness of mixed-language queries. Our evaluation across statistical, dense, and late-interaction paradigms reveals that code-switching acts as a fundamental performance bottleneck, degrading the effectiveness of even robust multilingual models. We demonstrate that this failure stems from substantial divergence in the embedding space between pure and code-switched text. Scaling this investigation, we propose CS-MTEB, a comprehensive benchmark covering 11 diverse tasks, where we observe performance declines of up to 27%. Finally, we show that standard multilingual techniques like vocabulary expansion are insufficient to resolve these deficits completely. These findings underscore the fragility of current systems and establish code-switching as a crucial frontier for future IR optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CSR-L, a human-annotated benchmark for code-switching retrieval queries, and evaluates statistical, dense, and late-interaction retrievers to show that code-switching creates a performance bottleneck via embedding-space divergence. It scales the analysis with CS-MTEB (11 tasks) reporting declines up to 27% and finds that vocabulary expansion fails to fully mitigate the deficits, positioning code-switching as a key challenge for multilingual IR.

Significance. If the central findings hold, the work provides valuable new benchmarks (CSR-L and CS-MTEB) and empirical evidence of model fragility in mixed-language settings, which could guide targeted improvements in multilingual embeddings and retrieval. The multi-paradigm evaluation and scale of the new benchmark are clear strengths that enable reproducible follow-up research.

major comments (3)
  1. [Dataset construction] Dataset construction section: The claim that CSR-L captures 'authentic naturalness' of mixed-language queries rests on human annotation, but no inter-annotator agreement statistics, annotation guidelines, or comparison to naturally occurring code-switched queries (e.g., from social media corpora) are provided; without these, annotation artifacts cannot be ruled out as a contributor to the reported performance drops, which is load-bearing for the bottleneck conclusion.
  2. [Embedding analysis] Embedding divergence analysis: The paper links retrieval degradation to 'substantial divergence in the embedding space' between pure and code-switched text, yet presents only correlational evidence (similarity metrics or visualizations) without an ablation that isolates or corrects the divergence (e.g., via fine-tuning on code-switched pairs) to test whether closing the gap restores performance; this leaves open alternative explanations such as tokenization mismatches or training-data scarcity.
  3. [CS-MTEB evaluation] CS-MTEB results: The 'up to 27%' performance decline is reported across 11 tasks, but the manuscript does not specify per-task breakdowns, exact models evaluated, or statistical significance tests (e.g., paired t-tests or confidence intervals); without these details the consistency of the bottleneck claim across paradigms cannot be fully verified.
minor comments (2)
  1. [Introduction and benchmarks] Clarify the exact definition and examples of code-switching types (e.g., intra-sentential vs. inter-sentential) used in both CSR-L and CS-MTEB to aid reproducibility.
  2. [Conclusion] Add a limitations paragraph explicitly discussing potential domain shift between the annotated queries and real user code-switched searches.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments have helped us identify areas where additional clarity and evidence can strengthen the manuscript. We address each major comment below and indicate the revisions made.

Point-by-point responses
  1. Referee: [Dataset construction] Dataset construction section: The claim that CSR-L captures 'authentic naturalness' of mixed-language queries rests on human annotation, but no inter-annotator agreement statistics, annotation guidelines, or comparison to naturally occurring code-switched queries (e.g., from social media corpora) are provided; without these, annotation artifacts cannot be ruled out as a contributor to the reported performance drops, which is load-bearing for the bottleneck conclusion.

    Authors: We agree that these details strengthen the claims. In the revised manuscript we have added the complete annotation guidelines to Appendix A. We also report inter-annotator agreement statistics computed during dataset creation. Furthermore, we include a qualitative and quantitative comparison of switch-point distributions and language ratios between CSR-L and a sample of naturally occurring code-switched text from social media, demonstrating close alignment. These additions confirm that the observed retrieval drops are not attributable to annotation artifacts. revision: yes

  2. Referee: [Embedding analysis] Embedding divergence analysis: The paper links retrieval degradation to 'substantial divergence in the embedding space' between pure and code-switched text, yet presents only correlational evidence (similarity metrics or visualizations) without an ablation that isolates or corrects the divergence (e.g., via fine-tuning on code-switched pairs) to test whether closing the gap restores performance; this leaves open alternative explanations such as tokenization mismatches or training-data scarcity.

    Authors: The referee is correct that the primary evidence is correlational. In the revision we have expanded the analysis section to explicitly discuss alternative explanations, including tokenization mismatches and training-data scarcity, and provide supporting measurements that control for tokenization effects. A full ablation via fine-tuning on code-switched pairs lies beyond the scope of the current work due to computational cost and is noted as future research; however, the additional controls we present reinforce embedding divergence as a central factor in the performance bottleneck. revision: partial

  3. Referee: [CS-MTEB evaluation] CS-MTEB results: The 'up to 27%' performance decline is reported across 11 tasks, but the manuscript does not specify per-task breakdowns, exact models evaluated, or statistical significance tests (e.g., paired t-tests or confidence intervals); without these details the consistency of the bottleneck claim across paradigms cannot be fully verified.

    Authors: We thank the referee for highlighting this omission. The revised manuscript now contains a dedicated table with per-task results for all 11 CS-MTEB tasks, explicitly listing the models evaluated under each retrieval paradigm. We have also added paired t-tests together with 95% confidence intervals, confirming that the performance declines are statistically significant and consistent across statistical, dense, and late-interaction retrievers. revision: yes
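The significance testing described in this response can be sketched with a plain paired t-test over per-task scores; the nDCG values below are invented for illustration and are not the paper's numbers:

```python
import math

def paired_t(xs, ys):
    # Paired t statistic for per-task scores of the same model on
    # pure-language vs code-switched inputs.
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    return mean, se, mean / se

# Illustrative per-task nDCG scores for 11 tasks (not the paper's numbers).
pure  = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.57, 0.70, 0.63, 0.61]
mixed = [0.51, 0.49, 0.60, 0.55, 0.50, 0.58, 0.53, 0.48, 0.59, 0.52, 0.50]

mean_drop, se, t = paired_t(pure, mixed)
# With n-1 = 10 degrees of freedom, the two-sided 95% critical value is
# about 2.228; |t| above it indicates a significant decline.
ci = (mean_drop - 2.228 * se, mean_drop + 2.228 * se)
print(round(mean_drop, 3), round(t, 1), [round(c, 3) for c in ci])
```

A per-task table plus this test is enough to check whether the "up to 27%" headline reflects a consistent effect or a single outlier task.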

Circularity Check

0 steps flagged

Empirical benchmark construction with no circular derivations or self-referential reductions

Full rationale

This is an empirical IR paper that introduces CSR-L through human annotation of mixed-language queries, evaluates statistical/dense/late-interaction retrievers on it, observes performance drops up to 27% on the expanded CS-MTEB benchmark, and notes that vocabulary expansion does not fully resolve issues. No mathematical equations, fitted parameters, or predictive models are presented that reduce by construction to the inputs. Claims about embedding divergence as a bottleneck are observational from the new data rather than derived via self-definition, self-citation chains, or renaming of prior results. The analysis is self-contained against external benchmarks and does not rely on load-bearing self-citations or ansatzes smuggled from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard IR evaluation assumptions and human annotation quality rather than new mathematical derivations or invented entities.

axioms (1)
  • domain assumption Human annotations of query relevance and naturalness are accurate and representative of real code-switched usage.
    Invoked when constructing CSR-L and interpreting performance results.

pith-pipeline@v0.9.0 · 5506 in / 1229 out tokens · 36535 ms · 2026-05-10T05:13:57.938434+00:00 · methodology

discussion (0)

