Recognition: 1 theorem link
· Lean Theorem
SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents
Pith reviewed 2026-05-17 00:47 UTC · model grok-4.3
The pith
A new benchmark shows that LLMs and encoder models perform poorly on token-level semantic difference recognition in cross-lingual documents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce SwissGov-RSD as the first naturalistic, document-level, cross-lingual dataset for semantic difference recognition, consisting of 224 multi-parallel documents across three language pairs with token-level human annotations, and show that both LLMs and encoder models achieve considerably lower performance on this benchmark than on monolingual, sentence-level, and synthetic alternatives.
What carries the argument
SwissGov-RSD, the human-annotated dataset of multi-parallel government documents that supplies token-level labels for semantic differences across languages.
If this is right
- Text generation evaluation metrics will need to incorporate document-level cross-lingual checks to remain reliable.
- Content alignment systems for multilingual corpora must be validated on naturalistic data rather than synthetic or monolingual proxies.
- Model training objectives should target token-level semantic distinctions that appear only when documents are compared across languages.
- Benchmarking practices for LLMs should include cross-lingual document pairs to avoid overestimating readiness for practical use.
Where Pith is reading between the lines
- The same annotation approach could be applied to other language pairs or domains to test whether the performance gap is specific to government text or more general.
- Closing the gap on this benchmark might directly improve the quality of multilingual summarization and translation quality estimation.
- Future work could explore whether the dataset's structure supports new pre-training signals that emphasize cross-lingual semantic invariance at the token level.
Load-bearing premise
Human annotators can reliably and consistently identify all meaningful token-level semantic differences, and the chosen Swiss government documents represent typical cross-lingual variation in real documents.
What would settle it
A model or training procedure that reaches performance levels comparable to its monolingual or sentence-level results when evaluated on the SwissGov-RSD test set would falsify the claimed performance gap.
read the original abstract
Recognizing semantic differences across documents is crucial for text generation evaluation and content alignment, especially in cross-lingual settings. However, as a standalone task, it has received little attention. We address this by introducing SwissGov-RSD, the first naturalistic, document-level, cross-lingual dataset for semantic difference recognition. It encompasses a total of 224 multi-parallel documents in English--German, English--French, and English--Italian with token-level difference annotations by human annotators. We evaluate a variety of open-source and closed-source large language models as well as encoder models across different fine-tuning settings on this new benchmark. Our results show that current automatic approaches perform poorly compared to their performance on monolingual, sentence-level, and synthetic benchmarks, revealing a considerable gap for both LLMs and encoder models. We make our code and dataset publicly available.
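As a concrete illustration of the task format the abstract describes, here is a minimal sketch; the tokens and labels are invented for illustration and are not drawn from SwissGov-RSD:

```python
# Illustrative task format for token-level semantic difference recognition:
# each token in the target document gets a binary label indicating whether
# its meaning is absent from (or conflicts with) the source document.
# Example content is invented, not taken from SwissGov-RSD.
src_en = ["The", "office", "opens", "on", "Monday", "."]
tgt_de = ["Das", "Büro", "öffnet", "am", "Dienstag", "um", "9", "Uhr", "."]

# 1 = semantically different/added relative to the English source:
# "Dienstag" contradicts "Monday", and "um 9 Uhr" has no source counterpart.
tgt_labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]

assert len(tgt_de) == len(tgt_labels)
diff_tokens = [tok for tok, y in zip(tgt_de, tgt_labels) if y == 1]
print(diff_tokens)  # the tokens a model should flag
```

A system is then scored on how well its predicted token labels match the human annotations across whole documents rather than isolated sentence pairs.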
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SwissGov-RSD, the first naturalistic document-level cross-lingual benchmark for token-level semantic difference recognition. It consists of 224 multi-parallel Swiss government documents in English--German, English--French, and English--Italian pairs, with human token-level difference annotations. The authors evaluate open- and closed-source LLMs plus encoder models across fine-tuning settings and report that current automatic approaches perform substantially worse than on monolingual, sentence-level, or synthetic benchmarks, revealing a considerable gap.
Significance. If the annotations are reliable, the work provides a valuable new resource for an underexplored task relevant to text generation evaluation and cross-lingual content alignment. The public release of the dataset and code is a clear strength that supports reproducibility and future model development. The empirical finding of a performance gap on naturalistic data could usefully guide research priorities, provided the gold-standard quality is established.
major comments (2)
- [§3] §3 (Dataset Construction and Annotation): No inter-annotator agreement statistics, annotation guidelines, or disagreement-resolution procedure are reported. This is load-bearing for the central claim, because the reported performance gap for LLMs and encoders is interpreted as evidence that models struggle with naturalistic cross-lingual semantic differences; without IAA or a validation subset, low scores could partly reflect annotation noise rather than model shortcomings.
- [§4] §4 (Experiments and Results): The cross-benchmark comparison (monolingual/synthetic vs. SwissGov-RSD) does not control for document length, domain specificity, or annotation granularity differences. This weakens the attribution of the gap specifically to the cross-lingual naturalistic setting.
minor comments (2)
- [Table 1] Table 1 or equivalent: Clarify the exact distribution of documents across the three language pairs and the total number of annotated tokens to allow readers to assess scale immediately.
- [§2] §2 (Related Work): A brief discussion of how token-level semantic difference annotation differs from standard semantic textual similarity or entailment tasks would help readers appreciate the novelty.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on SwissGov-RSD. The comments highlight important aspects of annotation reliability and comparative analysis that we address below. We provide point-by-point responses to the major comments.
read point-by-point responses
-
Referee: [§3] §3 (Dataset Construction and Annotation): No inter-annotator agreement statistics, annotation guidelines, or disagreement-resolution procedure are reported. This is load-bearing for the central claim, because the reported performance gap for LLMs and encoders is interpreted as evidence that models struggle with naturalistic cross-lingual semantic differences; without IAA or a validation subset, low scores could partly reflect annotation noise rather than model shortcomings.
Authors: We agree that inter-annotator agreement (IAA) statistics, annotation guidelines, and the disagreement-resolution procedure are essential to establish annotation quality and support the interpretation of the performance gap. The current manuscript does not report these details. In the revised version, we will include the full annotation guidelines as supplementary material, report IAA using token-level agreement metrics (e.g., Krippendorff's alpha or pairwise F1), and describe the resolution process (e.g., adjudication by a third annotator). We will also note any validation subset used during annotation. revision: yes
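The token-level agreement metrics mentioned in this response can be sketched concretely; the label sequences below are invented, and pairwise F1 is only one of the plausible choices alongside Krippendorff's alpha:

```python
# Token-level pairwise F1 between two annotators' binary difference labels.
# Labels are hypothetical; 1 marks a token annotated as semantically different.
def pairwise_f1(labels_a, labels_b):
    tp = sum(1 for a, b in zip(labels_a, labels_b) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(labels_a, labels_b) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(labels_a, labels_b) if a == 1 and b == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two annotators labeling the same 8-token document.
ann1 = [0, 1, 1, 0, 0, 1, 0, 0]
ann2 = [0, 1, 0, 0, 0, 1, 1, 0]
print(round(pairwise_f1(ann1, ann2), 3))  # → 0.667
```

Reported over all annotator pairs and documents, such a score would let readers judge whether model scores are bounded by annotation noise or genuinely fall short of human agreement.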
-
Referee: [§4] §4 (Experiments and Results): The cross-benchmark comparison (monolingual/synthetic vs. SwissGov-RSD) does not control for document length, domain specificity, or annotation granularity differences. This weakens the attribution of the gap specifically to the cross-lingual naturalistic setting.
Authors: We acknowledge that the benchmarks differ in document length, domain, and annotation granularity, and that these factors are not explicitly controlled in the current comparisons. These differences are inherent to contrasting controlled synthetic/sentence-level settings with naturalistic document-level cross-lingual data. To strengthen the analysis, we will add a dedicated discussion section in the revision that explicitly addresses these confounders, include length-stratified performance results where feasible, and clarify that the observed gap reflects the combined challenges of the naturalistic cross-lingual document setting rather than isolating a single variable. We maintain that the direct comparison remains informative for highlighting real-world difficulties. revision: partial
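The length-stratified reporting promised here could take the following shape; the bucket boundaries and per-document scores are illustrative assumptions, not the paper's data:

```python
from collections import defaultdict

# Hypothetical per-document results: (document_length_in_tokens, f1_score).
results = [(120, 0.62), (450, 0.48), (900, 0.35), (150, 0.58),
           (700, 0.41), (1300, 0.28), (300, 0.55), (1100, 0.31)]

# Illustrative length buckets; the paper does not specify these boundaries.
buckets = [(0, 256), (256, 768), (768, float("inf"))]

def stratify(results, buckets):
    """Group scores into length buckets and return the mean F1 per bucket."""
    grouped = defaultdict(list)
    for length, score in results:
        for lo, hi in buckets:
            if lo <= length < hi:
                grouped[(lo, hi)].append(score)
                break
    return {bucket: sum(scores) / len(scores) for bucket, scores in grouped.items()}

for (lo, hi), mean_f1 in sorted(stratify(results, buckets).items()):
    print(f"{lo}-{hi} tokens: mean F1 = {mean_f1:.3f}")
```

If performance degrades monotonically with document length in such a breakdown, that would support length (rather than cross-linguality alone) as a contributing confounder.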
Circularity Check
No circularity: empirical benchmark with external dataset and model evaluations
full rationale
The paper introduces SwissGov-RSD as a new human-annotated dataset of 224 multi-parallel documents with token-level semantic difference labels and reports empirical performance of LLMs and encoder models on it. No mathematical derivations, equations, or parameter-fitting steps are present that could reduce to self-definition or fitted inputs by construction. The central claim (poor model performance relative to monolingual/synthetic benchmarks) rests on direct comparison to the external annotations rather than any internal loop or self-citation chain. This is a standard benchmark paper whose results are falsifiable against the released dataset and do not rely on renaming prior results or smuggling ansatzes.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Human annotators can reliably identify token-level semantic differences in parallel documents
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
Recognizing semantic differences (RSD) concerns identifying which parts of two texts differ in meaning... token-level regression problem
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[3]
Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016a. https://doi.org/10.18653/v1/S16-1081 SemEval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), ...
-
[4]
Eneko Agirre, Aitor Gonzalez-Agirre, Iñigo Lopez-Gazpio, Montse Maritxalar, German Rigau, and Larraitz Uria. 2016b. https://doi.org/10.18653/v1/S16-1082 SemEval-2016 task 2: Interpretable semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 512--524, San Diego, Californi...
-
[5]
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2015. https://doi.org/10.3115/v1/P15-1039 Generating high quality proposition Banks for multilingual semantic role labeling. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confe...
-
[6]
Chantal Amrhein, Nikita Moghe, and Liane Guillou. 2022. https://aclanthology.org/2022.wmt-1.44/ ACES: Translation accuracy challenge sets for evaluating machine translation metrics. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 479--513, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics
work page 2022
-
[7]
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves, Andre Martins, Ayoub Hammal, Caio Corro, Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf, Fanny Jourdan, Gabriel Hautreux, João Alves, Kevin El Haddad, Manuel Faysse, Maxime Peyrard, Nuno M Guerreiro, Patrick Fernandes, Ricardo Rei, and Pierre Colombo. 2025. https://openreview.net...
work page 2025
-
[8]
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. https://doi.org/10.18653/v1/D15-1075 A large annotated corpus for learning natural language inference . In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632--642, Lisbon, Portugal. Association for Computational Linguistics
-
[9]
Aljoscha Burchardt. 2013. https://aclanthology.org/2013.tc-1.6 Multidimensional quality metrics: a flexible system for assessing translation quality . In Proceedings of Translating and the Computer 35, London, UK. Aslib
work page 2013
-
[10]
Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. https://doi.org/10.18653/v1/2024.findings-acl.137 M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2318--2335, Bangkok,...
-
[11]
Yang Chen, Chao Jiang, Alan Ritter, and Wei Xu. 2023. https://doi.org/10.18653/v1/2023.findings-acl.357 Frustratingly easy label projection for cross-lingual transfer . In Findings of the Association for Computational Linguistics: ACL 2023, pages 5775--5796, Toronto, Canada. Association for Computational Linguistics
-
[12]
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. https://doi.org/10.18653/v1/2020.acl-main.747 Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Comp...
-
[13]
Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. https://doi.org/10.18653/v1/D18-1269 XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475--2485, Brussels, Belgium. Association...
-
[14]
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. In Thirty-seventh Conference on Neural Information Processing Systems
work page 2023
-
[15]
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2022. https://doi.org/10.18653/v1/2022.acl-long.62 Language-agnostic BERT sentence embedding . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 878--891, Dublin, Ireland. Association for Computational Linguistics
-
[16]
Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, and Wolfgang Macherey. 2021. https://doi.org/10.1162/tacl_a_00437 Experts, errors, and context: A large-scale study of human evaluation for machine translation . Transactions of the Association for Computational Linguistics, 9:1460--1474
-
[17]
Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.552 SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894--6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics
-
[18]
Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, and Alexis Conneau. 2021. https://doi.org/10.18653/v1/2021.repl4nlp-1.4 Larger-scale transformers for multilingual masked language modeling . In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 29--33, Online. Association for Computational Linguistics
-
[19]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others. 2024. https://arxiv.org/abs/2407.21783 The llama 3...
work page 2024
-
[20]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z F Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, and 175 others. 2025. https://doi.org/10.1038/s41586-025-09422-z DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learni...
-
[21]
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. https://openreview.net/forum?id=nZeVKeeFYf9 LoRA: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net
work page 2022
-
[22]
Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, and Mariya Shmatova. 2024. Error span annotation: A balanced approach for human evaluation of machine translation. In Proceedings of the Ninth Conference on Machine Translation, pages 1440--1453, Stroudsburg, PA, USA. Association for Compu...
work page 2024
-
[23]
Duong Minh Le, Yang Chen, Alan Ritter, and Wei Xu. 2024. https://openreview.net/forum?id=DayPQKXaQk Constrained decoding for cross-lingual label projection . In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net
work page 2024
-
[24]
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, and Bill Dolan. 2022. https://doi.org/10.18653/v1/2022.acl-long.464 A token-level reference-free hallucination detection benchmark for free-form text generation . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pag...
-
[25]
Arle Lommel, Serge Gladkoff, Alan Melby, Sue Ellen Wright, Ingemar Strandvik, Katerina Gasova, Angelika Vaasa, Andy Benzo, Romina Marazzato Sparano, Monica Foresi, Johani Innis, Lifeng Han, and Goran Nenadic. 2024. https://aclanthology.org/2024.amta-presentations.6/ The multi-range theory of translation quality measurement: MQM scoring models and statisti...
work page 2024
-
[26]
Marc Marone, Orion Weller, William Fleshman, Eugene Yang, Dawn Lawrie, and Benjamin Van Durme. 2025. mmBERT: A modern multilingual encoder with annealed language learning. arXiv [cs.CL]
work page 2025
-
[27]
Timothee Mickus, Elaine Zosa, Raul Vazquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, and Marianna Apidianaki. 2024. https://doi.org/10.18653/v1/2024.semeval-1.273 SemEval-2024 task 6: SHROOM, a shared-task on hallucinations and related observable overgeneration mistakes. In Proceedings of the 18th International Worksho...
-
[28]
Nikita Moghe, Arnisa Fazla, Chantal Amrhein, Tom Kocmi, Mark Steedman, Alexandra Birch, Rico Sennrich, and Liane Guillou. 2025. https://doi.org/10.1162/coli_a_00537 Machine translation meta evaluation through translation accuracy challenge sets . Computational Linguistics, 51(1):73--137
-
[29]
OpenAI, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaiev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, and 7 others. 2025. https://arxiv.org/abs/2502.06807 Competitive programming with large re...
-
[30]
OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, and 401 others. 2024. https://arxiv.org/abs/2410.21276 GPT-4o system card. Preprint, arXiv:2410.21276
work page 2024
-
[31]
Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, and Nanyun Peng. 2024. https://doi.org/10.18653/v1/2024.naacl-long.321 Contextual label projection for cross-lingual structured prediction . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: L...
-
[32]
Ricardo Rei, Nuno M. Guerreiro, Marcos Treviso, Luisa Coheur, Alon Lavie, and André Martins. 2023. https://doi.org/10.18653/v1/2023.acl-short.94 The inside story: Towards better understanding of machine translation neural evaluation metrics. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Pap...
-
[33]
Gabriele Sarti, Vilém Zouhar, Malvina Nissim, and Arianna Bisazza. 2025. Unsupervised word-level quality estimation for machine translation through the lens of annotators (dis)agreement. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18320--18337
work page 2025
-
[34]
Yves Scherrer, Luka Nerima, Lorenza Russo, Maria Ivanova, and Eric Wehrli. 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/772_Paper.pdf SwissAdmin: A multilingual tagged parallel corpus of press releases. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14), pages 1832--1836, Reykjavik, Icelan...
work page 2014
-
[35]
Lucia Specia, Kim Harris, Frédéric Blain, Aljoscha Burchardt, Vivien Macketanz, Inguna Skadiņa, Matteo Negri, and Marco Turchi. 2017. https://aclanthology.org/2017.mtsummit-papers.5 Translation quality and productivity: A study on rich morphology languages. In Proceedings of Machine Translation Summit XVI: Research Track, pages 55--71, Nagoya, Japan
work page 2017
-
[36]
Jannis Vamvas and Rico Sennrich. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.835 Towards unsupervised recognition of token-level semantic differences in related documents . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13543--13552, Singapore. Association for Computational Linguistics
-
[37]
Raul Vazquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, and Marianna Apidianaki. 2025. https://aclanthology.org/2025.semeva...
work page 2025
-
[38]
Martin Volk, Chantal Amrhein, Noëmi Aepli, Mathias Müller, and Phillip Ströbel. 2016. Building a parallel corpus on the world's oldest banking magazine. In KONVENS. s.n
work page 2016
-
[39]
Yaushian Wang, Ashley Wu, and Graham Neubig. 2022. English contrastive learning can learn universal cross-lingual sentence embeddings. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9122--9133, Stroudsburg, PA, USA. Association for Computational Linguistics
work page 2022
-
[40]
Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Griffin Thomas Adams, Jeremy Howard, and Iacopo Poli. 2025. https://aclanthology.org/2025.acl-long.127/ Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory effici...
work page 2025
-
[42]
Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. https://doi.org/10.18653/v1/N18-1101 A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112...
-
[43]
Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. 2019. https://doi.org/10.18653/v1/D19-1382 PAWS-X: A cross-lingual adversarial dataset for paraphrase identification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)...
-
[44]
Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, and 1 others. 2024. mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, page...
work page 2024
-
[45]
Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv [cs.CL]
work page 2025
-
[46]
Yuan Zhang, Jason Baldridge, and Luheng He. 2019. https://doi.org/10.18653/v1/N19-1131 PAWS: Paraphrase adversaries from word scrambling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1298--1308, Minneapolis, Min...