Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation
Pith reviewed 2026-05-09 21:42 UTC · model grok-4.3
The pith
Combining differential privacy with LLM redaction improves the privacy-utility trade-off for Dutch clinical notes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that differential privacy mechanisms applied alone to Dutch clinical text cause large drops in utility on entity and relation classification tasks. Hybrid strategies that first redact protected information with NER or, especially, LLM-based methods and then apply DP deliver markedly better privacy-utility trade-offs, as measured by both leakage metrics and downstream task performance.
What carries the argument
Hybrid pipelines that apply linguistic preprocessing (NER or LLM redaction) before differential privacy mechanisms.
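As a rough illustration of the redact-then-perturb ordering, here is a minimal Python sketch. It is not the authors' implementation: the regex patterns stand in for an NER or LLM redactor, the one-dimensional `TOY_EMBEDDINGS` values are invented, and the word replacement step is only written in the style of an exponential mechanism. All names (`redact`, `dp_replace`, `hybrid_deidentify`) are hypothetical.

```python
import math
import random
import re

# Hypothetical regex patterns standing in for an NER or LLM redactor.
PHI_PATTERNS = [
    (re.compile(r"\b\d{2}-\d{2}-\d{4}\b"), "[DATE]"),
    (re.compile(r"\b(?:dhr|mevr)\.? [A-Z][a-z]+\b"), "[NAME]"),
]

# Toy 1-D word "embeddings" (assumed values); real mechanisms use learned vectors.
TOY_EMBEDDINGS = {"koorts": 0.0, "hoest": 0.2, "pijn": 0.5, "griep": 0.1, "moe": 0.9}


def redact(text):
    """Step 1: replace protected spans with placeholders."""
    for pattern, placeholder in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def dp_replace(word, epsilon, rng):
    """Step 2: sample a replacement with probability proportional to
    exp(-epsilon * distance / 2), in the style of the exponential mechanism."""
    if word not in TOY_EMBEDDINGS:
        # Simplification: out-of-vocabulary tokens (including placeholders)
        # pass through unchanged, which a real DP mechanism would not allow.
        return word
    candidates = list(TOY_EMBEDDINGS)
    weights = [
        math.exp(-epsilon * abs(TOY_EMBEDDINGS[word] - TOY_EMBEDDINGS[c]) / 2)
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def hybrid_deidentify(text, epsilon, seed=0):
    """Redact first, then perturb the remaining tokens."""
    rng = random.Random(seed)
    return " ".join(dp_replace(tok, epsilon, rng) for tok in redact(text).split())


print(hybrid_deidentify("mevr. Jansen had koorts op 03-04-2021", epsilon=5.0))
```

The point of the ordering is that the noise no longer has to hide the direct identifiers, because they are already gone by the time the perturbation step runs. In this toy version a smaller `epsilon` makes replacements by distant words more likely, which is where the utility loss of pure DP would come from.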
Load-bearing premise
That performance on entity and relation classification tasks accurately reflects the real-world usefulness of the de-identified notes for secondary healthcare research.
What would settle it
A follow-up evaluation that applies the same de-identified notes to an actual secondary research task such as outcome prediction and finds no utility advantage for the hybrid methods over pure DP.
Original abstract
Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating the need for automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employed named entity recognition (NER) to identify protected entities for redaction. Although methods based on differential privacy (DP) provide formal privacy guarantees, more recently also large language models (LLMs) are increasingly used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to DP, and assess performance in terms of privacy leakage and extrinsic evaluation (entity and relation classification). We show that DP mechanisms alone degrade utility substantially, but combining them with linguistic preprocessing, especially LLM-based redaction, significantly improves the privacy-utility trade-off.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper presents the first comparative evaluation of differentially private (DP) mechanisms, named entity recognition (NER), and large language model (LLM)-based methods for de-identifying Dutch clinical notes. It examines standalone approaches as well as hybrid pipelines that apply NER or LLM preprocessing before DP, measuring performance via privacy leakage metrics and extrinsic utility on entity and relation classification tasks. The central finding is that DP alone substantially degrades utility, whereas hybrid strategies—particularly LLM-based redaction followed by DP—yield a meaningfully better privacy-utility trade-off.
Significance. If the empirical results hold under scrutiny, the work supplies timely, language-specific evidence on practical de-identification strategies for Dutch clinical text, an under-studied setting relative to English. The hybrid LLM-DP approach is shown to mitigate the utility penalty of pure DP while retaining formal privacy guarantees, which could directly inform GDPR-compliant secondary-use pipelines in healthcare. The inclusion of extrinsic downstream tasks adds relevance beyond intrinsic privacy metrics, though the paper's own evaluation design limits the strength of claims about broader clinical utility.
major comments (1)
- Evaluation section (extrinsic tasks): The central claim that LLM preprocessing improves the privacy-utility trade-off rests on entity and relation classification performance. These tasks are semantically close to the NER/LLM redaction step itself, so measured gains may reflect task alignment rather than preserved semantic content for secondary clinical uses (e.g., cohort studies or outcome modeling). No results are reported on more distant tasks such as diagnosis prediction or temporal event extraction, leaving the generalizability of the improvement untested and weakening support for the headline conclusion.
minor comments (2)
- Abstract: The abstract states the evaluation approach and main finding but provides no quantitative results (e.g., specific privacy leakage rates, F1 scores, or DP parameters such as ε), making it difficult for readers to gauge the magnitude of the reported improvements without reading the full results section.
- Dataset and experimental details: The manuscript would benefit from an explicit table or subsection listing the Dutch clinical corpus size, number of notes, protected entity types, and the exact DP mechanisms and privacy budgets (ε, δ) used in each condition.
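To make the quantities requested in the minor comments concrete, the following Python sketch computes a leakage rate and an F1 score. The paper's exact leakage metric is not given in this review, so the definition used here (the fraction of annotated protected strings that still appear verbatim in the output) is an assumption, and the function names are hypothetical.

```python
def leakage_rate(protected_spans, deidentified_text):
    """Fraction of protected strings still present verbatim (assumed definition)."""
    if not protected_spans:
        return 0.0
    leaked = sum(1 for span in protected_spans if span in deidentified_text)
    return leaked / len(protected_spans)


def f1_score(gold, predicted):
    """Micro F1 over sets of (item, label) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# One of two protected strings survives de-identification.
print(leakage_rate(["Jansen", "03-04-2021"], "[NAME] had koorts op 03-04-2021"))
# One of two gold labels is recovered, with one spurious prediction.
print(f1_score({("koorts", "SYMPTOM"), ("griep", "DISEASE")},
               {("koorts", "SYMPTOM"), ("hoest", "DISEASE")}))
```

Reporting both numbers per condition, alongside the privacy budget used, would let a reader place each method on a single privacy-utility plot.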
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the single major comment below.
Referee: Evaluation section (extrinsic tasks): The central claim that LLM preprocessing improves the privacy-utility trade-off rests on entity and relation classification performance. These tasks are semantically close to the NER/LLM redaction step itself, so measured gains may reflect task alignment rather than preserved semantic content for secondary clinical uses (e.g., cohort studies or outcome modeling). No results are reported on more distant tasks such as diagnosis prediction or temporal event extraction, leaving the generalizability of the improvement untested and weakening support for the headline conclusion.
Authors: We thank the referee for highlighting this important consideration. Entity and relation classification were chosen as extrinsic tasks because they are standard benchmarks in clinical NLP literature for assessing de-identification utility and because Dutch-annotated datasets are available for them, enabling direct comparison across methods. The relation classification task requires contextual inference and semantic linking beyond entity detection alone, offering evidence that hybrid LLM-DP approaches preserve more than surface-level information. We agree, however, that more distant tasks such as diagnosis prediction or temporal event extraction would better demonstrate generalizability to broader secondary uses. Such evaluations would require additional annotated data and resources beyond the current study scope. In the revised manuscript we will add an explicit limitations paragraph in the Discussion section that acknowledges the scope of the chosen tasks, qualifies the headline claims accordingly, and identifies these more distant tasks as valuable directions for future work.
Revision: partial
Circularity Check
No circularity: purely empirical comparative evaluation with independent benchmarks
The paper conducts a comparative study of DP, NER, and LLM-based de-identification methods on Dutch clinical notes, measuring privacy leakage and utility via standard extrinsic tasks (entity and relation classification). No mathematical derivations, equations, fitted parameters, or predictions are present. Central claims rest on direct experimental results rather than any self-referential reduction, self-citation chains, or ansatz smuggling. Evaluations use established metrics and tasks that do not reduce to the preprocessing steps by construction. This is a standard empirical setup with no load-bearing circular elements.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Differential privacy mechanisms provide formal privacy guarantees when correctly implemented.
- Domain assumption: NER models and LLMs can reliably identify protected health information in clinical text.
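For reference, the "formal privacy guarantee" in the first assumption is usually stated as follows. This is the standard textbook definition of $(\varepsilon, \delta)$-differential privacy; which variant and which neighboring relation the paper uses is not stated in this review, so the symbols $M$, $D$, $D'$, and $S$ are generic.

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private if,
% for all neighboring inputs D, D' and all measurable output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
\]
% Pure \varepsilon-DP is the special case \delta = 0.
```

A smaller $\varepsilon$ forces the output distributions on neighboring inputs closer together, which is why stronger guarantees tend to cost more utility.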