Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

Anthony Nguyen; Jinghui Liu

arxiv: 2605.17755 · v1 · pith:E777OQ26new · submitted 2026-05-18 · 💻 cs.CL · cs.AI

Bridging the Version Gap: Multi-version Training Improves ICD Code Prediction, Especially for Rare Codes

Jinghui Liu , Anthony Nguyen This is my paper

Pith reviewed 2026-05-20 11:51 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords clinical codingICD-9ICD-10multi-version trainingrare codeslabel-wise attentionmedical NLPlong-tail problem

0 comments

The pith

Combining ICD-9 and ICD-10 training data raises micro F1 on rare ICD-10 codes by 27 percent without any code mapping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Clinical coding turns free-text notes into standardized ICD codes, but new versions arrive regularly and rare codes remain difficult to predict. The paper tests whether a single model can be trained on annotations from both ICD-9 and ICD-10 at once. Despite differences in code definitions and granularity, the combined data improves ICD-10 prediction. The gain is largest for the long tail of rare codes, and the same approach also lifts performance on frequent codes while using fewer parameters.

Core claim

A modified label-wise attention model trained on mixed ICD-9 and ICD-10 data outperforms an ICD-10-only model on ICD-10 prediction. For roughly 18,000 rare ICD-10 codes the micro F1 score rises 27 percent; for 8,000 frequent codes macro metrics also improve. These gains occur without explicit alignment between the two code sets and with a smaller total parameter count.

What carries the argument

Label-wise attention model trained on pooled ICD-9 and ICD-10 annotations that learns shared representations across versions without mapping steps.

If this is right

Historical ICD-9 data can be reused to improve current ICD-10 models without version-specific retraining.
The long-tail problem in clinical coding becomes less severe when older annotations supplement newer ones.
Fewer parameters suffice for strong coverage of both rare and common codes when multi-version data is used.
Models may generalize across future ICD releases if the same joint-training pattern continues to work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Semantic overlap between versions appears large enough that explicit cross-version mapping may often be unnecessary.
The same joint-training idea could be tested on other evolving medical terminologies such as SNOMED CT or procedure codes.
Regions or hospitals that still hold large ICD-9 archives could immediately improve their ICD-10 systems without new labeling campaigns.

Load-bearing premise

The attention model can extract useful shared signals from ICD-9 and ICD-10 labels even though the two versions differ in definition, granularity, and annotation habits.

What would settle it

If a model trained only on ICD-10 data matches or exceeds the combined model's micro F1 on the 18,000 rare ICD-10 codes, the claimed benefit of multi-version training would not hold.

Figures

Figures reproduced from arXiv: 2605.17755 by Anthony Nguyen, Jinghui Liu.

**Figure 1.** Figure 1: (a) ICD coding faces two intertwined challenges: the ICD system evolves continuously and the code distribution is heavily long-tailed. (b) We mix three MIMIC-derived datasets spanning ICD-9 and ICD10 to train a single version-agnostic model, DUALLAAT. (c) Adding ICD-9 to ICD-10 training yields a 27% relative gain in micro F1 on rare ICD-10 codes than training on ICD-10 alone. Current best-performing ICD… view at source ↗

read the original abstract

Clinical coding maps clinical documentation to standardized medical codes, an essential yet time-consuming administrative task that could benefit from automation. Current models on ICD coding are typically optimized for codes from a specific ICD version. However, in reality, ICD systems evolve continuously, and different versions are adopted across time periods and regions. Moreover, ICD coding suffers from the long-tail problem, and rare code performance can be a bottleneck for developing implementable models. We examine whether it is viable to train version-independent models by combining data annotated in different ICD versions, which may help address these challenges. We add ICD-9 data to the training of a modified label-wise attention model for ICD-10 prediction, and find that despite the version mismatch, adding ICD-9 yields a 27% increase in micro F1 for 18K rare ICD codes compared to training on ICD-10 alone. On 8K frequent ICD-10 codes, the multi-version training also substantially improves macro metrics, with far fewer model parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Adding ICD-9 data to an ICD-10 label-wise attention model gives a 27% micro-F1 lift on rare codes, but the gain may simply reflect extra training volume rather than true cross-version transfer.

read the letter

The main thing to know is that the authors report a 27% micro-F1 improvement on 18k rare ICD-10 codes when they add ICD-9 annotations to training, plus macro gains on frequent codes, all with a smaller model. They do this without any explicit code mapping or alignment step. That is the concrete empirical observation the paper rests on. It is a direct test of whether version mismatch can be handled by just pooling the data, which matters for real deployments where hospitals switch versions over time or across regions. The long-tail focus is also sensible; rare-code performance is the practical bottleneck in clinical coding, so showing movement there is more useful than another incremental win on head classes. The setup itself is simple: they take an existing label-wise attention architecture and retrain it on the combined corpus. No new framework or derivation is introduced, but the result is a practical data point for anyone already using similar models. The soft spot is the missing control for data volume. Adding ICD-9 necessarily increases the total number of training examples, yet the abstract and described experiments do not include a size-matched run that adds an equivalent number of extra ICD-10-only samples. Without that, it is hard to separate the benefit of cross-version representations from the benefit of more supervision overall. Details on statistical tests, the exact frequency cutoff for rare codes, and full baseline tables are also absent from the summary, which makes the 27% figure difficult to assess in isolation. The assumption that the attention layers will automatically discover useful shared structure across versions is plausible but untested in the provided information. This work is for applied clinical NLP teams that already run ICD coding pipelines and need to handle version drift without building separate models. Readers who care about deployment constraints rather than new theory will get the most out of the numbers. It is not a foundational paper, but the experiment is straightforward enough that it deserves a serious referee who can request the volume-matched ablation and the missing statistical details. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The paper examines training a modified label-wise attention model for ICD-10 code prediction by augmenting ICD-10 training data with ICD-9 annotations. It claims that, despite version differences in code definitions and granularity, this multi-version approach yields a 27% micro-F1 improvement on 18K rare ICD-10 codes relative to ICD-10-only training, plus macro-metric gains on 8K frequent codes, all while using substantially fewer parameters.

Significance. If the reported gains are attributable to cross-version representation sharing in the label-wise attention layers rather than simply to increased training volume, the result would be practically relevant for clinical coding systems that must accommodate ICD version transitions and long-tail code distributions. The work directly targets a known bottleneck in deployable medical NLP models.

major comments (3)

[Abstract] Abstract: the headline 27% micro-F1 lift on 18K rare codes is presented without any baseline model specification, statistical significance test, or explicit frequency threshold used to define the rare-code set; these omissions leave the central empirical claim only weakly supported.
[Experiments] Experiments (assumed §4): no size-matched ablation is reported that adds an equivalent number of additional ICD-10-only examples to the training set; without this control, it is impossible to separate the benefit of multi-version training from the simple effect of larger training corpus size.
[Methods] Methods (assumed §3): the description of the label-wise attention architecture does not specify whether ICD-9 and ICD-10 code embeddings or attention parameters are shared or kept separate, which is load-bearing for the claim that the model learns transferable cross-version representations without explicit mapping.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a concise statement of the exact dataset sizes (number of notes or tokens) for the ICD-9 and ICD-10 portions.
[Results] Notation for micro-F1 versus macro-F1 should be introduced once and used consistently when reporting results on rare versus frequent codes.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, providing clarifications from the manuscript and committing to revisions where they will strengthen the presentation of our results.

read point-by-point responses

Referee: [Abstract] Abstract: the headline 27% micro-F1 lift on 18K rare codes is presented without any baseline model specification, statistical significance test, or explicit frequency threshold used to define the rare-code set; these omissions leave the central empirical claim only weakly supported.

Authors: We agree that the abstract would be strengthened by including these details for self-containment. The baseline is the label-wise attention model trained on ICD-10 data alone, as described in Section 4. Statistical significance of the improvements is assessed in the results (Section 4.3). The rare-code set is defined via a frequency threshold in the experimental setup (Section 4.2), yielding the reported 18K codes. We will revise the abstract to explicitly reference the baseline, note the significance testing, and state the frequency threshold used. revision: yes
Referee: [Experiments] Experiments (assumed §4): no size-matched ablation is reported that adds an equivalent number of additional ICD-10-only examples to the training set; without this control, it is impossible to separate the benefit of multi-version training from the simple effect of larger training corpus size.

Authors: This is a fair and important point for isolating the contribution of cross-version data. While Table 1 reports the differing training set sizes and the gains are most pronounced on rare codes (where additional volume alone would be less impactful), a direct size-matched control is absent. We will add this ablation in the revised experiments section by augmenting the ICD-10-only training set with an equivalent volume of additional ICD-10 examples to match the multi-version corpus size. revision: yes
Referee: [Methods] Methods (assumed §3): the description of the label-wise attention architecture does not specify whether ICD-9 and ICD-10 code embeddings or attention parameters are shared or kept separate, which is load-bearing for the claim that the model learns transferable cross-version representations without explicit mapping.

Authors: We appreciate the referee highlighting this for clarity. In the modified label-wise attention model (Section 3), the attention parameters are shared across versions to enable transferable representations, while code embeddings remain version-specific to accommodate differences in definitions and granularity. We will add an explicit statement in the methods section detailing this sharing strategy to make the architecture unambiguous. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison on held-out data

full rationale

The paper reports experimental results from training a label-wise attention model on combined ICD-9/ICD-10 data versus ICD-10 alone, then measuring micro-F1 and macro metrics on a held-out ICD-10 test set. No equations, derivations, or self-referential definitions appear in the abstract or described setup. Performance numbers are direct outputs of standard train/test splits and are externally falsifiable; they do not reduce to any fitted quantity defined by the same data or to a self-citation chain. The central claim therefore remains an independent empirical observation rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities beyond the generic assumption that the attention model can ingest mixed-version labels.

pith-pipeline@v0.9.0 · 5701 in / 1040 out tokens · 41997 ms · 2026-05-20T11:51:25.870984+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1]

Code like humans: A multi-agent solution for medical coding

Motzfeldt, Andreas Geert and Edin, Joakim and Christensen, Casper L and Hardmeier, Christian and Maaløe, Lars and Rogers, Anna. Code like humans: A multi-agent solution for medical coding. Findings of the Association for Computational Linguistics: EMNLP 2025

work page 2025
[2]

ICD -11: an international classification of diseases for the twenty-first century

Harrison, James E and Weber, Stefanie and Jakob, Robert and Chute, Christopher G. ICD -11: an international classification of diseases for the twenty-first century. BMC Medical Informatics and Decision Making

work page
[3]

MedDCR : Learning to design agentic workflows for medical coding

Zheng, Jiyang and Nassar, Islam and Vu, Thanh and Zhong, Xu and Lin, Yang and Liu, Tongliang and Duong, Long and Li, Yuan-Fang. MedDCR : Learning to design agentic workflows for medical coding. arXiv [cs.AI]. arXiv:2511.13361

work page arXiv
[4]

Improving rare and common ICD coding via a multi-agent LLM -based approach

Li, Rumeng and Wang, Xun and Yu, Hong. Improving rare and common ICD coding via a multi-agent LLM -based approach. Proceedings of the 34th ACM International Conference on Information and Knowledge Management

work page
[5]

An unsupervised approach to achieve supervised-level explainability in healthcare records

Edin, Joakim and Maistro, Maria and Maaløe, Lars and Borgholt, Lasse and Havtorn, Jakob Drachmann and Ruotsalo, Tuukka. An unsupervised approach to achieve supervised-level explainability in healthcare records. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

work page 2024
[6]

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

He, Kai and Mao, Rui and Lin, Qika and Ruan, Yucheng and Lan, Xiang and Feng, Mengling and Cambria, Erik. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. An International Journal on Information Fusion

work page
[7]

Surpassing GPT- 4 medical coding with a two-stage approach

Yang, Zhichao and Batra, Sanjit Singh and Stremmel, Joel and Halperin, Eran. Surpassing GPT -4 medical coding with a two-stage approach. arXiv [cs.CL]. arXiv:2311.13735

work page arXiv
[8]

On the cross-lingual transferability of monolingual representations

Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani. On the cross-lingual transferability of monolingual representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

work page
[9]

Feasibility of replacing the ICD -10- CM with the ICD -11 for morbidity coding: A content analysis

Fung, Kin Wah and Xu, Julia and McConnell-Lamptey, Shannon and Pickett, Donna and Bodenreider, Olivier. Feasibility of replacing the ICD -10- CM with the ICD -11 for morbidity coding: A content analysis. Journal of the American Medical Informatics Association

work page
[10]

CoRelation : Boosting Automatic ICD Coding through Contextualized Code Relation Learning

Luo, Junyu and Wang, Xiaochen and Wang, Jiaqi and Chang, Aofei and Wang, Yaqing and Ma, Fenglong. CoRelation : Boosting Automatic ICD Coding through Contextualized Code Relation Learning. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

work page 2024
[11]

A Label Attention Model for ICD Coding from Clinical Text

Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony. A Label Attention Model for ICD Coding from Clinical Text. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20

work page
[12]

Explainable Prediction of Medical Codes from Clinical Text

Mullenbach, James and Wiegreffe, Sarah and Duke, Jon and Sun, Jimeng and Eisenstein, Jacob. Explainable Prediction of Medical Codes from Clinical Text. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

work page 2018
[13]

The Clinician and Dataset Shift in Artificial Intelligence

Finlayson, Samuel G and Subbaswamy, Adarsh and Singh, Karandeep and Bowers, John and Kupke, Annabel and Zittrain, Jonathan and Kohane, Isaac S and Saria, Suchi. The Clinician and Dataset Shift in Artificial Intelligence. The New England journal of medicine

work page
[14]

Automated Medical Coding on MIMIC - III and MIMIC - IV : A Critical Review and Replicability Study

Edin, Joakim and Junge, Alexander and Havtorn, Jakob D and Borgholt, Lasse and Maistro, Maria and Ruotsalo, Tuukka and Maaløe, Lars. Automated Medical Coding on MIMIC - III and MIMIC - IV : A Critical Review and Replicability Study. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

work page
[15]

Extracting international classification of diseases codes from clinical documentation using large language models

Simmons, Ashley and Takkavatakarn, Kullaya and McDougal, Megan and Dilcher, Brian and Pincavitch, Jami and Meadows, Lukas and Kauffman, Justin and Klang, Eyal and Wig, Rebecca and Smith, Gordon and Soroush, Ali and Freeman, Robert and Apakama, Donald J and Charney, Alexander W and Kohli-Seth, Roopa and Nadkarni, Girish N and Sakhuja, Ankit. Extracting int...

work page
[16]

Deep learning for automatic ICD coding: Review, opportunities and challenges

Li, Xiaobo and Zhang, Yijia and Hou, Xiaodi and Wang, Shilong and Lin, Hongfei. Deep learning for automatic ICD coding: Review, opportunities and challenges. Artificial intelligence in medicine

work page
[17]

Automated clinical coding: what, why, and where we are?

Dong, Hang and Falis, Matúš and Whiteley, William and Alex, Beatrice and Matterson, Joshua and Ji, Shaoxiong and Chen, Jiaoyan and Wu, Honghan. Automated clinical coding: what, why, and where we are?. NPJ digital medicine

work page
[18]

A survey on clinical natural language processing in the United Kingdom from 2007 to 2022

Wu, Honghan and Wang, Minhong and Wu, Jinge and Francis, Farah and Chang, Yun-Hsuan and Shavick, Alex and Dong, Hang and Poon, Michael T C and Fitzpatrick, Natalie and Levine, Adam P and Slater, Luke T and Handy, Alex and Karwath, Andreas and Gkoutos, Georgios V and Chelala, Claude and Shah, Anoop Dinesh and Stewart, Robert and Collier, Nigel and Alex, Be...

work page 2007
[19]

Beyond label attention: Transparency in language models for automated medical coding via dictionary learning

Wu, John and Wu, David and Sun, Jimeng. Beyond label attention: Transparency in language models for automated medical coding via dictionary learning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

work page 2024
[20]

Combining classifiers in text categorization

Larkey, Leah S and Croft, W Bruce. Combining classifiers in text categorization. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

work page
[21]

Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding

Yuan, Zheng and Tan, Chuanqi and Huang, Songfang. Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

work page
[22]

A systematic literature review of automated clinical coding and classification systems

Stanfill, Mary H and Williams, Margaret and Fenton, Susan H and Jenders, Robert A and Hersh, William R. A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association: JAMIA

work page
[23]

Towards Automated ICD Coding Using Deep Learning

Shi, Haoran and Xie, Pengtao and Hu, Zhiting and Zhang, Ming and Xing, Eric P. Towards Automated ICD Coding Using Deep Learning. arXiv [cs.CL]. arXiv:1711.04075

work page arXiv
[24]

MIMIC- IV , a freely accessible electronic health record dataset

Johnson, Alistair E W and Bulgarelli, Lucas and Shen, Lu and Gayles, Alvin and Shammout, Ayad and Horng, Steven and Pollard, Tom J and Moody, Benjamin and Gow, Brian and Lehman, Li-Wei H and Celi, Leo A and Mark, Roger G. MIMIC- IV , a freely accessible electronic health record dataset. Scientific data

work page
[25]

PLM - ICD : Automatic ICD Coding with Pretrained Language Models

Huang, Chao-Wei and Tsai, Shang-Chi and Chen, Yun-Nung. PLM - ICD : Automatic ICD Coding with Pretrained Language Models. Proceedings of the 4th Clinical Natural Language Processing Workshop

work page
[26]

and Zimlichman Eyal and Barash Yiftach and Freeman Robert and Charney Alexander W

Soroush Ali and Glicksberg Benjamin S. and Zimlichman Eyal and Barash Yiftach and Freeman Robert and Charney Alexander W. and Nadkarni Girish N and Klang Eyal. Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying. NEJM AI

work page
[27]

Attention Is All You Need

Vaswani, Ashish and Shazeer, Noam M and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Lukasz and Polosukhin, Illia. Attention is All you Need. Neural Information Processing Systems. 1706.03762

work page internal anchor Pith review Pith/arXiv arXiv
[28]

A unified review of deep learning for automated medical coding

Ji, Shaoxiong and Li, Xiaobo and Sun, Wei and Dong, Hang and Taalas, Ara and Zhang, Yijia and Wu, Honghan and Pitkänen, Esa and Marttinen, Pekka. A unified review of deep learning for automated medical coding. ACM computing surveys

work page
[29]

Aligning AI research with the needs of clinical coding workflows: Eight recommendations based on US data analysis and critical review

Gan, Yidong and Rybinski, Maciej and Hachey, Ben and Kummerfeld, Jonathan K. Aligning AI research with the needs of clinical coding workflows: Eight recommendations based on US data analysis and critical review. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page
[30]

Less is more: Explainable and efficient ICD code prediction with clinical entities

Douglas, James C and Gan, Yidong and Hachey, Ben and Kummerfeld, Jonathan K. Less is more: Explainable and efficient ICD code prediction with clinical entities. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page
[31]

Automated ICD coding using extreme multi-label long text transformer-based models

Liu, Leibo and Perez-Concha, Oscar and Nguyen, Anthony and Bennett, Vicki and Jorm, Louisa. Automated ICD coding using extreme multi-label long text transformer-based models. Artificial intelligence in medicine

work page
[32]

MDACE : MIMIC documents annotated with code evidence

Cheng, Hua and Jafari, Rana and Russell, April and Klopfer, Russell and Lu, Edmond and Striner, Benjamin and Gormley, Matthew. MDACE : MIMIC documents annotated with code evidence. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page

[1] [1]

Code like humans: A multi-agent solution for medical coding

Motzfeldt, Andreas Geert and Edin, Joakim and Christensen, Casper L and Hardmeier, Christian and Maaløe, Lars and Rogers, Anna. Code like humans: A multi-agent solution for medical coding. Findings of the Association for Computational Linguistics: EMNLP 2025

work page 2025

[2] [2]

ICD -11: an international classification of diseases for the twenty-first century

Harrison, James E and Weber, Stefanie and Jakob, Robert and Chute, Christopher G. ICD -11: an international classification of diseases for the twenty-first century. BMC Medical Informatics and Decision Making

work page

[3] [3]

MedDCR : Learning to design agentic workflows for medical coding

Zheng, Jiyang and Nassar, Islam and Vu, Thanh and Zhong, Xu and Lin, Yang and Liu, Tongliang and Duong, Long and Li, Yuan-Fang. MedDCR : Learning to design agentic workflows for medical coding. arXiv [cs.AI]. arXiv:2511.13361

work page arXiv

[4] [4]

Improving rare and common ICD coding via a multi-agent LLM -based approach

Li, Rumeng and Wang, Xun and Yu, Hong. Improving rare and common ICD coding via a multi-agent LLM -based approach. Proceedings of the 34th ACM International Conference on Information and Knowledge Management

work page

[5] [5]

An unsupervised approach to achieve supervised-level explainability in healthcare records

Edin, Joakim and Maistro, Maria and Maaløe, Lars and Borgholt, Lasse and Havtorn, Jakob Drachmann and Ruotsalo, Tuukka. An unsupervised approach to achieve supervised-level explainability in healthcare records. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

work page 2024

[6] [6]

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

He, Kai and Mao, Rui and Lin, Qika and Ruan, Yucheng and Lan, Xiang and Feng, Mengling and Cambria, Erik. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. An International Journal on Information Fusion

work page

[7] [7]

Surpassing GPT- 4 medical coding with a two-stage approach

Yang, Zhichao and Batra, Sanjit Singh and Stremmel, Joel and Halperin, Eran. Surpassing GPT -4 medical coding with a two-stage approach. arXiv [cs.CL]. arXiv:2311.13735

work page arXiv

[8] [8]

On the cross-lingual transferability of monolingual representations

Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani. On the cross-lingual transferability of monolingual representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

work page

[9] [9]

Feasibility of replacing the ICD -10- CM with the ICD -11 for morbidity coding: A content analysis

Fung, Kin Wah and Xu, Julia and McConnell-Lamptey, Shannon and Pickett, Donna and Bodenreider, Olivier. Feasibility of replacing the ICD -10- CM with the ICD -11 for morbidity coding: A content analysis. Journal of the American Medical Informatics Association

work page

[10] [10]

CoRelation : Boosting Automatic ICD Coding through Contextualized Code Relation Learning

Luo, Junyu and Wang, Xiaochen and Wang, Jiaqi and Chang, Aofei and Wang, Yaqing and Ma, Fenglong. CoRelation : Boosting Automatic ICD Coding through Contextualized Code Relation Learning. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

work page 2024

[11] [11]

A Label Attention Model for ICD Coding from Clinical Text

Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony. A Label Attention Model for ICD Coding from Clinical Text. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20

work page

[12] [12]

Explainable Prediction of Medical Codes from Clinical Text

Mullenbach, James and Wiegreffe, Sarah and Duke, Jon and Sun, Jimeng and Eisenstein, Jacob. Explainable Prediction of Medical Codes from Clinical Text. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

work page 2018

[13] [13]

The Clinician and Dataset Shift in Artificial Intelligence

Finlayson, Samuel G and Subbaswamy, Adarsh and Singh, Karandeep and Bowers, John and Kupke, Annabel and Zittrain, Jonathan and Kohane, Isaac S and Saria, Suchi. The Clinician and Dataset Shift in Artificial Intelligence. The New England journal of medicine

work page

[14] [14]

Automated Medical Coding on MIMIC - III and MIMIC - IV : A Critical Review and Replicability Study

Edin, Joakim and Junge, Alexander and Havtorn, Jakob D and Borgholt, Lasse and Maistro, Maria and Ruotsalo, Tuukka and Maaløe, Lars. Automated Medical Coding on MIMIC - III and MIMIC - IV : A Critical Review and Replicability Study. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

work page

[15] [15]

Extracting international classification of diseases codes from clinical documentation using large language models

Simmons, Ashley and Takkavatakarn, Kullaya and McDougal, Megan and Dilcher, Brian and Pincavitch, Jami and Meadows, Lukas and Kauffman, Justin and Klang, Eyal and Wig, Rebecca and Smith, Gordon and Soroush, Ali and Freeman, Robert and Apakama, Donald J and Charney, Alexander W and Kohli-Seth, Roopa and Nadkarni, Girish N and Sakhuja, Ankit. Extracting int...

work page

[16] [16]

Deep learning for automatic ICD coding: Review, opportunities and challenges

Li, Xiaobo and Zhang, Yijia and Hou, Xiaodi and Wang, Shilong and Lin, Hongfei. Deep learning for automatic ICD coding: Review, opportunities and challenges. Artificial intelligence in medicine

work page

[17] [17]

Automated clinical coding: what, why, and where we are?

Dong, Hang and Falis, Matúš and Whiteley, William and Alex, Beatrice and Matterson, Joshua and Ji, Shaoxiong and Chen, Jiaoyan and Wu, Honghan. Automated clinical coding: what, why, and where we are?. NPJ digital medicine

work page

[18] [18]

A survey on clinical natural language processing in the United Kingdom from 2007 to 2022

Wu, Honghan and Wang, Minhong and Wu, Jinge and Francis, Farah and Chang, Yun-Hsuan and Shavick, Alex and Dong, Hang and Poon, Michael T C and Fitzpatrick, Natalie and Levine, Adam P and Slater, Luke T and Handy, Alex and Karwath, Andreas and Gkoutos, Georgios V and Chelala, Claude and Shah, Anoop Dinesh and Stewart, Robert and Collier, Nigel and Alex, Be...

work page 2007

[19] [19]

Beyond label attention: Transparency in language models for automated medical coding via dictionary learning

Wu, John and Wu, David and Sun, Jimeng. Beyond label attention: Transparency in language models for automated medical coding via dictionary learning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

work page 2024

[20] [20]

Combining classifiers in text categorization

Larkey, Leah S and Croft, W Bruce. Combining classifiers in text categorization. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

work page

[21] [21]

Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding

Yuan, Zheng and Tan, Chuanqi and Huang, Songfang. Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

work page

[22] [22]

A systematic literature review of automated clinical coding and classification systems

Stanfill, Mary H and Williams, Margaret and Fenton, Susan H and Jenders, Robert A and Hersh, William R. A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association: JAMIA

work page

[23] [23]

Towards Automated ICD Coding Using Deep Learning

Shi, Haoran and Xie, Pengtao and Hu, Zhiting and Zhang, Ming and Xing, Eric P. Towards Automated ICD Coding Using Deep Learning. arXiv [cs.CL]. arXiv:1711.04075

work page arXiv

[24] [24]

MIMIC- IV , a freely accessible electronic health record dataset

Johnson, Alistair E W and Bulgarelli, Lucas and Shen, Lu and Gayles, Alvin and Shammout, Ayad and Horng, Steven and Pollard, Tom J and Moody, Benjamin and Gow, Brian and Lehman, Li-Wei H and Celi, Leo A and Mark, Roger G. MIMIC- IV , a freely accessible electronic health record dataset. Scientific data

work page

[25] [25]

PLM - ICD : Automatic ICD Coding with Pretrained Language Models

Huang, Chao-Wei and Tsai, Shang-Chi and Chen, Yun-Nung. PLM - ICD : Automatic ICD Coding with Pretrained Language Models. Proceedings of the 4th Clinical Natural Language Processing Workshop

work page

[26] [26]

and Zimlichman Eyal and Barash Yiftach and Freeman Robert and Charney Alexander W

Soroush Ali and Glicksberg Benjamin S. and Zimlichman Eyal and Barash Yiftach and Freeman Robert and Charney Alexander W. and Nadkarni Girish N and Klang Eyal. Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying. NEJM AI

work page

[27] [27]

Attention Is All You Need

Vaswani, Ashish and Shazeer, Noam M and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Lukasz and Polosukhin, Illia. Attention is All you Need. Neural Information Processing Systems. 1706.03762

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

A unified review of deep learning for automated medical coding

Ji, Shaoxiong and Li, Xiaobo and Sun, Wei and Dong, Hang and Taalas, Ara and Zhang, Yijia and Wu, Honghan and Pitkänen, Esa and Marttinen, Pekka. A unified review of deep learning for automated medical coding. ACM computing surveys

work page

[29] [29]

Aligning AI research with the needs of clinical coding workflows: Eight recommendations based on US data analysis and critical review

Gan, Yidong and Rybinski, Maciej and Hachey, Ben and Kummerfeld, Jonathan K. Aligning AI research with the needs of clinical coding workflows: Eight recommendations based on US data analysis and critical review. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page

[30] [30]

Less is more: Explainable and efficient ICD code prediction with clinical entities

Douglas, James C and Gan, Yidong and Hachey, Ben and Kummerfeld, Jonathan K. Less is more: Explainable and efficient ICD code prediction with clinical entities. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page

[31] [31]

Automated ICD coding using extreme multi-label long text transformer-based models

Liu, Leibo and Perez-Concha, Oscar and Nguyen, Anthony and Bennett, Vicki and Jorm, Louisa. Automated ICD coding using extreme multi-label long text transformer-based models. Artificial intelligence in medicine

work page

[32] [32]

MDACE : MIMIC documents annotated with code evidence

Cheng, Hua and Jafari, Rana and Russell, April and Klopfer, Russell and Lu, Edmond and Striner, Benjamin and Gormley, Matthew. MDACE : MIMIC documents annotated with code evidence. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

work page