UW-BHI at MEDIQA 2019: An Analysis of Representation Methods for Medical Natural Language Inference
Pith reviewed 2026-05-25 00:02 UTC · model grok-4.3
The pith
The performance and internal representations of an ESIM model on MedNLI depend on whether BERT, ESP or Cui2Vec supplies the input representations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The choice of representation method among BERT, ESP, and Cui2Vec influences both the accuracy achieved by the ESIM on the MedNLI task and the characteristics of the model's internal representations.
What carries the argument
The Enhanced Sequential Inference Model (ESIM) operating under different embedding conditions from BERT, ESP, or Cui2Vec.
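To make the "fixed architecture, swappable embeddings" setup concrete, here is a minimal sketch of the local inference step that ESIM (Chen et al., 2017) applies to a premise and a hypothesis: soft alignment by dot-product attention, followed by enhancement with the difference and element-wise product. It is an illustration under stated assumptions, not the authors' implementation. The BiLSTM encoders are omitted, the function names are hypothetical, and the toy 2-dimensional vectors stand in for whatever BERT, ESP, or Cui2Vec would supply.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

def soft_align(a, b):
    """For each vector in a, return an attention-weighted summary of b."""
    return [weighted_sum(softmax([dot(ai, bj) for bj in b]), b) for ai in a]

def enhance(a, a_tilde):
    """Concatenate [a; a~; a - a~; a * a~] for each token."""
    enhanced = []
    for ai, ti in zip(a, a_tilde):
        diff = [x - y for x, y in zip(ai, ti)]
        prod = [x * y for x, y in zip(ai, ti)]
        enhanced.append(ai + ti + diff + prod)
    return enhanced

def local_inference(premise, hypothesis):
    """ESIM-style local inference over two sequences of token vectors."""
    p_tilde = soft_align(premise, hypothesis)
    h_tilde = soft_align(hypothesis, premise)
    return enhance(premise, p_tilde), enhance(hypothesis, h_tilde)

# Toy token vectors (made up); in the paper's setting these would come
# from one of the three representation methods.
premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hypothesis = [[1.0, 0.0], [0.0, 2.0]]
m_p, m_h = local_inference(premise, hypothesis)
```

Because only `premise` and `hypothesis` change between conditions, any difference downstream of this step is attributable to the input representations, which is the comparison the paper relies on.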
If this is right
- Different representation methods will produce different levels of performance on the MedNLI task.
- The internal representations learned by the model will reflect the properties of the chosen input embeddings.
- The MedNLI task can serve to distinguish the strengths of knowledge-based versus purely distributed representations.
Where Pith is reading between the lines
- Results from this comparison could guide selection of embeddings for other clinical inference tasks beyond MedNLI.
- The analysis opens the possibility of hybrid representations that combine the strengths observed from each method.
Load-bearing premise
The MedNLI task relies heavily on semantic understanding and therefore serves as a suitable evaluation set for comparing the representation methods.
What would settle it
If the three representation methods produce identical performance scores and indistinguishable internal representations in the ESIM on the MedNLI dataset, the claim that they differ would be falsified.
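One way to operationalise "identical performance scores" is a paired bootstrap over test examples: if the confidence interval for the accuracy difference between two conditions contains zero, the data do not distinguish them. The sketch below is an assumption about how such a check could be run, not a procedure described in the paper. The function names are hypothetical and the label lists are invented toy data, not MedNLI results.

```python
import random

def accuracy(preds, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def paired_bootstrap_diff(preds_a, preds_b, gold, n_resamples=2000, seed=0):
    """95% percentile interval for accuracy(A) - accuracy(B).

    The same resampled example indices are used for both systems,
    which keeps the comparison paired.
    """
    rng = random.Random(seed)
    n = len(gold)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(preds_a[i] == gold[i] for i in idx) / n
        acc_b = sum(preds_b[i] == gold[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper

# Invented toy labels using the three NLI classes.
gold = ["entailment", "neutral", "contradiction",
        "neutral", "entailment", "contradiction"]
preds_x = ["entailment", "neutral", "contradiction",
           "neutral", "entailment", "neutral"]
preds_y = ["neutral", "neutral", "contradiction",
           "entailment", "entailment", "neutral"]

interval = paired_bootstrap_diff(preds_x, preds_y, gold)
same = paired_bootstrap_diff(preds_x, preds_x, gold)
```

A check of this kind addresses only the performance half of the claim; comparing internal representations would need a separate analysis.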
Original abstract
Recent advances in distributed language modeling have led to large performance increases on a variety of natural language processing (NLP) tasks. However, it is not well understood how these methods may be augmented by knowledge-based approaches. This paper compares the performance and internal representation of an Enhanced Sequential Inference Model (ESIM) between three experimental conditions based on the representation method: Bidirectional Encoder Representations from Transformers (BERT), Embeddings of Semantic Predications (ESP), or Cui2Vec. The methods were evaluated on the Medical Natural Language Inference (MedNLI) subtask of the MEDIQA 2019 shared task. This task relied heavily on semantic understanding and thus served as a suitable evaluation set for the comparison of these representation methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares the performance and internal representation of an Enhanced Sequential Inference Model (ESIM) between three experimental conditions based on the representation method: BERT, Embeddings of Semantic Predications (ESP), or Cui2Vec. The methods were evaluated on the Medical Natural Language Inference (MedNLI) subtask of the MEDIQA 2019 shared task, which the authors argue is suitable because it relies heavily on semantic understanding.
Significance. If the results demonstrate clear, reproducible differences in how these representations capture medical semantics inside a fixed ESIM architecture, the work would help clarify when knowledge-based embeddings augment or underperform contextual models such as BERT on clinical inference tasks. The explicit focus on both performance and internal representations is a strength that could inform embedding selection in medical NLP.
major comments (1)
- [Abstract] The abstract describes an empirical comparison but reports no performance numbers, error bars, training details, or evaluation metrics. Without these data it is impossible to verify whether the three representation methods produce distinguishable results on MedNLI.
minor comments (1)
- [Abstract] The statement that MedNLI 'served as a suitable evaluation set' is asserted without a supporting citation or a short rationale linking the task's semantic demands to the three chosen representations.
Simulated Author's Rebuttal
We thank the referee for their feedback. We address the single major comment below and agree that revisions to the abstract are warranted.
- Referee: [Abstract] The abstract describes an empirical comparison but reports no performance numbers, error bars, training details, or evaluation metrics. Without these data it is impossible to verify whether the three representation methods produce distinguishable results on MedNLI.
Authors: We agree that the abstract should report key results so that readers can immediately assess whether the representation methods yield distinguishable outcomes. In revision we will add the primary accuracy figures for the three ESIM variants (BERT, ESP, Cui2Vec) on the MedNLI test set and state that accuracy is the reported metric. Full training hyperparameters, random seeds, and any error bars or statistical tests belong in the experimental setup and results sections rather than the abstract; we will check that those sections contain this information and expand them where they do not, so that the distinguishability of the three conditions can be verified.
Revision: yes
Circularity Check
No significant circularity
full rationale
The paper conducts a straightforward empirical comparison of three off-the-shelf representation methods (BERT, ESP, Cui2Vec) inside a fixed ESIM architecture on the MedNLI task. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The justification that MedNLI requires semantic understanding is an explicit, non-circular assumption consistent with the experimental goal. The work contains no load-bearing steps that reduce to their own inputs by construction.