Dr.Quad at MEDIQA 2019: Towards Textual Inference and Question Entailment using contextualized representations

Aditi Chaudhary; Ashwin Srinivasan; Eric Nyberg; James Route; Teruko Mitamura; Vinayshekhar Bannihatti Kumar

arxiv: 1907.10136 · v1 · pith:VBU72ZTMnew · submitted 2019-07-23 · 💻 cs.CL

Vinayshekhar Bannihatti Kumar, Ashwin Srinivasan, Aditi Chaudhary, James Route, Teruko Mitamura, Eric Nyberg

Pith reviewed 2026-05-24 17:07 UTC · model grok-4.3

classification 💻 cs.CL

keywords textual entailment · question entailment · medical domain · data augmentation · contextualized representations · natural language inference · shared task

The pith

Incorporating medical domain knowledge through data augmentation improves performance on textual inference and question entailment tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a system submitted to the 2019 shared task on textual inference and question entailment in medicine. It starts from a multi-task learning method for entailment and tests adaptations of general language models to the medical domain. The central result is that data augmentation using domain knowledge proves effective for handling the challenges of specialized fields. This approach matters because it offers a way to leverage existing models when domain-specific data is limited. Readers can see how domain adaptation via augmentation helps bridge general NLP capabilities to practical medical applications.

Core claim

Our submissions to the ACL-BioNLP 2019 shared task demonstrate that incorporating domain knowledge through data augmentation is a powerful strategy for addressing challenges posed by specialized domains such as medicine, based on extending prior multi-task objective functions for textual entailment to contextualized representations.

What carries the argument

Data augmentation strategy for injecting medical domain knowledge into multi-task textual entailment models using contextualized representations.

If this is right

Improved results on the MEDIQA 2019 test sets for textual inference and question entailment.
Effective generalization of state-of-the-art language models to the medical domain.
Data augmentation as a key method for domain adaptation in specialized NLP tasks.
Applicability of the multi-task framework to medical question answering scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar augmentation techniques could be tested in other data-scarce domains like legal or scientific text.
Combining this with newer larger language models might yield further gains.
The method highlights the value of domain knowledge even when using pre-trained contextual representations.

Load-bearing premise

Performance improvements on the shared task are mainly attributable to the data augmentation approach rather than other implementation details.

What would settle it

A controlled experiment showing no significant performance difference when the same model is trained without the domain-specific data augmentation would falsify the central claim.
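One way to make such a comparison concrete is a paired significance test over the predictions of two otherwise identical runs. The sketch below is an illustration only, not the authors' procedure: it assumes both runs are scored on the same test examples, and the function names (`accuracy`, `paired_permutation_test`) and the choice of an approximate randomization test are assumptions introduced here.

```python
import random


def accuracy(preds, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def paired_permutation_test(preds_a, preds_b, gold, n_rounds=2000, seed=0):
    """Two-sided approximate randomization test on the accuracy gap.

    Hypothetical helper: preds_a and preds_b are predictions from two runs
    (e.g. with and without augmentation) on the same examples.
    Returns (observed accuracy difference, estimated p-value).
    """
    rng = random.Random(seed)
    correct_a = [int(p == g) for p, g in zip(preds_a, gold)]
    correct_b = [int(p == g) for p, g in zip(preds_b, gold)]
    # Work with integer counts so the comparison below is exact.
    observed = abs(sum(correct_a) - sum(correct_b))
    hits = 0
    for _ in range(n_rounds):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            # Randomly swap which system each per-example outcome is credited to.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    p_value = (hits + 1) / (n_rounds + 1)
    return observed / len(gold), p_value
```

Under this sketch, a large p-value for the augmented versus non-augmented run would be the kind of evidence that counts against the claim; a small one would support it only if every other setting was held fixed between the two runs.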

Original abstract

This paper presents the submissions by Team Dr.Quad to the ACL-BioNLP 2019 shared task on Textual Inference and Question Entailment in the Medical Domain. Our system is based on the prior work Liu et al. (2019) which uses a multi-task objective function for textual entailment. In this work, we explore different strategies for generalizing state-of-the-art language understanding models to the specialized medical domain. Our results on the shared task demonstrate that incorporating domain knowledge through data augmentation is a powerful strategy for addressing challenges posed by specialized domains such as medicine.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Shared-task system paper applies Liu et al. multi-task model plus data augmentation to medical domain but supplies no ablations or controls to support the main claim.

This is a shared-task system description that applies Liu et al.'s multi-task entailment model to the medical domain with data augmentation, but the results don't isolate what the augmentation contributes. The paper describes Team Dr.Quad's entries in the MEDIQA 2019 task on textual inference and question entailment. They start from the prior multi-task setup and explore domain adaptation tactics, highlighting data augmentation as the way to bring in medical knowledge. The abstract claims this shows augmentation is powerful for specialized domains.

What works here is the straightforward application to a real shared task. Shared task reports like this can give the community quick ideas on what engineering choices people made for the medical setting.

The soft spot is the lack of evidence for the main claim. The stress test points out correctly that there's no ablation keeping everything else fixed while changing only the augmentation. Without that, or even basic baselines and numbers in the abstract, you can't tell if the scores come from the augmentation or from model choice, training details, or other tweaks. The paper seems to be an empirical report rather than a controlled study.

This kind of work is mainly for people already in the shared task or doing medical NLP who want to see one team's approach. It doesn't add a new framework or first-principles result. I wouldn't bring it to a reading group. I wouldn't cite it in my own work. It probably doesn't need a serious referee for a main conference; the evidence is too thin for the conclusion drawn. If the full paper has more details on the experiments, that might change things, but based on what's here it reads as an incremental engineering note.

Referee Report

2 major / 1 minor

Summary. The paper reports Team Dr.Quad's submissions to the MEDIQA 2019 shared task on textual inference and question entailment in the medical domain. The system builds on Liu et al. (2019)'s multi-task entailment model; the authors explore domain-adaptation strategies for language models and conclude that incorporating domain knowledge via data augmentation is a powerful approach for specialized domains such as medicine.

Significance. If the performance differences can be causally attributed to the data-augmentation component, the result would supply a practical data-centric recipe for domain adaptation in medical NLP. The work supplies a concrete system description for a shared-task setting, which can serve as a reference point for subsequent participants.

major comments (2)

[Abstract] The claim that 'our results on the shared task demonstrate that incorporating domain knowledge through data augmentation is a powerful strategy' is unsupported by any ablation, baseline comparison, or controlled experiment; the manuscript supplies no tables, figures, or sections describing training details, hyper-parameters, or the exact augmentation procedure.
The system description states that it is 'based on the prior work Liu et al. (2019)' yet provides no account of which components were held fixed versus modified; without an ablation that toggles only the data-augmentation step, observed test-set scores cannot be attributed primarily to that component rather than to model selection or training choices.
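To illustrate what "toggles only the data-augmentation step" could mean in practice, here is a minimal sketch of a pair of run configurations that differ in exactly one field. Everything in it is hypothetical: the `RunConfig` class, its field names, and its default values are placeholders invented for illustration and are not the paper's actual settings.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class RunConfig:
    # All fields and values are hypothetical placeholders, not the paper's settings.
    encoder: str = "base-contextual-encoder"
    learning_rate: float = 5e-5
    epochs: int = 3
    seed: int = 13
    use_augmentation: bool = False


def differing_fields(a, b):
    """Return the sorted names of the fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


baseline = RunConfig()
# Copy the baseline and flip only the augmentation flag.
augmented = replace(baseline, use_augmentation=True)
```

Checking that `differing_fields(baseline, augmented)` contains only `use_augmentation` before launching the two runs is one simple guard against confounding the comparison with other changes.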

minor comments (1)

The manuscript would benefit from an explicit list of the data-augmentation operations performed and the size of the augmented training set.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review of our shared-task system paper. We address the major comments below, noting that this is a concise system description for the MEDIQA 2019 task rather than a methods paper with full ablations. We are prepared to expand the manuscript with additional details on procedures and comparisons where feasible.

Referee: [Abstract] The claim that 'our results on the shared task demonstrate that incorporating domain knowledge through data augmentation is a powerful strategy' is unsupported by any ablation, baseline comparison, or controlled experiment; the manuscript supplies no tables, figures, or sections describing training details, hyper-parameters, or the exact augmentation procedure.

Authors: We agree that the abstract claim would be strengthened by explicit ablations and training details, which are absent from the current manuscript. As a shared-task system paper, our focus was on describing the submitted system and its leaderboard performance rather than controlled experiments. The claim reflects our development observations that medical-domain data augmentation improved adaptation over the base contextualized model, but we acknowledge this is not demonstrated via isolated comparisons in the text. We will revise to include a dedicated section on the augmentation procedure, hyper-parameters, and any available baseline comparisons from our internal runs.
revision: partial

Referee: [—] The system description states that it is 'based on the prior work Liu et al. (2019)' yet provides no account of which components were held fixed versus modified; without an ablation that toggles only the data-augmentation step, observed test-set scores cannot be attributed primarily to that component rather than to model selection or training choices.

Authors: The system reuses the multi-task entailment architecture from Liu et al. (2019) with the primary modification being the addition of medical knowledge via data augmentation for domain adaptation; the base model, objective, and training framework were held fixed. We did not include an explicit ablation isolating only the augmentation step in the manuscript. While the shared-task results provide an external benchmark against other systems, we recognize that internal controlled comparisons would better support attribution. We will add a clarification paragraph detailing fixed versus modified components and any relevant development-set comparisons in the revision.
revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical shared-task report with no internal derivation

The paper is an empirical report of participation in the MEDIQA 2019 shared task. It describes a system built on Liu et al. (2019) multi-task entailment and explores data-augmentation strategies for the medical domain, with results presented as observed performance on the shared-task test sets. There are no equations, no fitted parameters renamed as predictions, no uniqueness theorems, and no derivation chain that reduces to its own inputs. The central claim is an empirical observation about augmentation, not a mathematical derivation, so none of the enumerated circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities appear in the abstract; the work is an empirical system description.

pith-pipeline@v0.9.0 · 5649 in / 842 out tokens · 16439 ms · 2026-05-24T17:07:40.409405+00:00 · methodology
