Qwant Research @DEFT 2019: Document matching and information retrieval using clinical cases
Pith reviewed 2026-05-25 01:46 UTC · model grok-4.3
The pith
Language models and hybrid neural-linguistic methods achieve encouraging accuracy on semantic similarity and information extraction from French clinical cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An information extraction system for French clinical cases, implemented in both a neural-network-only version and a version that incorporates linguistic analysis, yields encouraging accuracy on DEFT 2019 task 3, while language-model approaches are used to address semantic similarity matching in task 2.
What carries the argument
Language models for semantic similarity matching paired with dual neural-network and linguistic-analysis pipelines for information extraction.
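The similarity-matching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. A bag-of-words cosine similarity stands in for the language-model representations, and a greedy one-to-one assignment (a hypothetical choice) pairs each clinical case with at most one discussion. The names `bag_of_words`, `cosine`, and `greedy_match` are illustrative.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Assumption: lowercase whitespace tokens stand in for a language-model encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def greedy_match(cases: list[str], discussions: list[str]) -> dict[int, int]:
    """Hypothetical one-to-one pairing: take the highest-scoring free pairs first."""
    vec_c = [bag_of_words(c) for c in cases]
    vec_d = [bag_of_words(d) for d in discussions]
    # Score every (case, discussion) pair and visit them from best to worst.
    pairs = sorted(
        ((cosine(vc, vd), i, j) for i, vc in enumerate(vec_c) for j, vd in enumerate(vec_d)),
        reverse=True,
    )
    matched: dict[int, int] = {}
    used: set[int] = set()
    for _score, i, j in pairs:
        if i not in matched and j not in used:
            matched[i] = j
            used.add(j)
    return matched
```

For example, with `cases = ["patient de 45 ans avec fièvre et toux", "enfant avec fracture du bras"]` and `discussions = ["la fracture du bras chez l'enfant", "fièvre et toux évoquent une pneumonie"]`, `greedy_match` pairs case 0 with discussion 1 and case 1 with discussion 0. An optimal assignment method such as the Hungarian algorithm could replace the greedy step when global consistency across all pairs matters.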
Load-bearing premise
The DEFT 2019 clinical-case datasets and evaluation metrics are representative enough of real French medical text processing needs for the reported accuracy to indicate useful progress.
What would settle it
Accuracy measurements on an independent collection of French clinical cases and discussions outside the DEFT 2019 set that fall substantially below the reported levels.
Original abstract
This paper reports on Qwant Research's contribution to tasks 2 and 3 of the DEFT 2019 challenge, focusing on the analysis of French clinical cases. Task 2 is a task on semantic similarity between clinical cases and discussions. For this task, we propose an approach based on language models and evaluate how different preprocessing steps and matching techniques affect the results. For task 3, we developed an information extraction system yielding very encouraging results accuracy-wise. We experimented with two different approaches, one based exclusively on neural networks, the other based on linguistic analysis.
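To make the information-extraction side more concrete, here is a hypothetical rule-based sketch in the spirit of a linguistic pipeline. It is not the paper's linguistic analysis: the target fields (`age`, `gender`), the regular expression, and the French cue-word lists are assumptions chosen purely for illustration.

```python
import re

# Hypothetical pattern: an integer of 1-3 digits followed by "ans" ("years").
AGE_PATTERN = re.compile(r"\b(\d{1,3})\s*ans\b", re.IGNORECASE)

# Hypothetical French cue words signalling the patient's gender.
FEMALE_CUES = {"femme", "patiente", "fille", "mme"}
MALE_CUES = {"homme", "patient", "garçon"}


def extract_fields(text: str) -> dict:
    """Extract a (hypothetical) age and gender from one clinical case."""
    match = AGE_PATTERN.search(text)
    age = int(match.group(1)) if match else None

    gender = None
    # Use the first gender cue in reading order; lowercasing handles "Mme".
    for token in re.findall(r"\w+", text.lower()):
        if token in FEMALE_CUES:
            gender = "female"
            break
        if token in MALE_CUES:
            gender = "male"
            break
    return {"age": age, "gender": gender}
```

On "Une patiente de 62 ans est admise pour douleur thoracique.", the sketch returns `{"age": 62, "gender": "female"}`. A neural approach would learn such cues from annotated cases instead of listing them by hand, which is the contrast the two task-3 systems are meant to explore.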
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports Qwant Research's participation in DEFT 2019 tasks 2 and 3 on French clinical case analysis. For task 2 (semantic similarity between clinical cases and discussions), the authors propose language-model approaches and evaluate the impact of different preprocessing steps and matching techniques. For task 3 (information extraction), they describe two systems, one based exclusively on neural networks and one based on linguistic analysis, and claim that the resulting IE system yields very encouraging accuracy.
Significance. If the accuracy claims for task 3 are substantiated, the work supplies a direct comparison of neural versus linguistic methods on French clinical text, which could serve as a useful reference point for shared-task participants and for French medical NLP more broadly. The task-2 experiments on preprocessing and matching choices may also help isolate which factors matter most for semantic similarity in this domain.
major comments (1)
- [Abstract] The central claim that the information extraction system 'yielded very encouraging results accuracy-wise' is presented without numerical accuracy figures, baselines, error bars, ablation results, or statistical significance tests, which prevents readers from assessing whether the results advance the state of the art.
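The error bars requested above could, for example, be obtained with a percentile bootstrap over documents. The sketch below assumes exact-match accuracy over per-document gold and predicted labels; the function names, the 1,000 resamples, and the 95% level are illustrative defaults, not values taken from the paper.

```python
import random


def accuracy(gold: list, pred: list) -> float:
    """Exact-match accuracy between gold and predicted labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def bootstrap_ci(gold: list, pred: list, n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for accuracy (assumed setup)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        # Resample documents with replacement, keeping gold/pred pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([gold[i] for i in idx], [pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Reporting the point estimate from `accuracy` together with `(lower, upper)` for each extracted field would address the request for error bars without changing the underlying experiments.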
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript describing our participation in the DEFT 2019 shared task. We address the single major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The central claim that the information extraction system 'yielded very encouraging results accuracy-wise' is presented without any numerical accuracy figures, baselines, error bars, ablation results, or statistical significance tests, preventing assessment of whether the results actually advance the state of the art.
Authors: We agree that the abstract would be strengthened by the inclusion of concrete numerical results to support the claim. The body of the paper reports the accuracy figures for both the neural and linguistic IE systems on task 3, along with comparisons to the other participating systems. To address the referee's concern, we will revise the abstract to include the key accuracy scores achieved by our best system and a brief indication of how it compared to the shared-task baselines and other submissions. This change will allow readers to gauge the performance immediately without needing to consult the full text.
Revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper is a straightforward empirical system report on participation in the DEFT 2019 shared task. It describes two approaches (language-model matching for task 2; neural-network and linguistic IE for task 3) and reports accuracy on the provided benchmark data. No equations, parameter-fitting steps, derivations, or self-citations appear in the text; all claims are direct experimental outcomes on external data, so no load-bearing step reduces to its own inputs by construction.