A generic rule-based system for clinical trial patient selection

Jianlin Shi; John F. Hurdle; Kevin Graves

arxiv: 1907.06860 · v1 · pith:RD6VXW5H · submitted 2019-07-16 · 💻 cs.CL · cs.CY

Pith reviewed 2026-05-24 21:19 UTC · model grok-4.3

classification 💻 cs.CL cs.CY

keywords clinical trial patient selection · rule-based system · natural language processing · inclusion criteria · exclusion criteria · medical text classification · eligibility screening

The pith

A generic rule-based natural language pipeline can identify patients meeting heterogeneous inclusion and exclusion criteria for clinical trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that one fixed collection of hand-crafted rules, run through a standard natural language processing pipeline, suffices to decide patient eligibility across many different trial criteria. A sympathetic reader would care because trial eligibility decisions currently require either labor-intensive manual review or separate machine-learning models for each new criterion set. The work shows the rule-based route avoids both, producing usable output on the challenge data. If the claim holds, it indicates that many medical text tasks can be handled by transparent, reusable logic rather than learned models that must be retrained for each new scenario.

Core claim

The authors demonstrate that their generic rule-based natural language pipeline supports the task of identifying patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial.

What carries the argument

A single generic set of hand-crafted rules applied through a natural language processing pipeline to patient records.
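As a rough illustration of what "hand-crafted rules applied through a pipeline" can look like, the sketch below splits a record into sentences, matches concept patterns, and skips matches preceded by a negation cue. Everything here is an assumption made for illustration: the `Rule` class, the criterion labels, the patterns, and the crude negation check are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    criterion: str  # hypothetical criterion label
    pattern: str    # regular expression that signals the concept


# Hypothetical rules for illustration only; not the authors' rule set.
RULES = [
    Rule("ALCOHOL-ABUSE", r"\balcohol (abuse|dependence)\b"),
    Rule("MI", r"\b(myocardial infarction|heart attack)\b"),
]

# Crude negation check: a cue word anywhere earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]*$", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(record, rules=RULES):
    """Label each criterion 'met' if any non-negated mention is found."""
    labels = {rule.criterion: "not met" for rule in rules}
    for sentence in split_sentences(record):
        for rule in rules:
            match = re.search(rule.pattern, sentence, re.IGNORECASE)
            if match is None:
                continue
            # Ignore the mention when a negation cue precedes it.
            if NEGATION.search(sentence[:match.start()]):
                continue
            labels[rule.criterion] = "met"
    return labels


record = "Patient denies alcohol abuse. He had a myocardial infarction in 2015."
print(classify(record))
```

The same `classify` function runs unchanged for every criterion; only the rule list grows, which is the kind of reuse the genericity claim depends on.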

If this is right

Eligibility screening for new trials can reuse the same rule collection rather than requiring fresh model training.
The pipeline generalizes to criteria that differ substantially in wording and structure.
Hand-crafted rules supply an interpretable decision path that can be inspected and updated by domain experts.
Rule-based methods remain competitive for patient-matching tasks even when the input criteria vary widely.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same rule-collection strategy might transfer to other medical-document tasks whose logic can be stated as explicit conditions.
If the rules were generated from a small set of examples rather than written by hand, development time for new trials could shrink further.
The approach highlights a trade-off between the effort to write comprehensive rules and the effort to label data for machine-learning alternatives.

Load-bearing premise

The heterogeneous inclusion and exclusion criteria can be captured and applied reliably by one fixed collection of hand-crafted rules without task-specific machine learning or per-criterion tuning.

What would settle it

A new collection of criteria, written independently of the existing rule set, on which the pipeline's accuracy falls well below the level reported for the original test data.

Original abstract

The n2c2 2018 Challenge task 1 aimed to identify patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. We demonstrate a generic rule-based natural language pipeline can support this task with decent performance (the average F1 score on the test set is 0.89, ranked the 8th out of 45 teams).
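For readers unfamiliar with the metric, the sketch below shows one way an "average F1" over several criteria can be computed. The abstract does not say how the average is taken, so the unweighted mean over criteria used here is an assumption, as are the function names and the toy labels.

```python
def f1(gold, pred, positive="met"):
    """F1 for one criterion, treating `positive` as the target class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(gold_by_criterion, pred_by_criterion, positive="met"):
    """Unweighted mean of per-criterion F1 (an assumed averaging scheme)."""
    scores = [
        f1(gold_by_criterion[c], pred_by_criterion[c], positive)
        for c in gold_by_criterion
    ]
    return sum(scores) / len(scores)


# Toy labels, invented for illustration.
gold = {"A": ["met", "met", "not met", "not met"], "B": ["met", "not met"]}
pred = {"A": ["met", "not met", "not met", "met"], "B": ["met", "not met"]}
print(average_f1(gold, pred))
```

A different averaging scheme (for example, pooling all decisions before computing F1) can give a different number on the same predictions, so the scheme matters when comparing against the reported 0.89.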

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Rule-based pipeline hits 0.89 F1 on n2c2 2018 task but the generic claim needs methods details to hold up.

The paper reports a rule-based NLP system reaching 0.89 average F1 on the n2c2 2018 patient selection task and placing 8th out of 45 teams. That is the main empirical result worth noting. It applies an existing style of rule-based extraction to a shared-task benchmark with heterogeneous inclusion and exclusion criteria, and the number is tied to the public test set. Rule-based clinical NLP has been around for years, so the contribution is the concrete performance on this specific challenge rather than a new framework. The work shows that hand-crafted rules can deliver usable scores without machine learning, which is a practical data point for trial recruitment screening. The abstract keeps the claim straightforward and avoids overclaiming.

The soft spot is the genericity assertion. The abstract calls the pipeline generic, but supplies no rule examples, no count of how many patterns were written, and no description of development effort per criterion. If the methods turn out to be a modest shared rule base that applies across criteria, the result supports the claim. If instead it is a large collection of criterion-specific patterns with separate tuning, then the performance is consistent with standard rule engineering and the generic label does not add much. The stress-test note is accurate on this point. The abstract alone leaves that question open.

There are no equations, fitted parameters, or self-referential derivations, so the circularity burden is zero.

This paper is mainly for clinical NLP researchers who track baselines on the n2c2 tasks or who want to compare rule-based versus learned systems on patient selection. A reader looking for a reported F1 on that benchmark would find the number useful. It deserves peer review because it supplies a clear, externally verifiable result on a public dataset. Reviewers can request the rule details and error analysis to assess how reusable the pipeline actually is. I would send it to referees rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a rule-based natural language processing pipeline for the n2c2 2018 Challenge task 1 on identifying patients who meet heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. It claims that a generic rule-based system (without task-specific machine learning) achieves an average F1 score of 0.89 on the test set, ranking 8th out of 45 teams.

Significance. A genuinely generic, reusable rule-based pipeline that reliably handles diverse criteria with minimal per-criterion tuning would be significant as an interpretable alternative to ML approaches in clinical NLP, where explainability and limited data are common constraints. The reported ranking indicates practical competitiveness if the genericity claim holds.

major comments (2)

[Abstract] The claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.
[Methods] If the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), the F1 result no longer substantiates the 'generic' qualifier and reduces to standard rule engineering, which is load-bearing for the central claim.

minor comments (2)

The abstract states the F1 score but supplies no rule examples, error analysis, or validation details, limiting evaluation of how the rules were applied across criteria.
Consider including a small table of representative rules to illustrate reuse across inclusion/exclusion criteria.
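The breakdown the referee asks for (shared versus criterion-specific rule counts) could be produced with a small tally like the one below. This is a sketch under stated assumptions: the mapping from rule identifiers to the criteria that use them, and all the names in it, are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def reuse_summary(rule_usage):
    """Summarize rule reuse.

    rule_usage maps a rule identifier to the set of criteria that use it.
    A rule counts as shared when more than one criterion uses it.
    """
    shared = sorted(r for r, crits in rule_usage.items() if len(crits) > 1)
    specific = sorted(r for r, crits in rule_usage.items() if len(crits) == 1)

    per_criterion = defaultdict(lambda: {"shared": 0, "specific": 0})
    for crits in rule_usage.values():
        kind = "shared" if len(crits) > 1 else "specific"
        for criterion in crits:
            per_criterion[criterion][kind] += 1

    return {
        "shared": shared,
        "specific": specific,
        "per_criterion": dict(per_criterion),
    }


# Hypothetical usage data, invented for illustration.
usage = {
    "negation_cue": {"A", "B"},
    "section_header": {"A", "B", "C"},
    "drug_list": {"C"},
}
summary = reuse_summary(usage)
print(summary["shared"], summary["specific"])
for criterion in sorted(summary["per_criterion"]):
    print(criterion, summary["per_criterion"][criterion])
```

A high share of rules in the "shared" bucket would support the generic label; a rule base dominated by the "specific" bucket would point toward conventional criterion-by-criterion engineering.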

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the genericity claim. We address each major comment below and will revise the manuscript accordingly to provide the requested evidence.

Referee: [Abstract] The claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.

Authors: We agree that the abstract lacks explicit supporting demonstrations. In the revised manuscript we will update the abstract to reference the rule composition and add a new subsection (or table) in Methods that provides rule examples, counts of shared versus criterion-specific rules, and a development-effort breakdown to substantiate the compact shared rule base. Revision: yes.

Referee: [Methods] If the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), the F1 result no longer substantiates the 'generic' qualifier and reduces to standard rule engineering, which is load-bearing for the central claim.

Authors: The pipeline applies a single rule engine across all criteria using reusable patterns for common clinical concepts, with only limited per-criterion adaptation; the competitive ranking without task-specific ML supports this design. Nevertheless, we acknowledge the need for explicit quantification. We will revise the Methods section to include a quantitative breakdown of shared versus adapted rules and the extent of per-criterion effort, thereby clarifying that the approach is not conventional criterion-by-criterion engineering. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external shared-task benchmark

The paper presents an empirical rule-based NLP system evaluated on the n2c2 2018 Challenge test set, reporting F1=0.89 without any equations, fitted parameters, derivations, or self-citations that reduce claims to inputs by construction. The central claim rests on performance against an external benchmark rather than any self-referential loop or renamed fit. No load-bearing step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model is presented; the work is an applied system description with no free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1]

Unsuccessful trial accrual and human subjects protections: an empirical analysis of recently closed trials

Carlisle B, Kimmelman J, Ramsay T, MacKinnon N. Unsuccessful trial accrual and human subjects protections: an empirical analysis of recently closed trials. Clin Trials. 2015;12(1):77-83. doi:10.1177/1740774514558307

work page doi:10.1177/1740774514558307 2015
[2]

Recruiting vulnerable populations into research: a systematic review of recruitment interventions

UyBico SJ, Pavel S, Gross CP. Recruiting vulnerable populations into research: a systematic review of recruitment interventions. J Gen Intern Med. 2007;22(6):852-863. doi:10.1007/s11606-007-0126-3

work page doi:10.1007/s11606-007-0126-3 2007
[3]

Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence

Köpcke F, Trinczek B, Majeed RW, et al. Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak. 2013;13(1):37. doi:10.1186/1472-6947-13-37

work page doi:10.1186/1472-6947-13-37 2013
[4]

Increasing the efficiency of trial-patient matching: automated clinical trial eligibility Pre-screening for pediatric oncology patients

Ni Y, Wright J, Perentesis J, et al. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility Pre-screening for pediatric oncology patients. BMC Med Inform Decis Mak. 2015;15(1):28. doi:10.1186/s12911-015-0149-3

work page doi:10.1186/s12911-015-0149-3 2015
[5]

Automated matching software for clinical trials eligibility: measuring efficiency and flexibility

Penberthy L, Brown R, Puma F, Dahman B. Automated matching software for clinical trials eligibility: measuring efficiency and flexibility. Contemp Clin Trials. 2010;31(3):207-217. doi:10.1016/j.cct.2010.03.005

work page doi:10.1016/j.cct.2010.03.005 2010
[6]

Electronic health records to facilitate clinical research

Cowie MR, Blomster JI, Curtis LH, et al. Electronic health records to facilitate clinical research. Clin Res Cardiol. 2017;106(1):1-9. doi:10.1007/s00392-016-1025-6

work page doi:10.1007/s00392-016-1025-6 2017
[7]

Automated determination of metastases in unstructured radiology reports for eligibility screening in oncology clinical trials

Petkov VI, Penberthy LT, Dahman BA, Poklepovic A, Gillam CW, McDermott JH. Automated determination of metastases in unstructured radiology reports for eligibility screening in oncology clinical trials. Exp Biol Med (Maywood). 2013;238(12):1370-1378. doi:10.1177/1535370213508172

work page doi:10.1177/1535370213508172 2013
[8]

Extracting information from the text of electronic medical records to improve case detection: A systematic review

Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: A systematic review. J Am Med Informatics Assoc. 2016;23(5):1007-1015. doi:10.1093/jamia/ocv180

work page doi:10.1093/jamia/ocv180 2016
[9]

Recognizing obesity and comorbidities in sparse data

Uzuner O. Recognizing obesity and comorbidities in sparse data. J Am Med Inform Assoc. 2009;16(4):561-570. doi:10.1197/jamia.M3115

work page doi:10.1197/jamia.m3115 2009
[10]

Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis

Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J Biomed Heal Informatics. 2018;22(5):1589-1604. doi:10.1109/JBHI.2017.2767063

work page doi:10.1109/jbhi.2017.2767063 2018
[11]

Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

Chiticariu L, Li Y, Reiss F. Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! 2013. https://www.semanticscholar.org/paper/Rule-Based-Information-Extraction-is-Dead!-Long-Chiticariu-Li/0692b9ad39145f57237199f3d4488667d5d9e5e7. Accessed January 17, 2019

work page 2013
[12]

Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable

Shi J, Hurdle JF. Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable. J Biomed Inform. 2018;85:106-113. doi:10.1016/J.JBI.2018.08.002

work page doi:10.1016/j.jbi.2018.08.002 2018
[13]

Extracting Intrauterine Device Usage from Clinical Texts Using Natural Language Processing

Shi J, Mowery D, Zhang M, Sanders J, Chapman W, Gawron L. Extracting Intrauterine Device Usage from Clinical Texts Using Natural Language Processing. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI). IEEE; 2017:568-571. doi:10.1109/ICHI.2017.21

work page doi:10.1109/ichi.2017.21 2017
[14]

Implementing natural language processing within the clinical enterprise data warehouse for encephalopathy patient identification

Shi J, Arego J, Barney R, Hurdle FJ. Implementing natural language processing within the clinical enterprise data warehouse for encephalopathy patient identification. In: Theodore H Stanley Research Symposium. Salt Lake City; 2017. https://medicine.utah.edu/anesthesiology/research-symposium/pdf/2.pdf

work page 2017