A generic rule-based system for clinical trial patient selection
Pith reviewed 2026-05-24 21:19 UTC · model grok-4.3
The pith
A generic rule-based natural language pipeline can identify patients meeting heterogeneous inclusion and exclusion criteria for clinical trials.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate that their generic rule-based natural language pipeline supports the task of identifying patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial.
What carries the argument
A single generic set of hand-crafted rules applied through a natural language processing pipeline to patient records.
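To make the mechanism concrete, here is a minimal sketch of one shared rule list applied to free-text notes. The rule names, regular expressions, and negation cues are all hypothetical illustrations, not taken from the paper, and the one-sentence negation check is a deliberate simplification of what a real clinical pipeline would need.

```python
import re
from dataclasses import dataclass

# All rules below are hypothetical examples, not the authors' rule set.
@dataclass(frozen=True)
class Rule:
    criterion: str        # label of the criterion this rule supports
    pattern: re.Pattern   # concept pattern searched for in each sentence

# Simplified negation check: a cue anywhere earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]*$", re.IGNORECASE)

RULES = [
    Rule("DIABETES", re.compile(r"\b(diabetes|t2dm|dm2)\b", re.IGNORECASE)),
    Rule("ASPIRIN", re.compile(r"\b(aspirin|asa)\b", re.IGNORECASE)),
]

def criteria_met(note, rules=RULES):
    """Return the set of criterion labels with a non-negated mention."""
    met = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for rule in rules:
            match = rule.pattern.search(sentence)
            # Skip the mention if a negation cue precedes it in the sentence.
            if match and not NEGATION.search(sentence[:match.start()]):
                met.add(rule.criterion)
    return met
```

Under these assumptions, calling `criteria_met("No history of diabetes. Takes ASA 81 mg.")` reports only the aspirin criterion, because the diabetes mention follows a negation cue in the same sentence.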
If this is right
- Eligibility screening for new trials can reuse the same rule collection rather than requiring fresh model training.
- The pipeline generalizes to criteria that differ substantially in wording and structure.
- Hand-crafted rules supply an interpretable decision path that can be inspected and updated by domain experts.
- Rule-based methods remain competitive for patient-matching tasks even when the input criteria vary widely.
Where Pith is reading between the lines
- The same rule-collection strategy might transfer to other medical-document tasks whose logic can be stated as explicit conditions.
- If the rules were generated from a small set of examples rather than written by hand, development time for new trials could shrink further.
- The approach highlights a trade-off between the effort to write comprehensive rules and the effort to label data for machine-learning alternatives.
Load-bearing premise
The heterogeneous inclusion and exclusion criteria can be captured and applied reliably by one fixed collection of hand-crafted rules without task-specific machine learning or per-criterion tuning.
What would settle it
A new collection of criteria, written independently of the existing rule set, on which the pipeline's accuracy falls well below the level reported for the original test data.
Original abstract
The n2c2 2018 Challenge task 1 aimed to identify patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. We demonstrate that a generic rule-based natural language pipeline can support this task with decent performance (the average F1 score on the test set is 0.89, ranked 8th out of 45 teams).
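The abstract reports an average F1 score but does not say how the average was taken. The sketch below shows one plausible reading, an unweighted mean of per-criterion F1 scores, using made-up confusion counts; the criterion names and numbers are illustrative assumptions, not results from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(counts):
    """Unweighted mean of per-criterion F1 (an assumed averaging scheme)."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)

# Made-up (tp, fp, fn) counts per criterion, for illustration only.
toy_counts = {
    "CRITERION_A": (8, 2, 2),
    "CRITERION_B": (5, 0, 5),
}

average = macro_f1(toy_counts)
```

A micro-averaged score would instead pool the counts across criteria before computing F1, and the two can differ noticeably when criteria have very different prevalence.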
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a rule-based natural language processing pipeline for the n2c2 2018 Challenge task 1 on identifying patients who meet heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. It claims that a generic rule-based system (without task-specific machine learning) achieves an average F1 score of 0.89 on the test set, ranking 8th out of 45 teams.
Significance. A genuinely generic, reusable rule-based pipeline that reliably handles diverse criteria with minimal per-criterion tuning would be significant as an interpretable alternative to ML approaches in clinical NLP, where explainability and limited data are common constraints. The reported ranking indicates practical competitiveness if the genericity claim holds.
major comments (2)
- [Abstract] The claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or a development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.
- [Methods] If the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), then the F1 result no longer substantiates the 'generic' qualifier, which is load-bearing for the central claim, and the approach reduces to standard rule engineering.
minor comments (2)
- The abstract states the F1 score but supplies no rule examples, error analysis, or validation details, limiting evaluation of how the rules were applied across criteria.
- Consider including a small table of representative rules to illustrate reuse across inclusion/exclusion criteria.
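The shared-versus-specific breakdown requested in the comments above could be computed from a log of which rules fire for which criteria. The following is a rough sketch under that assumption; the rule identifiers and criterion labels are invented for illustration and do not come from the manuscript.

```python
from collections import defaultdict

def rule_reuse_breakdown(rule_usage):
    """Split rules into shared (used by >1 criterion) and criterion-specific.

    rule_usage: iterable of (rule_id, criterion) pairs.
    """
    criteria_by_rule = defaultdict(set)
    for rule_id, criterion in rule_usage:
        criteria_by_rule[rule_id].add(criterion)
    shared = sorted(r for r, c in criteria_by_rule.items() if len(c) > 1)
    specific = sorted(r for r, c in criteria_by_rule.items() if len(c) == 1)
    return {"shared": shared, "specific": specific}

# Hypothetical usage log, for illustration only.
toy_usage = [
    ("negation_cue", "CRITERION_A"),
    ("negation_cue", "CRITERION_B"),
    ("date_window", "CRITERION_B"),
    ("drug_lexicon_x", "CRITERION_A"),
]

breakdown = rule_reuse_breakdown(toy_usage)
```

A high ratio of shared to specific rules would support the 'generic' qualifier; a low ratio would point toward criterion-by-criterion engineering.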
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the genericity claim. We address each major comment below and will revise the manuscript accordingly to provide the requested evidence.
Point-by-point responses
- Referee: [Abstract] The claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or a development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.
  Authors: We agree that the abstract lacks explicit supporting demonstrations. In the revised manuscript we will update the abstract to reference the rule composition and add a new subsection (or table) in Methods that provides rule examples, counts of shared versus criterion-specific rules, and a development-effort breakdown to substantiate the compact shared rule base.
  Revision: yes
- Referee: [Methods] If the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), the F1 result no longer substantiates the 'generic' qualifier and reduces to standard rule engineering, which is load-bearing for the central claim.
  Authors: The pipeline applies a single rule engine across all criteria using reusable patterns for common clinical concepts, with only limited per-criterion adaptation; the competitive ranking without task-specific ML supports this design. Nevertheless, we acknowledge the need for explicit quantification. We will revise the Methods section to include a quantitative breakdown of shared versus adapted rules and the extent of per-criterion effort, thereby clarifying that the approach is not conventional criterion-by-criterion engineering.
  Revision: yes
Circularity Check
No circularity: empirical evaluation on external shared-task benchmark
The paper presents an empirical rule-based NLP system evaluated on the n2c2 2018 Challenge test set, reporting F1=0.89 without any equations, fitted parameters, derivations, or self-citations that reduce claims to inputs by construction. The central claim rests on performance against an external benchmark rather than any self-referential loop or renamed fit. No load-bearing step matches the enumerated circularity patterns.