pith. sign in

arxiv: 1907.06860 · v1 · pith:RD6VXW5Hnew · submitted 2019-07-16 · 💻 cs.CL · cs.CY

A generic rule-based system for clinical trial patient selection

Pith reviewed 2026-05-24 21:19 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords clinical trial patient selectionrule-based systemnatural language processinginclusion criteriaexclusion criteriamedical text classificationeligibility screening
0
0 comments X

The pith

A generic rule-based natural language pipeline can identify patients meeting heterogeneous inclusion and exclusion criteria for clinical trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that one fixed collection of hand-crafted rules, run through a standard natural language processing pipeline, suffices to decide patient eligibility across many different trial criteria. A sympathetic reader would care because trial eligibility decisions currently require either labor-intensive manual review or separate machine-learning models for each new criterion set. The work shows the rule-based route avoids both, producing usable output on the challenge data. If the claim holds, it indicates that many medical text tasks can be handled by transparent, reusable logic rather than learned models that must be retrained for each new scenario.

Core claim

The authors demonstrate that their generic rule-based natural language pipeline supports the task of identifying patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial.

What carries the argument

A single generic set of hand-crafted rules applied through a natural language processing pipeline to patient records.

If this is right

  • Eligibility screening for new trials can reuse the same rule collection rather than requiring fresh model training.
  • The pipeline generalizes to criteria that differ substantially in wording and structure.
  • Hand-crafted rules supply an interpretable decision path that can be inspected and updated by domain experts.
  • Rule-based methods remain competitive for patient-matching tasks even when the input criteria vary widely.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rule-collection strategy might transfer to other medical-document tasks whose logic can be stated as explicit conditions.
  • If the rules were generated from a small set of examples rather than written by hand, development time for new trials could shrink further.
  • The approach highlights a trade-off between the effort to write comprehensive rules and the effort to label data for machine-learning alternatives.

Load-bearing premise

The heterogeneous inclusion and exclusion criteria can be captured and applied reliably by one fixed collection of hand-crafted rules without task-specific machine learning or per-criterion tuning.

What would settle it

A new collection of criteria, written independently of the existing rule set, on which the pipeline's accuracy falls well below the level reported for the original test data.

read the original abstract

The n2c2 2018 Challenge task 1 aimed to identify patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. We demonstrate a generic rule-based natural language pipeline can support this task with decent performance (the average F1 score on the test set is 0.89, ranked the 8th out of 45 teams ).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a rule-based natural language processing pipeline for the n2c2 2018 Challenge task 1 on identifying patients who meet heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. It claims that a generic rule-based system (without task-specific machine learning) achieves an average F1 score of 0.89 on the test set, ranking 8th out of 45 teams.

Significance. A genuinely generic, reusable rule-based pipeline that reliably handles diverse criteria with minimal per-criterion tuning would be significant as an interpretable alternative to ML approaches in clinical NLP, where explainability and limited data are common constraints. The reported ranking indicates practical competitiveness if the genericity claim holds.

major comments (2)
  1. [Abstract] Abstract: the claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.
  2. [Methods] Methods: if the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), the F1 result no longer substantiates the 'generic' qualifier and reduces to standard rule engineering, which is load-bearing for the central claim.
minor comments (2)
  1. The abstract states the F1 score but supplies no rule examples, error analysis, or validation details, limiting evaluation of how the rules were applied across criteria.
  2. Consider including a small table of representative rules to illustrate reuse across inclusion/exclusion criteria.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the genericity claim. We address each major comment below and will revise the manuscript accordingly to provide the requested evidence.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that a single 'generic' rule-based pipeline suffices for heterogeneous criteria without 'extensive per-criterion tuning' is not supported by any demonstration (e.g., rule examples, shared vs. specific rule counts, or development-effort breakdown) that a compact shared rule base was used rather than conventional criterion-by-criterion engineering.

    Authors: We agree that the abstract lacks explicit supporting demonstrations. In the revised manuscript we will update the abstract to reference the rule composition and add a new subsection (or table) in Methods that provides rule examples, counts of shared versus criterion-specific rules, and a development-effort breakdown to substantiate the compact shared rule base. revision: yes

  2. Referee: [Methods] Methods: if the pipeline is implemented via a large collection of criterion-specific patterns (as the skeptic note anticipates), the F1 result no longer substantiates the 'generic' qualifier and reduces to standard rule engineering, which is load-bearing for the central claim.

    Authors: The pipeline applies a single rule engine across all criteria using reusable patterns for common clinical concepts, with only limited per-criterion adaptation; the competitive ranking without task-specific ML supports this design. Nevertheless, we acknowledge the need for explicit quantification. We will revise the Methods section to include a quantitative breakdown of shared versus adapted rules and the extent of per-criterion effort, thereby clarifying that the approach is not conventional criterion-by-criterion engineering. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external shared-task benchmark

full rationale

The paper presents an empirical rule-based NLP system evaluated on the n2c2 2018 Challenge test set, reporting F1=0.89 without any equations, fitted parameters, derivations, or self-citations that reduce claims to inputs by construction. The central claim rests on performance against an external benchmark rather than any self-referential loop or renamed fit. No load-bearing step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model is presented; the work is an applied system description with no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5576 in / 1016 out tokens · 43764 ms · 2026-05-24T21:19:20.812412+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. [1]

    Unsuccessful trial accrual and human subjects protections: an empirical analysis of recently closed trials

    Carlisle B, Kimmelman J, Ramsay T, MacKinnon N. Unsuccessful trial accrual and human subjects protections: an empirical analysis of recently closed trials. Clin Trials. 2015;12(1):77-83. doi:10.1177/1740774514558307

  2. [2]

    Recruiting vulnerable populations into research: a systematic review of recruitment interventions

    UyBico SJ, Pavel S, Gross CP. Recruiting vulnerable populations into research: a systematic review of recruitment interventions. J Gen Intern Med. 2007;22(6):852-863. doi:10.1007/s11606-007-0126-3

  3. [3]

    Evaluation of data completeness in the electronic h ealth record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence

    Köpcke F, Trinczek B, Majeed RW, et al. Evaluation of data completeness in the electronic h ealth record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak. 2013;13(1):37. doi:10.1186/1472-6947-13-37

  4. [4]

    BMC Medical Informatics and Decision Making , author =

    Ni Y, Wright J, Perentesis J, et al. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility Pre- screening for pediatric oncology patients. BMC Med Inform Decis Mak. 2015;15(1):28. doi:10.1186/s12911-015-0149-3

  5. [5]

    Automated matching software fo r clinical trials eligibility: measuring efficiency and flexibility

    Penberthy L, Brown R, Puma F, Dahman B. Automated matching software fo r clinical trials eligibility: measuring efficiency and flexibility. Contemp Clin Trials. 2010;31(3):207-217. doi:10.1016/j.cct.2010.03.005

  6. [6]

    Electronic health records to facilitate clinical research

    Cowie MR, Blomster JI, Curtis LH, et al. Electronic health records to facilitate clinical research. Clin Res Cardiol. 2017;106(1):1-9. doi:10.1007/s00392-016-1025-6

  7. [7]

    Automated determination of metastases in unstructured radiology reports for eligibility screening in oncology clinical trials

    Petkov VI, Penberthy LT, Dahman BA, Poklepovic A, Gillam CW, McDermott JH. Automated determination of metastases in unstructured radiology reports for eligibility screening in oncology clinical trials. Exp Biol Med (Maywood). 2013;238(12):1370-1378. doi:10.1177/1535370213508172

  8. [8]

    Extracting information from the text of electronic medical records to improve case detection: A systematic review

    Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: A systematic review. J Am Med Informatics Assoc. 2016;23(5):1007-1015. doi:10.1093/jamia/ocv180

  9. [9]

    Recognizing obesity and comorbidities in sparse data

    Uzuner O. Recognizing obesity and comorbidities in sparse data. J Am Med Inform Assoc. 2009;16(4):561-570. doi:10.1197/jamia.M3115

  10. [10]

    Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis

    Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J Biomed Heal Informatics. 2018;22(5):1589-1604. doi:10.1109/JBHI.2017.2767063 7

  11. [11]

    Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! undefined

    Chiticariu L, Li Y, Reiss F. Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! undefined. 2013. https://www.semanticscholar.org/paper/Rule-Based-Information-Extraction-is-Dead!-Long- Chiticariu-Li/0692b9ad39145f57237199f3d4488667d5d9e5e7. Accessed January 17, 2019

  12. [12]

    Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable

    Shi J, Hurdle JF. Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable. J Biomed Inform. 2018;85:106-113. doi:10.1016/J.JBI.2018.08.002

  13. [13]

    Extracting Intrauterine Device Usage from Clinical Texts Using Natural Language Processing

    Shi J, Mowery D, Zhang M, Sanders J, Chapman W, Gawron L. Extracting Intrauterine Device Usage from Clinical Texts Using Natural Language Processing. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI). IEEE; 2017:568-571. doi:10.1109/ICHI.2017.21

  14. [14]

    Implementing natural language processing within the clinical enterprise data warehouse for encephalopathy patient identification

    Shi J, Arego J, Barney R, Hurdle FJ. Implementing natural language processing within the clinical enterprise data warehouse for encephalopathy patient identification. In: Theodore H Stanley Research Symposium. Salt Lake City; 2017. https://medicine.utah.edu/anesthesiology/research-symposium/pdf/2.pdf