Neurosymbolic Clinical Trial Matching via LLM-Driven Abduction and Logical Verification
Pith reviewed 2026-06-26 17:01 UTC · model grok-4.3
The pith
A neurosymbolic framework that pairs LLM abduction with logical verification outperforms pure LLM baselines on clinical trial matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the αNeSy-CTM framework, which uses LLMs to perform abductive reasoning over noisy and underspecified clinical text and then subjects the resulting hypotheses to logical verification, delivers up to 30 percent relative improvement over zero-shot LLM baselines while also increasing accuracy, specificity, and robustness; the same system proves complementary to Chain-of-Thought prompting.
What carries the argument
The αNeSy-CTM abductive neurosymbolic framework, which generates explanatory hypotheses from clinical text via LLMs and then verifies them against eligibility criteria using formal logic.
If this is right
- Up to 30 percent relative gain in matching performance over zero-shot LLM baselines.
- Measurable gains in accuracy, specificity, and robustness versus a non-abductive neurosymbolic baseline.
- Abductive reasoning measurably aids handling of underspecified clinical evidence.
- Strong complementarity with Chain-of-Thought prompting, suggesting scope for combined use.
Where Pith is reading between the lines
- A practical routing policy could send routine cases to Chain-of-Thought and reserve the full neurosymbolic path for ambiguous records.
- The same abduction-plus-verification pattern may transfer to other medical tasks that require matching noisy data against rule sets.
- Grounding LLM outputs in explicit logical checks offers a concrete mechanism for building audit trails in clinical AI systems.
Load-bearing premise
LLMs can produce abductive inferences from incomplete clinical notes that are accurate enough for logical verification to reliably correct or confirm eligibility decisions.
What would settle it
A test collection of patient records and trial criteria in which the LLM abduction step consistently yields hypotheses that cause the logical verifier to output the wrong eligibility label.
Figures
read the original abstract
Large Language Models (LLMs) offer a promising path to automate Clinical Trial Matching (CTM), but still struggle with the deterministic verification required for complex eligibility criteria. Conversely, purely symbolic methods provide formal rigour but break down when faced with incomplete patient records and noisy clinical evidence. To bridge this gap, we investigate a hybrid framework for CTM combining LLMs with logical verification. In particular, we introduce an abductive neurosymbolic CTM framework ({\alpha}NeSy-CTM), which leverages the linguistic and world knowledge in LLMs to support reasoning over noisy and underspecified clinical text. Extensive evaluation demonstrates that {\alpha}NeSy-CTM substantially outperforms standalone LLM baselines, achieving up to 30% relative improvement over zero-shot baselines. In addition, our analyses confirm the impact of abductive reasoning on CTM, with {\alpha}NeSy-CTM exhibiting improved accuracy, specificity, and robustness over a non-abductive neurosymbolic setting. Furthermore, {\alpha}NeSy-CTM and Chain-of-Thought (CoT) reasoning prove highly complementary, highlighting the potential for a hybrid routing policy. Ultimately, this paper demonstrates the impact of neurosymbolic methods for automating CTM, providing a path toward the next generation of auditable, LLM-driven clinical applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces αNeSy-CTM, an abductive neurosymbolic framework for clinical trial matching that combines LLM-driven abduction over noisy clinical text with subsequent logical verification of eligibility criteria. It reports that the approach substantially outperforms standalone LLM baselines (up to 30% relative improvement over zero-shot), yields gains in accuracy/specificity/robustness relative to a non-abductive neurosymbolic variant, and is complementary to Chain-of-Thought prompting.
Significance. If the performance claims hold under rigorous evaluation, the work would demonstrate a concrete benefit of neurosymbolic hybridization for a high-stakes application, providing a path toward more auditable LLM-based clinical tools. The focus on abduction to handle underspecified records directly targets a known failure mode of both pure LLMs and pure symbolic systems.
major comments (2)
- [Abstract] Abstract: the central empirical claim of up to 30% relative improvement (plus gains in accuracy, specificity, and robustness) is stated without any reference to the datasets, number of trials/patients, metrics, statistical tests, exclusion criteria, or concrete baselines. This information is load-bearing for assessing whether the reported gains can be attributed to the abductive + logical pipeline rather than implementation details or dataset artifacts.
- [Abstract] Abstract / Framework: the performance advantage is predicated on LLM abduction producing hypotheses that are sufficiently accurate and complete for deterministic logical verification to correct or improve upon pure-LLM decisions. No error rates for the abduction step, no comparison of abduced facts against ground-truth annotations, and no analysis of how the verifier behaves under incomplete or erroneous inputs are supplied; without these, the 30% gain and robustness claims cannot be evaluated.
minor comments (2)
- [Abstract] Abstract: the acronym αNeSy-CTM is introduced without expansion or explanation of the leading alpha symbol.
- [Abstract] Abstract: the claimed complementarity between αNeSy-CTM and CoT is noted but no concrete routing policy, decision criteria, or joint evaluation protocol is described.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revisions that improve the clarity and substantiation of our claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central empirical claim of up to 30% relative improvement (plus gains in accuracy, specificity, and robustness) is stated without any reference to the datasets, number of trials/patients, metrics, statistical tests, exclusion criteria, or concrete baselines. This information is load-bearing for assessing whether the reported gains can be attributed to the abductive + logical pipeline rather than implementation details or dataset artifacts.
Authors: We agree that the abstract presents results at a high level. The evaluation details (datasets, scale, metrics, and baselines) appear in Sections 4 and 5. We will revise the abstract to briefly reference the key datasets, approximate numbers of patients and trials, primary metrics, and main baselines so that the 30% claim can be contextualized immediately. revision: yes
-
Referee: [Abstract] Abstract / Framework: the performance advantage is predicated on LLM abduction producing hypotheses that are sufficiently accurate and complete for deterministic logical verification to correct or improve upon pure-LLM decisions. No error rates for the abduction step, no comparison of abduced facts against ground-truth annotations, and no analysis of how the verifier behaves under incomplete or erroneous inputs are supplied; without these, the 30% gain and robustness claims cannot be evaluated.
Authors: The referee correctly notes the absence of explicit abduction-step diagnostics. The current manuscript reports end-to-end results rather than intermediate accuracy. We will add a dedicated analysis (new subsection in the experiments) that reports available error rates for abduced facts, provides comparisons against ground-truth annotations where they exist, and examines verifier behavior on incomplete or noisy inputs. Where ground-truth annotations are unavailable we will supply qualitative examples and note the limitation. revision: yes
Circularity Check
No circularity; empirical evaluation of hybrid LLM-symbolic framework
full rationale
The paper introduces an abductive neurosymbolic framework (αNeSy-CTM) for clinical trial matching and reports performance gains from extensive evaluation against LLM baselines. No equations, fitted parameters, or derivation chains appear in the provided text. Claims of up to 30% relative improvement and gains in accuracy/specificity are presented as direct empirical observations from experiments, not as outputs that reduce by construction to the inputs or to self-citations. The central premise relies on the (untested in the abstract) accuracy of LLM abductions, but this is a correctness/empirical-validity issue rather than circularity. The derivation is self-contained as an empirical demonstration.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLMs contain linguistic and world knowledge sufficient to support abductive reasoning over noisy clinical text
invented entities (1)
-
αNeSy-CTM framework
no independent evidence
Reference graph
Works this paper leans on
-
[1]
ISBN 979-8-89176-348-7
Association for Computational Linguis- tics. ISBN 979-8-89176-348-7. doi: 10. 18653/v1/2025.mathnlp-main.8. URL https:// aclanthology.org/2025.mathnlp-main.8/. OpenAI. gpt-oss-120b & gpt-oss-20b model card,
2025
-
[2]
URL https://arxiv.org/abs/2508. 10925. Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 3806–3824, 2023. Xin Quan, Marco Valentino, Louise A. Dennis, and André Freita...
2023
-
[3]
doi: 10.1007/s13755-026-00428-z. Team Olmo et al. Olmo 3, 2025. URL https: //arxiv.org/abs/2512.13961. Shaun Treweek, Mike Pitkethly, Jonathan Cook, Cynthia Fraser, Emma Mitchell, Frank Sulli- van, Cathy Jackson, Tanja K. Taskila, and Helen Gardner. Strategies to improve recruitment to randomised trials.The Cochrane Database of Systematic Reviews, 2013(11...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/s13755-026-00428-z 2025
-
[4]
patient is an adult
**Internal Reasoning (Abduction)**: - First, analyze the patient data to infer implicit facts. For example, if "patient is an adult", you should use`age >= 18`in the Z3 code. If a patient has "metastatic cancer", you should use`is_metastatic = True `. - Then, compare the patient data against the trial criteria to identify missing information
-
[5]
**Define Variables**: Based on your reasoning, represent all relevant attributes as Z3 variables (Int, Bool, Real)
-
[6]
This allows the solver to explore possibilities
**Handle Missing Information**: For any information identified as missing (e.g.,'ECOG_status'), create a Z3 variable for it (e.g.,` ECOG_status = Int('ECOG_status')`) but do not add a fact to constrain its value. This allows the solver to explore possibilities
-
[10]
approximately equal
**Approximation**: Represent " approximately equal" (\~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he ...
-
[11]
**Variables and Types:** - Age: Int - Gender: Bool (true for male, false for female) - Haemoglobin: Real - CreatinineClearance: Real - PlateletCount: Real - PregnancyStatus: Bool (true for pregnant, false for not pregnant) - ActiveMalignancy: Bool (true for active malignancy, false for no active malignancy)
-
[12]
You are a Z3 formalization expert. Output ONLY executable Z3 code. You MUST wrap the entire code in a single'```z3'code block
**Constraints:** - Age must be between 18 and 65. - Haemoglobin must be at least 9 g/dL. - Creatinine clearance must be at least 60 mL/min. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . ==== SMT Formalisation Complete ==== Initial code: ; Declare variables and their types ...
-
[13]
**Define Variables**: Represent all patient and trial attributes as Z3 variables (Int, Bool, Real)
-
[14]
**Handle Missing Information**: If a criterion requires information not present in the patient data (e.g.,' EGFR_status'), create a Z3 variable for it (e.g.,`EGFR_status = Bool(' EGFR_status')`) but do not add a fact to constrain its value
-
[15]
**Formulate Logic**:
-
[16]
NEVER use Unicode symbols like >=, <=, ~, !=, etc
**ASCII ONLY**: STRICTLY use only standard ASCII characters for all operators. NEVER use Unicode symbols like >=, <=, ~, !=, etc. Use (>= X Y), (<= X Y), and (!= X Y)
-
[17]
* **Parameter Separation**: do not forget to use commas where needed
**SMT-LIB FORMAT**: Ensure all function calls (e.g., And, Or, define-fun, assert) use correct SMT- LIB syntax. * **Parameter Separation**: do not forget to use commas where needed. * **Parentheses**: Every function call MUST be fully enclosed in parentheses, including nested calls
-
[18]
approximately equal
**Approximation**: Represent " approximately equal" (~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he h...
-
[19]
**Trial Criteria**: Create a list` eligibility_criteria`for all trial requirements
-
[20]
**Solver Logic**: Create a`Solver` instance`s`, add both lists to it
-
[21]
2" (Eligible) or
**Set Result**: Use an`if s.check() == sat:`block to set a variable` result`to "2" (Eligible) or "1" ( Ineligible). - PregnancyStatus: Bool (true for pregnant, false for not pregnant)
-
[22]
"" G.3 Template C: QuaSAR Prompt
**Constraints:** - Age must be between 18 and 65. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . Here is the SMT-LIB model: This model checks whether Patient 001 satisfies all the eligibility criteria for Trial A based on the given constraints. ==== SMT Formalisation Comple...
-
[23]
**Analysis**: Explain what the patient facts and trial rules are based on the Z3 code
-
[24]
**Reasoning**: Logically deduce whether the patient is Eligible, Ineligible
-
[25]
"" G.4 Template D: Chain-of-Thought Prompt f
**Answering**: Conclude with the final answer in the format: < decision>LABEL</decision>, where LABEL is one of'Eligible',' Ineligible'. #Previous Error (if any, please fix your reasoning) {feedback} """ G.4 Template D: Chain-of-Thought Prompt f"Patient description:\n{ topic_description}\n\n" f"Clinical trial document:\n{doc_content }\n\n" "Task: Determin...
2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.