Neurosymbolic Clinical Trial Matching via LLM-Driven Abduction and Logical Verification

Baiyang Qu; Leonardo Ranaldi; Marco Valentino; Xi Wang

arxiv: 2606.20895 · v1 · pith:EBCGIID7new · submitted 2026-06-18 · 💻 cs.AI

Neurosymbolic Clinical Trial Matching via LLM-Driven Abduction and Logical Verification

Baiyang Qu , Leonardo Ranaldi , Xi Wang , Marco Valentino This is my paper

Pith reviewed 2026-06-26 17:01 UTC · model grok-4.3

classification 💻 cs.AI

keywords neurosymbolic AIclinical trial matchingLLM abductionlogical verificationpatient eligibilityhybrid reasoningabductive inference

0 comments

The pith

A neurosymbolic framework that pairs LLM abduction with logical verification outperforms pure LLM baselines on clinical trial matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a hybrid system called αNeSy-CTM can overcome the shortcomings of standalone large language models when deciding whether patients meet complex trial eligibility rules. It does so by letting the LLM generate candidate explanations from incomplete records and then checking those explanations against formal criteria with symbolic logic. A reader would care because trial matching is a high-stakes task where errors waste resources and pure neural methods lack the determinism needed for clinical use. If the claim holds, the work points to a route for making LLM outputs in medicine both more accurate and auditable.

Core claim

The paper claims that the αNeSy-CTM framework, which uses LLMs to perform abductive reasoning over noisy and underspecified clinical text and then subjects the resulting hypotheses to logical verification, delivers up to 30 percent relative improvement over zero-shot LLM baselines while also increasing accuracy, specificity, and robustness; the same system proves complementary to Chain-of-Thought prompting.

What carries the argument

The αNeSy-CTM abductive neurosymbolic framework, which generates explanatory hypotheses from clinical text via LLMs and then verifies them against eligibility criteria using formal logic.

If this is right

Up to 30 percent relative gain in matching performance over zero-shot LLM baselines.
Measurable gains in accuracy, specificity, and robustness versus a non-abductive neurosymbolic baseline.
Abductive reasoning measurably aids handling of underspecified clinical evidence.
Strong complementarity with Chain-of-Thought prompting, suggesting scope for combined use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A practical routing policy could send routine cases to Chain-of-Thought and reserve the full neurosymbolic path for ambiguous records.
The same abduction-plus-verification pattern may transfer to other medical tasks that require matching noisy data against rule sets.
Grounding LLM outputs in explicit logical checks offers a concrete mechanism for building audit trails in clinical AI systems.

Load-bearing premise

LLMs can produce abductive inferences from incomplete clinical notes that are accurate enough for logical verification to reliably correct or confirm eligibility decisions.

What would settle it

A test collection of patient records and trial criteria in which the LLM abduction step consistently yields hypotheses that cause the logical verifier to output the wrong eligibility label.

Figures

Figures reproduced from arXiv: 2606.20895 by Baiyang Qu, Leonardo Ranaldi, Marco Valentino, Xi Wang.

**Figure 1.** Figure 1: Abductive Neurosymbolic Clinical Trial Matching. The framework consists of four main stages: abductive [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Relative improvement over Zero-shot (%) using the 3-backbone average from [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Stacked intersection view for αNeSy-CTM vs. CoT on the 600 paired samples from TREC 2021–2023. Each bar decomposes the yearly outcome distribution into Both correct, Both wrong, CoT-Correct, and αNeSyCTM-Correct; exact counts are reported in Appendix [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Performance under controlled evidence per [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Fine-grained complexity-bucket improvement [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

read the original abstract

Large Language Models (LLMs) offer a promising path to automate Clinical Trial Matching (CTM), but still struggle with the deterministic verification required for complex eligibility criteria. Conversely, purely symbolic methods provide formal rigour but break down when faced with incomplete patient records and noisy clinical evidence. To bridge this gap, we investigate a hybrid framework for CTM combining LLMs with logical verification. In particular, we introduce an abductive neurosymbolic CTM framework ({\alpha}NeSy-CTM), which leverages the linguistic and world knowledge in LLMs to support reasoning over noisy and underspecified clinical text. Extensive evaluation demonstrates that {\alpha}NeSy-CTM substantially outperforms standalone LLM baselines, achieving up to 30% relative improvement over zero-shot baselines. In addition, our analyses confirm the impact of abductive reasoning on CTM, with {\alpha}NeSy-CTM exhibiting improved accuracy, specificity, and robustness over a non-abductive neurosymbolic setting. Furthermore, {\alpha}NeSy-CTM and Chain-of-Thought (CoT) reasoning prove highly complementary, highlighting the potential for a hybrid routing policy. Ultimately, this paper demonstrates the impact of neurosymbolic methods for automating CTM, providing a path toward the next generation of auditable, LLM-driven clinical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

αNeSy-CTM adds a usable abduction-plus-verification layer on top of LLMs for clinical trial matching and reports clear gains, but the gains rest on unexamined abduction quality.

read the letter

The paper's main move is to name and test αNeSy-CTM: an LLM does abduction to turn noisy patient notes into candidate facts, then a logical verifier checks eligibility criteria against those facts. It shows up to 30% relative lift over zero-shot LLM baselines on accuracy, specificity, and robustness, and it beats a non-abductive neurosymbolic version. The authors also note that the method pairs well with chain-of-thought prompting.

That combination is the concrete new piece. Most prior CTM work either stays inside the LLM or stays inside pure symbolic rules; this one tries to route the hard parts to each. The experiments appear to isolate the abduction step and the complementarity with CoT, which is useful for anyone building hybrid systems.

The soft spot is the one flagged in the stress test. The reported gains require that the LLM abductions are accurate and complete enough for the logic layer to produce reliable decisions. The abstract gives no error rates on the abduced facts, no comparison to gold annotations, and no breakdown of how uncertain or conflicting inputs are handled. If those abductions are often wrong or incomplete, the logical step cannot fix them and the 30% number will not generalize. Without those diagnostics the central claim stays hard to evaluate.

This is aimed at clinical NLP groups and people working on neurosymbolic hybrids for real data. It has a named method, a measurable empirical signal, and a practical angle, so it clears the bar for peer review even though the evaluation details will need scrutiny.

Referee Report

2 major / 2 minor

Summary. The paper introduces αNeSy-CTM, an abductive neurosymbolic framework for clinical trial matching that combines LLM-driven abduction over noisy clinical text with subsequent logical verification of eligibility criteria. It reports that the approach substantially outperforms standalone LLM baselines (up to 30% relative improvement over zero-shot), yields gains in accuracy/specificity/robustness relative to a non-abductive neurosymbolic variant, and is complementary to Chain-of-Thought prompting.

Significance. If the performance claims hold under rigorous evaluation, the work would demonstrate a concrete benefit of neurosymbolic hybridization for a high-stakes application, providing a path toward more auditable LLM-based clinical tools. The focus on abduction to handle underspecified records directly targets a known failure mode of both pure LLMs and pure symbolic systems.

major comments (2)

[Abstract] Abstract: the central empirical claim of up to 30% relative improvement (plus gains in accuracy, specificity, and robustness) is stated without any reference to the datasets, number of trials/patients, metrics, statistical tests, exclusion criteria, or concrete baselines. This information is load-bearing for assessing whether the reported gains can be attributed to the abductive + logical pipeline rather than implementation details or dataset artifacts.
[Abstract] Abstract / Framework: the performance advantage is predicated on LLM abduction producing hypotheses that are sufficiently accurate and complete for deterministic logical verification to correct or improve upon pure-LLM decisions. No error rates for the abduction step, no comparison of abduced facts against ground-truth annotations, and no analysis of how the verifier behaves under incomplete or erroneous inputs are supplied; without these, the 30% gain and robustness claims cannot be evaluated.

minor comments (2)

[Abstract] Abstract: the acronym αNeSy-CTM is introduced without expansion or explanation of the leading alpha symbol.
[Abstract] Abstract: the claimed complementarity between αNeSy-CTM and CoT is noted but no concrete routing policy, decision criteria, or joint evaluation protocol is described.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revisions that improve the clarity and substantiation of our claims.

read point-by-point responses

Referee: [Abstract] Abstract: the central empirical claim of up to 30% relative improvement (plus gains in accuracy, specificity, and robustness) is stated without any reference to the datasets, number of trials/patients, metrics, statistical tests, exclusion criteria, or concrete baselines. This information is load-bearing for assessing whether the reported gains can be attributed to the abductive + logical pipeline rather than implementation details or dataset artifacts.

Authors: We agree that the abstract presents results at a high level. The evaluation details (datasets, scale, metrics, and baselines) appear in Sections 4 and 5. We will revise the abstract to briefly reference the key datasets, approximate numbers of patients and trials, primary metrics, and main baselines so that the 30% claim can be contextualized immediately. revision: yes
Referee: [Abstract] Abstract / Framework: the performance advantage is predicated on LLM abduction producing hypotheses that are sufficiently accurate and complete for deterministic logical verification to correct or improve upon pure-LLM decisions. No error rates for the abduction step, no comparison of abduced facts against ground-truth annotations, and no analysis of how the verifier behaves under incomplete or erroneous inputs are supplied; without these, the 30% gain and robustness claims cannot be evaluated.

Authors: The referee correctly notes the absence of explicit abduction-step diagnostics. The current manuscript reports end-to-end results rather than intermediate accuracy. We will add a dedicated analysis (new subsection in the experiments) that reports available error rates for abduced facts, provides comparisons against ground-truth annotations where they exist, and examines verifier behavior on incomplete or noisy inputs. Where ground-truth annotations are unavailable we will supply qualitative examples and note the limitation. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical evaluation of hybrid LLM-symbolic framework

full rationale

The paper introduces an abductive neurosymbolic framework (αNeSy-CTM) for clinical trial matching and reports performance gains from extensive evaluation against LLM baselines. No equations, fitted parameters, or derivation chains appear in the provided text. Claims of up to 30% relative improvement and gains in accuracy/specificity are presented as direct empirical observations from experiments, not as outputs that reduce by construction to the inputs or to self-citations. The central premise relies on the (untested in the abstract) accuracy of LLM abductions, but this is a correctness/empirical-validity issue rather than circularity. The derivation is self-contained as an empirical demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review yields minimal ledger entries; the central claim rests on the unelaborated premise that LLMs already contain usable world knowledge for abduction on clinical text.

axioms (1)

domain assumption LLMs contain linguistic and world knowledge sufficient to support abductive reasoning over noisy clinical text
Invoked when the abstract states that the framework 'leverages the linguistic and world knowledge in LLMs to support reasoning over noisy and underspecified clinical text'.

invented entities (1)

αNeSy-CTM framework no independent evidence
purpose: Hybrid abductive neurosymbolic CTM system
Newly named and described in the abstract; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5763 in / 1271 out tokens · 28454 ms · 2026-06-26T17:01:06.092019+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 1 canonical work pages · 1 internal anchor

[1]

ISBN 979-8-89176-348-7

Association for Computational Linguis- tics. ISBN 979-8-89176-348-7. doi: 10. 18653/v1/2025.mathnlp-main.8. URL https:// aclanthology.org/2025.mathnlp-main.8/. OpenAI. gpt-oss-120b & gpt-oss-20b model card,

2025
[2]

URL https://arxiv.org/abs/2508. 10925. Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 3806–3824, 2023. Xin Quan, Marco Valentino, Louise A. Dennis, and André Freita...

2023
[3]

Olmo 3

doi: 10.1007/s13755-026-00428-z. Team Olmo et al. Olmo 3, 2025. URL https: //arxiv.org/abs/2512.13961. Shaun Treweek, Mike Pitkethly, Jonathan Cook, Cynthia Fraser, Emma Mitchell, Frank Sulli- van, Cathy Jackson, Tanja K. Taskila, and Helen Gardner. Strategies to improve recruitment to randomised trials.The Cochrane Database of Systematic Reviews, 2013(11...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/s13755-026-00428-z 2025
[4]

patient is an adult

**Internal Reasoning (Abduction)**: - First, analyze the patient data to infer implicit facts. For example, if "patient is an adult", you should use`age >= 18`in the Z3 code. If a patient has "metastatic cancer", you should use`is_metastatic = True `. - Then, compare the patient data against the trial criteria to identify missing information
[5]

**Define Variables**: Based on your reasoning, represent all relevant attributes as Z3 variables (Int, Bool, Real)
[6]

This allows the solver to explore possibilities

**Handle Missing Information**: For any information identified as missing (e.g.,'ECOG_status'), create a Z3 variable for it (e.g.,` ECOG_status = Int('ECOG_status')`) but do not add a fact to constrain its value. This allows the solver to explore possibilities
[10]

approximately equal

**Approximation**: Represent " approximately equal" (\~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he ...
[11]

**Variables and Types:** - Age: Int - Gender: Bool (true for male, false for female) - Haemoglobin: Real - CreatinineClearance: Real - PlateletCount: Real - PregnancyStatus: Bool (true for pregnant, false for not pregnant) - ActiveMalignancy: Bool (true for active malignancy, false for no active malignancy)
[12]

You are a Z3 formalization expert. Output ONLY executable Z3 code. You MUST wrap the entire code in a single'```z3'code block

**Constraints:** - Age must be between 18 and 65. - Haemoglobin must be at least 9 g/dL. - Creatinine clearance must be at least 60 mL/min. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . ==== SMT Formalisation Complete ==== Initial code: ; Declare variables and their types ...
[13]

**Define Variables**: Represent all patient and trial attributes as Z3 variables (Int, Bool, Real)
[14]

**Handle Missing Information**: If a criterion requires information not present in the patient data (e.g.,' EGFR_status'), create a Z3 variable for it (e.g.,`EGFR_status = Bool(' EGFR_status')`) but do not add a fact to constrain its value
[15]

**Formulate Logic**:
[16]

NEVER use Unicode symbols like >=, <=, ~, !=, etc

**ASCII ONLY**: STRICTLY use only standard ASCII characters for all operators. NEVER use Unicode symbols like >=, <=, ~, !=, etc. Use (>= X Y), (<= X Y), and (!= X Y)
[17]

* **Parameter Separation**: do not forget to use commas where needed

**SMT-LIB FORMAT**: Ensure all function calls (e.g., And, Or, define-fun, assert) use correct SMT- LIB syntax. * **Parameter Separation**: do not forget to use commas where needed. * **Parentheses**: Every function call MUST be fully enclosed in parentheses, including nested calls
[18]

approximately equal

**Approximation**: Represent " approximately equal" (~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he h...
[19]

**Trial Criteria**: Create a list` eligibility_criteria`for all trial requirements
[20]

**Solver Logic**: Create a`Solver` instance`s`, add both lists to it
[21]

2" (Eligible) or

**Set Result**: Use an`if s.check() == sat:`block to set a variable` result`to "2" (Eligible) or "1" ( Ineligible). - PregnancyStatus: Bool (true for pregnant, false for not pregnant)
[22]

"" G.3 Template C: QuaSAR Prompt

**Constraints:** - Age must be between 18 and 65. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . Here is the SMT-LIB model: This model checks whether Patient 001 satisfies all the eligibility criteria for Trial A based on the given constraints. ==== SMT Formalisation Comple...
[23]

**Analysis**: Explain what the patient facts and trial rules are based on the Z3 code
[24]

**Reasoning**: Logically deduce whether the patient is Eligible, Ineligible
[25]

"" G.4 Template D: Chain-of-Thought Prompt f

**Answering**: Conclude with the final answer in the format: < decision>LABEL</decision>, where LABEL is one of'Eligible',' Ineligible'. #Previous Error (if any, please fix your reasoning) {feedback} """ G.4 Template D: Chain-of-Thought Prompt f"Patient description:\n{ topic_description}\n\n" f"Clinical trial document:\n{doc_content }\n\n" "Task: Determin...

2021

[1] [1]

ISBN 979-8-89176-348-7

Association for Computational Linguis- tics. ISBN 979-8-89176-348-7. doi: 10. 18653/v1/2025.mathnlp-main.8. URL https:// aclanthology.org/2025.mathnlp-main.8/. OpenAI. gpt-oss-120b & gpt-oss-20b model card,

2025

[2] [2]

URL https://arxiv.org/abs/2508. 10925. Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 3806–3824, 2023. Xin Quan, Marco Valentino, Louise A. Dennis, and André Freita...

2023

[3] [3]

Olmo 3

doi: 10.1007/s13755-026-00428-z. Team Olmo et al. Olmo 3, 2025. URL https: //arxiv.org/abs/2512.13961. Shaun Treweek, Mike Pitkethly, Jonathan Cook, Cynthia Fraser, Emma Mitchell, Frank Sulli- van, Cathy Jackson, Tanja K. Taskila, and Helen Gardner. Strategies to improve recruitment to randomised trials.The Cochrane Database of Systematic Reviews, 2013(11...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/s13755-026-00428-z 2025

[4] [4]

patient is an adult

**Internal Reasoning (Abduction)**: - First, analyze the patient data to infer implicit facts. For example, if "patient is an adult", you should use`age >= 18`in the Z3 code. If a patient has "metastatic cancer", you should use`is_metastatic = True `. - Then, compare the patient data against the trial criteria to identify missing information

[5] [5]

**Define Variables**: Based on your reasoning, represent all relevant attributes as Z3 variables (Int, Bool, Real)

[6] [6]

This allows the solver to explore possibilities

**Handle Missing Information**: For any information identified as missing (e.g.,'ECOG_status'), create a Z3 variable for it (e.g.,` ECOG_status = Int('ECOG_status')`) but do not add a fact to constrain its value. This allows the solver to explore possibilities

[7] [10]

approximately equal

**Approximation**: Represent " approximately equal" (\~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he ...

[8] [11]

**Variables and Types:** - Age: Int - Gender: Bool (true for male, false for female) - Haemoglobin: Real - CreatinineClearance: Real - PlateletCount: Real - PregnancyStatus: Bool (true for pregnant, false for not pregnant) - ActiveMalignancy: Bool (true for active malignancy, false for no active malignancy)

[9] [12]

You are a Z3 formalization expert. Output ONLY executable Z3 code. You MUST wrap the entire code in a single'```z3'code block

**Constraints:** - Age must be between 18 and 65. - Haemoglobin must be at least 9 g/dL. - Creatinine clearance must be at least 60 mL/min. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . ==== SMT Formalisation Complete ==== Initial code: ; Declare variables and their types ...

[10] [13]

**Define Variables**: Represent all patient and trial attributes as Z3 variables (Int, Bool, Real)

[11] [14]

**Handle Missing Information**: If a criterion requires information not present in the patient data (e.g.,' EGFR_status'), create a Z3 variable for it (e.g.,`EGFR_status = Bool(' EGFR_status')`) but do not add a fact to constrain its value

[12] [15]

**Formulate Logic**:

[13] [16]

NEVER use Unicode symbols like >=, <=, ~, !=, etc

**ASCII ONLY**: STRICTLY use only standard ASCII characters for all operators. NEVER use Unicode symbols like >=, <=, ~, !=, etc. Use (>= X Y), (<= X Y), and (!= X Y)

[14] [17]

* **Parameter Separation**: do not forget to use commas where needed

**SMT-LIB FORMAT**: Ensure all function calls (e.g., And, Or, define-fun, assert) use correct SMT- LIB syntax. * **Parameter Separation**: do not forget to use commas where needed. * **Parentheses**: Every function call MUST be fully enclosed in parentheses, including nested calls

[15] [18]

approximately equal

**Approximation**: Represent " approximately equal" (~) using an epsilon range: |X-Y| < Epsilon. Here is an example to follow: ==== Starting SMT Formalisation ==== Input Premise: Patient 001. A 52-year-old male with long-standing type 2 diabetes, essential hypertension, and anaemia of chronic disease presents as a former smoker who quit a decade ago; he h...

[16] [19]

**Trial Criteria**: Create a list` eligibility_criteria`for all trial requirements

[17] [20]

**Solver Logic**: Create a`Solver` instance`s`, add both lists to it

[18] [21]

2" (Eligible) or

**Set Result**: Use an`if s.check() == sat:`block to set a variable` result`to "2" (Eligible) or "1" ( Ineligible). - PregnancyStatus: Bool (true for pregnant, false for not pregnant)

[19] [22]

"" G.3 Template C: QuaSAR Prompt

**Constraints:** - Age must be between 18 and 65. - Platelet count must be at least 100 x 10^3 cells/uL. - Patient must not be pregnant. - Patient must have no active malignancy . Here is the SMT-LIB model: This model checks whether Patient 001 satisfies all the eligibility criteria for Trial A based on the given constraints. ==== SMT Formalisation Comple...

[20] [23]

**Analysis**: Explain what the patient facts and trial rules are based on the Z3 code

[21] [24]

**Reasoning**: Logically deduce whether the patient is Eligible, Ineligible

[22] [25]

"" G.4 Template D: Chain-of-Thought Prompt f

**Answering**: Conclude with the final answer in the format: < decision>LABEL</decision>, where LABEL is one of'Eligible',' Ineligible'. #Previous Error (if any, please fix your reasoning) {feedback} """ G.4 Template D: Chain-of-Thought Prompt f"Patient description:\n{ topic_description}\n\n" f"Clinical trial document:\n{doc_content }\n\n" "Task: Determin...

2021