Neuro-Symbolic Resolution of Recommendation Conflicts in Multimorbidity Clinical Guidelines

Jian Du; Shiyao Xie

arXiv: 2604.17340 · v1 · submitted 2026-04-19 · 💻 cs.CL

Pith reviewed 2026-05-10 06:34 UTC · model grok-4.3

keywords: neuro-symbolic AI · clinical guidelines · multimorbidity · conflict detection · SAT solver · local conflict · medical knowledge coordination · SGLT2 inhibitors

The pith

A neuro-symbolic pipeline translates multimorbidity guidelines into logic and uses a SAT solver to detect conflicts that LLMs miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that clinical guidelines developed separately for single diseases contain many logical conflicts when applied together to patients with several conditions at once. These conflicts, especially those arising where recommendations for different diseases intersect, create real problems for doctors and break AI systems that simply retrieve and summarize the guidelines. The authors build a system that uses multiple AI agents to turn the natural-language recommendations into precise symbolic statements, then feeds those statements to a SAT solver to find contradictions and redundancies. Testing on twelve authoritative guidelines for SGLT2 inhibitors shows that over ninety percent of the conflicts are local to the comorbidity overlap, a pattern single-disease documents cannot capture, and the hybrid method reaches an F1 score of 0.861 where current large language models fail completely.

Core claim

We introduce a Neuro-Symbolic framework that automates the detection of recommendation redundancies and conflicts. Our pipeline employs a multi-agent system to translate unstructured clinical natural language into rigorous symbolic logic language, which is then verified by a Satisfiability (SAT) solver. By formulating a hierarchical taxonomy of logical rule interactions, we identify a critical category termed Local Conflict - a decision conflict arising from the intersection of comorbidities. Evaluating our system on a curated benchmark of 12 authoritative SGLT2 inhibitor guidelines, we reveal that 90.6% of conflicts are Local, a structural complexity that single-disease guidelines fail to address.

What carries the argument

The multi-agent neuro-symbolic pipeline that converts guideline text into formal symbolic logic for SAT-solver verification, together with the taxonomy that isolates Local Conflicts at comorbidity intersections.
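To make the idea of a Local Conflict concrete, here is a minimal sketch of the solver step, assuming the translation stage has already produced propositional rules. The atom names, the two toy rules, and the "global" / "none" labels are illustrative assumptions; they are not taken from the paper or from any guideline, and a brute-force search stands in for a real SAT solver.

```python
from itertools import product

# Hypothetical propositional atoms; none of these names come from the paper.
ATOMS = ["heart_failure", "recurrent_dka", "prescribe_sglt2i"]

# Each rule is a predicate over a truth assignment (dict: atom -> bool).
# Toy rule A: heart failure -> prescribe an SGLT2 inhibitor.
def rule_a(v):
    return (not v["heart_failure"]) or v["prescribe_sglt2i"]

# Toy rule B: recurrent ketoacidosis -> do not prescribe.
def rule_b(v):
    return (not v["recurrent_dka"]) or (not v["prescribe_sglt2i"])

def satisfiable(rules, fixed=None):
    """Brute-force SAT check: try every assignment consistent with `fixed`."""
    fixed = fixed or {}
    free = [a for a in ATOMS if a not in fixed]
    for values in product([False, True], repeat=len(free)):
        v = dict(fixed, **dict(zip(free, values)))
        if all(rule(v) for rule in rules):
            return True
    return False

def classify(rules, context):
    """Return 'global', 'local', or 'none' for a rule set and a patient context."""
    if not satisfiable(rules):
        return "global"   # the rules contradict each other for every patient
    if not satisfiable(rules, fixed=context):
        return "local"    # the rules contradict each other only in this context
    return "none"

comorbid = {"heart_failure": True, "recurrent_dka": True}
single = {"heart_failure": True, "recurrent_dka": False}
print(classify([rule_a, rule_b], comorbid))  # local
print(classify([rule_a, rule_b], single))    # none
```

The two rules are jointly satisfiable in general, so neither source document looks inconsistent on its own; the contradiction appears only once both conditions are fixed to true, which is the pattern the review describes as local to the comorbidity overlap.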

If this is right

Single-disease guidelines must be coordinated through explicit logic checks before they can be used safely for multimorbid patients.
Retrieval-augmented generation systems in medicine require a prior logical-verification layer to avoid propagating guideline contradictions.
Local conflicts constitute the dominant failure mode, so future guideline development should incorporate multimorbidity intersection analysis from the outset.
The neuro-symbolic method offers a repeatable way to surface and resolve inconsistencies across guidelines produced by different specialty societies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the pipeline to additional drug classes or disease areas could produce a reusable library of resolved guideline conflicts for common multimorbidity patterns.
Pairing the conflict map with individual patient records would allow the system to surface only the conflicts that actually apply to a given case.
The performance gap with pure language models indicates that safety-critical medical AI will need symbolic verification components as a standard complement to neural retrieval.
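The second extension above, pairing the conflict map with patient records, could look roughly like the following sketch. The `Conflict` record, its fields, and the rule identifiers are hypothetical; the paper does not describe a data model for this step.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conflict:
    rule_ids: tuple      # identifiers of the conflicting recommendations
    trigger: frozenset   # conditions that must co-occur for the conflict to arise

def applicable_conflicts(conflict_map, patient_conditions):
    """Keep only conflicts whose trigger conditions are all present in the patient."""
    patient = frozenset(patient_conditions)
    return [c for c in conflict_map if c.trigger <= patient]

# Illustrative conflict map; the entries are made up for this example.
conflict_map = [
    Conflict(("A1", "B7"), frozenset({"heart_failure", "recurrent_dka"})),
    Conflict(("C2", "D4"), frozenset({"ckd", "type_2_diabetes"})),
]

hits = applicable_conflicts(
    conflict_map, ["heart_failure", "recurrent_dka", "hypertension"]
)
print([c.rule_ids for c in hits])  # [('A1', 'B7')]
```

Representing each trigger as a set keeps the lookup a simple subset test, so conflicts that depend on conditions the patient does not have are never shown.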

Load-bearing premise

The multi-agent system can convert the original clinical text into symbolic logic statements that are accurate and complete enough for the SAT solver to produce trustworthy results.

What would settle it

Independent manual inspection of the symbolic translations for a subset of the twelve guidelines that finds repeated mismatches between the logic statements and the source text, causing the solver to report conflicts that do not exist or to miss real ones.

Figures

Figures reproduced from arXiv: 2604.17340 by Jian Du, Shiyao Xie.

**Figure 2.** The Neuro-Symbolic Pipeline for Clinical Guideline Formalization and Verification

**Figure 3.** Performance Comparison across Logical Sub

**Figure 4.** Impact of RAG Retrieval Noise on Reasoning Per

Original abstract

Clinical guidelines, typically developed by independent specialty societies, inherently exhibit substantial fragmentation, redundancy, and logical contradiction. These inconsistencies, particularly when applied to patients with multimorbidity, not only cause cognitive dissonance for clinicians but also introduce catastrophic noise into AI systems, rendering the standard Retrieval-Augmented Generation (RAG) system fragile and prone to hallucination. To address this fundamental reliability crisis, we introduce a Neuro-Symbolic framework that automates the detection of recommendation redundancies and conflicts. Our pipeline employs a multi-agent system to translate unstructured clinical natural language into rigorous symbolic logic language, which is then verified by a Satisfiability (SAT) solver. By formulating a hierarchical taxonomy of logical rule interactions, we identify a critical category termed Local Conflict - a decision conflict arising from the intersection of comorbidities. Evaluating our system on a curated benchmark of 12 authoritative SGLT2 inhibitor guidelines, we reveal that 90.6% of conflicts are Local, a structural complexity that single-disease guidelines fail to address. While state-of-the-art LLMs fail in detecting these conflicts, our neuro-symbolic approach achieves an F1 score of 0.861. This work demonstrates that logical verification must precede retrieval, establishing a new technical standard for automated knowledge coordination in medical AI.
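The abstract names redundancy alongside conflict as a detection target. One common way to phrase redundancy for a solver is entailment: a rule adds nothing if another rule already implies it. The sketch below assumes that reading, which the abstract does not confirm, and uses toy rules and atom names invented for illustration.

```python
from itertools import product

def entails(premise, conclusion, atoms):
    """True if every assignment satisfying `premise` also satisfies `conclusion`."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if premise(v) and not conclusion(v):
            return False  # counterexample: premise holds, conclusion fails
    return True

# Hypothetical atoms, not drawn from the paper.
atoms = ["ckd", "type_2_diabetes", "prescribe_sglt2i"]

# Toy rule 1: chronic kidney disease -> prescribe.
def broad(v):
    return (not v["ckd"]) or v["prescribe_sglt2i"]

# Toy rule 2: chronic kidney disease AND type 2 diabetes -> prescribe.
def narrow(v):
    return (not (v["ckd"] and v["type_2_diabetes"])) or v["prescribe_sglt2i"]

print(entails(broad, narrow, atoms))  # True
print(entails(narrow, broad, atoms))  # False
```

Under this reading the narrower rule is redundant given the broader one, but not the other way round, so the check has to be run in both directions.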

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The neuro-symbolic pipeline for catching local conflicts in multimorbidity guidelines is a sensible direction, but the unvalidated LLM-to-logic translation step leaves the 90.6% and F1=0.861 claims unsupported.

This paper describes a neuro-symbolic pipeline for spotting redundancies and conflicts in clinical guidelines when patients have multiple diseases. The approach runs a multi-agent LLM system to convert guideline text into formal logic statements, then uses a SAT solver to check for inconsistencies. They introduce a taxonomy that singles out Local Conflict as the kind that appears only at the overlap of comorbidities, and on 12 SGLT2 inhibitor guidelines they find 90.6 percent of conflicts are local, with the full system reaching an F1 score of 0.861.

The new element here is the emphasis on local conflicts in multimorbidity and the use of symbolic verification to catch issues that pure LLM methods miss. It directly targets a known weakness in current RAG setups for medical knowledge, where conflicting guidelines can lead to bad outputs. That framing is useful.

The main weakness is the absence of any check on whether the translation from natural language to logic is reliable. The reported numbers depend on the multi-agent system getting the predicates and conditions right every time, but the abstract supplies no expert review, no inter-annotator agreement, and no error analysis on the formalization step. If the logic encoding misses or misstates a condition, the SAT results become meaningless. The stress-test note captures this exactly.

For someone working on clinical AI or automated guideline management, the paper offers a concrete architecture to consider. It could prompt useful discussion in a reading group about how to combine neural and symbolic methods in high-stakes domains. I would recommend sending it to peer review. The underlying problem is real and the proposed direction is reasonable, but the authors need to document the benchmark construction and add validation for the translation accuracy before the performance claims can be taken at face value.

Referee Report

1 major / 1 minor

Summary. The paper proposes a neuro-symbolic framework to detect redundancies and conflicts in multimorbidity clinical guidelines. A multi-agent LLM system translates unstructured guideline text into symbolic logic, which a SAT solver then analyzes using a hierarchical taxonomy of rule interactions; the key new category is 'Local Conflict' (decision conflicts at comorbidity intersections). On a curated set of 12 SGLT2-inhibitor guidelines the system reports that 90.6 % of detected conflicts are local and that the overall approach reaches an F1 of 0.861, substantially outperforming direct LLM conflict detection.
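For reference, the F1 score quoted in the summary is, in its standard form, the harmonic mean of precision and recall. The definition below is the textbook one; the review does not state what unit is counted as a true positive (a rule pair, a conflict type, or something else) or whether the score is micro- or macro-averaged, so those remain open.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
\]
```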

Significance. If the translation step proves reliable, the work would establish a verifiable pre-retrieval layer for medical RAG systems and would quantify a previously under-appreciated structural source of guideline inconsistency. The explicit use of an external SAT solver and the introduction of a falsifiable local-conflict taxonomy are concrete strengths that could be reproduced and extended.

major comments (1)

[Abstract] Abstract and evaluation description: the reported F1 score of 0.861 and the 90.6 % local-conflict statistic rest entirely on the correctness of the multi-agent translation from natural-language recommendations to formal logic predicates. No benchmark-curation protocol, conflict-annotation guidelines, inter-rater reliability statistics, expert adjudication of the generated formulas, or error analysis of the translation step are supplied; without these the performance numbers cannot be interpreted as evidence that the neuro-symbolic pipeline works.
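As an illustration of the inter-rater reliability statistic the comment asks for, the sketch below computes Cohen's kappa for two annotators who label the same items. The choice of kappa, the label names, and the toy annotations are assumptions made for this example; the paper does not report which statistic, if any, was used.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Toy annotations, invented for illustration only.
annotator_1 = ["conflict", "conflict", "none", "none"]
annotator_2 = ["conflict", "none", "none", "none"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Kappa corrects raw agreement for the agreement two annotators would reach by chance, which matters when one label (here, "none") dominates the benchmark.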

minor comments (1)

The term 'Local Conflict' is introduced in the abstract but would benefit from a concise formal definition and one or two concrete examples in the main text before the taxonomy is used in the SAT encoding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the critical importance of evaluation transparency in our neuro-symbolic pipeline. We agree that the reported F1 score and local-conflict statistics require detailed supporting documentation on the translation and annotation processes to be fully interpretable. We will revise the manuscript to address this.

Point-by-point responses

Referee: [Abstract] Abstract and evaluation description: the reported F1 score of 0.861 and the 90.6 % local-conflict statistic rest entirely on the correctness of the multi-agent translation from natural-language recommendations to formal logic predicates. No benchmark-curation protocol, conflict-annotation guidelines, inter-rater reliability statistics, expert adjudication of the generated formulas, or error analysis of the translation step are supplied; without these the performance numbers cannot be interpreted as evidence that the neuro-symbolic pipeline works.

Authors: We concur that the performance metrics depend on the fidelity of the multi-agent translation to formal logic and that the current manuscript lacks sufficient methodological detail on this step. In the revised version we will add a dedicated subsection under Evaluation that (i) describes the benchmark-curation protocol, including selection criteria for the 12 SGLT2-inhibitor guidelines and preprocessing steps; (ii) provides the conflict-annotation guidelines used to establish ground-truth labels; (iii) reports inter-rater reliability (or notes single-expert annotation with justification); (iv) outlines the expert adjudication procedure applied to the generated predicates; and (v) includes a systematic error analysis of translation failures with representative examples. These additions will enable readers to assess the reliability of the 0.861 F1 and 90.6 % local-conflict figures without changing the core experimental results or conclusions.

Revision: yes

Circularity Check

0 steps flagged

No circularity: pipeline applies independent SAT solver to external guidelines

Full rationale

The paper's core chain—multi-agent LLM translation of guideline text into symbolic logic, followed by SAT solving and taxonomy-based conflict classification—does not reduce any result to its own inputs by construction. The benchmark consists of 12 external SGLT2 inhibitor guidelines; the F1=0.861 and 90.6% local-conflict statistic are computed against that independent corpus rather than fitted parameters or self-referential definitions. No equations appear, no self-citations are invoked as load-bearing uniqueness theorems, and the SAT solver is an external, off-the-shelf verifier. The translation step is a methodological choice whose accuracy is not validated in the provided text, but this is a correctness concern, not a circular reduction. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the domain assumption that LLM multi-agent translation faithfully captures guideline logic for SAT verification and on the representativeness of the 12 SGLT2 guidelines for general multimorbidity conflicts. The Local Conflict category is introduced without independent evidence outside the paper.

axioms (1)

Domain assumption: Unstructured clinical guidelines can be faithfully translated into symbolic logic by multi-agent LLMs without critical loss of meaning.
This underpins the entire pipeline from natural language to SAT solver input.

invented entities (1)

Local Conflict (no independent evidence)
Purpose: A category of decision conflicts arising specifically from the intersection of comorbidities.
Defined as critical in the hierarchical taxonomy of logical rule interactions.

[1] [1]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page

[2] [2]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

work page

[3] [3]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

work page 1980

[4] [4]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

work page

[5] [5]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984

[6] [6]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

work page

[7] [7]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving

work page

[8] [8]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

work page

[9] [9]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models

work page

[10] [10]

2017 , eprint=

Attention Is All You Need , author=. 2017 , eprint=

work page 2017

[11] [11]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet

work page

[12] [12]

Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models , author=. Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

work page 2025

[13] [13]

and Gui, Jiang

Wu, Weiyi and Xu, Xinwen and Gao, Chongyang and Diao, Xingjian and Li, Siting and Salas, Lucas A. and Gui, Jiang. Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.38

work page doi:10.18653/v1/2025.findings-emnlp.38 2025

[14] [14]

Journal of the American Medical Informatics Association , volume=

Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines , author=. Journal of the American Medical Informatics Association , volume=. 2025 , publisher=

work page 2025

[15] [15]

PLOS Digital Health , volume=

Retrieval augmented generation for large language models in healthcare: A systematic review , author=. PLOS Digital Health , volume=. 2025 , publisher=

work page 2025

[16] [16]

arXiv preprint arXiv:2511.05901 , year=

Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations , author=. arXiv preprint arXiv:2511.05901 , year=

work page arXiv

[17] [17]

Nejm ai , volume=

Almanac—retrieval-augmented language models for clinical medicine , author=. Nejm ai , volume=. 2024 , publisher=

work page 2024

[18] [18]

npj Digital Medicine , volume=

SurgeryLLM: a retrieval-augmented generation large language model framework for surgical decision support and workflow enhancement , author=. npj Digital Medicine , volume=. 2024 , publisher=

work page 2024

[19] [19]

NPJ digital medicine , volume=

Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework , author=. NPJ digital medicine , volume=. 2024 , publisher=

work page 2024

[20] [20]

PLoS One , volume=

Exploring the concordance of recommendations across guidelines on chest imaging for the diagnosis and management of COVID-19: A proposed methodological approach based on a case study , author=. PLoS One , volume=. 2023 , publisher=

work page 2023

[21] [21]

European Journal of Hospital Pharmacy , volume=

Consistency of recommendations from clinical practice guidelines for the management of critically ill COVID-19 patients , author=. European Journal of Hospital Pharmacy , volume=. 2021 , publisher=

work page 2021

[22] [22]

Argument & Computation , year=

Assumption-based argumentation with preferences and goals for patient-centric reasoning with interacting clinical guidelines , author=. Argument & Computation , year=

work page

[23] [23]

BMC Health Services Research , year=

Epidemiological strategies for adapting clinical practice guidelines to the needs of multimorbid patients , author=. BMC Health Services Research , year=

work page

[24] [24]

Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare

When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare , author=. arXiv preprint arXiv:2511.06668 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[25] [25]

ArXiv , year=

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG , author=. ArXiv , year=

work page

[26] [26]

ArXiv , year=

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards , author=. ArXiv , year=

work page

[27] The importance and challenges of shared decision making in older people with multimorbidity. PLoS Medicine.

[28] Are Canadian Clinical Practice Guidelines Accounting for Adults With Multiple Chronic Diseases? A Systematic Review. Journal of Evaluation in Clinical Practice, 2025.

[29] Defining expert opinion in clinical guidelines: insights from 98 scientific societies – a methodological study. BMC Medical Research Methodology.

[30] Evaluation of clinical practice guidelines (CPG) on the management of female chronic pelvic pain (CPP) using the AGREE II instrument. International Urogynecology Journal.

[31] Drug-disease and drug-drug interactions: systematic examination of recommendations in 12 UK national clinical guidelines. The BMJ.

[32] The rise and fall of aspirin in the primary prevention of cardiovascular disease. The Lancet.

[33] Recommendations for the primary prevention of atherosclerotic cardiovascular disease in primary care: a systematic guideline review. Frontiers in Medicine.

[34] Canadian Cardiovascular Harmonized National Guideline Endeavour (C-CHANGE) guideline for the prevention and management of cardiovascular disease in primary care: 2022 update. CMAJ: Canadian Medical Association Journal.

[35] Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. ArXiv.

[36] Z3: An Efficient SMT Solver. International Conference on Tools and Algorithms for Construction and Analysis of Systems.

[37] DeepSeek-V3 Technical Report. 2024.

[38] Qwen3-Max: Just Scale it.

[39] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv preprint arXiv:2507.06261.

[40] System for Context-Specific Visualization of Clinical Practice Guidelines (GuLiNav): Concept and Software Implementation. JMIR Formative Research.

[41] Enhancing Decision-making Systems with Relevant Patient Information by Leveraging Clinical Notes. International Conference on Health Informatics.

[42] LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers. Conference on Empirical Methods in Natural Language Processing.

[43] LLM-ARC: Enhancing LLMs with an automated reasoning critic. arXiv preprint arXiv:2406.17663.

[44] Faithful Logical Reasoning via Symbolic Chain-of-Thought. Annual Meeting of the Association for Computational Linguistics.

[45] Fast Healthcare Interoperability Resources, Clinical Quality Language, and Systematized Nomenclature of Medicine–Clinical Terms in Representing Clinical Evidence Logic Statements for the Use of Imaging Procedures: Descriptive Study. JMIR Medical Informatics.

[46] Igniting Harmonized Digital Clinical Quality Measurement through Terminology, CQL, and FHIR. Applied Clinical Informatics.

[47] Toward cross-platform electronic health record-driven phenotyping using Clinical Quality Language. Learning Health Systems.

[48] A Comparison of Arden Syntax and Clinical Quality Language as Knowledge Representation Formalisms for Clinical Decision Support. Applied Clinical Informatics.

[49] Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection. 2024.

[50] Towards Logically Sound Natural Language Reasoning with Logic-Enhanced Language Model Agents. 2024.

[51] A lifecycle framework illustrates eight stages necessary for realizing the benefits of patient-centered clinical decision support. Journal of the American Medical Informatics Association: JAMIA.

[52] LLM-Assisted Formalization Enables Deterministic Detection of Statutory Inconsistency in the Internal Revenue Code. 2025.

[53] LegalWiz: A Multi-Agent Generation Framework for Contradiction Detection in Legal Documents. ArXiv.