LLM based Knowledge Graph Approach to Automating Medical Device Regulatory Compliance
Pith reviewed 2026-06-30 11:08 UTC · model grok-4.3
The pith
Regulatory knowledge from FDA documents is encoded in a knowledge graph that an LLM queries to classify devices and check compliance automatically.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that translating FDA regulations into a machine-processable OWL/RDF knowledge graph and using an LLM to dynamically generate SPARQL queries allows automated device classification into Class I, II, or III along with real-time regulatory evaluation, as shown in validated use cases that reduce manual review.
What carries the argument
An OWL/RDF knowledge graph storing regulatory knowledge, queried via SPARQL queries generated on demand by the Mistral 7B Instruct model to perform compliance reasoning.
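As a rough illustration of the pattern described above, the sketch below stores a few regulatory facts as subject-predicate-object triples and answers a classification question by pattern matching. It is a plain-Python stand-in, not the paper's OWL/RDF graph or a SPARQL engine. Every `ex:` identifier, predicate name, and device fact is invented for illustration and does not come from the paper or from the CFR.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Toy graph: every identifier and fact below is hypothetical.
GRAPH: set[Triple] = {
    ("ex:DeviceTypeA", "ex:hasDeviceClass", "ex:ClassII"),
    ("ex:DeviceTypeA", "ex:governedBy", "ex:RegulationX"),
    ("ex:RegulationX", "ex:crossReferences", "ex:RegulationY"),
    ("ex:ClassII", "ex:requiresSubmission", "ex:PremarketNotification"),
}


def match(
    graph: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def classify(graph: set[Triple], device: str) -> Optional[str]:
    """Look up the device class recorded for a device, if any."""
    hits = match(graph, s=device, p="ex:hasDeviceClass")
    return hits[0][2] if hits else None


def required_submissions(graph: set[Triple], device: str) -> list[str]:
    """Follow device -> class -> submission type, a two-hop lookup."""
    device_class = classify(graph, device)
    if device_class is None:
        return []
    return [o for _, _, o in match(graph, s=device_class, p="ex:requiresSubmission")]
```

In the system the paper describes, the lookups in `classify` and `required_submissions` would be SPARQL queries produced by the LLM rather than hand-written functions, which is exactly where the reliability question arises.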
Load-bearing premise
The knowledge graph must accurately represent all cross-referenced FDA regulations without omissions or errors, and the LLM must consistently produce accurate SPARQL queries even for complex compliance questions.
What would settle it
Running the system on a known medical device with established classification and compliance status, then checking if the output matches the official FDA determination or expert analysis.
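One way such a check could be organized is sketched below: each case pairs the system's predicted class with a reference determination, and the harness reports the agreement rate together with the disagreeing cases. The `Case` type, the device names, and the class labels in `CASES` are placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    device: str
    reference_class: str  # e.g. taken from an official determination or expert review
    predicted_class: str  # what the automated system returned


def agreement(cases: list[Case]) -> tuple[float, list[Case]]:
    """Return the fraction of matching cases and the list of mismatches."""
    if not cases:
        raise ValueError("at least one case is required")
    mismatches = [c for c in cases if c.predicted_class != c.reference_class]
    rate = (len(cases) - len(mismatches)) / len(cases)
    return rate, mismatches


# Placeholder cases, invented for illustration only.
CASES = [
    Case("device-1", "II", "II"),
    Case("device-2", "III", "II"),
    Case("device-3", "I", "I"),
    Case("device-4", "II", "II"),
]

rate, mismatches = agreement(CASES)
```

Keeping the mismatches rather than only the rate matters here, because a single misclassified Class III device is a more serious failure than the aggregate number suggests.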
Original abstract
Advanced medical devices increasingly rely on AI-driven frameworks to automate compliance processes, ensuring safety and efficacy while reducing regulatory burdens. In the United States, software-based medical devices, including those utilizing AI/ML models, are regulated by the FDA's Center for Devices and Radiological Health (CDRH) under the Code of Federal Regulations (CFR) Title 21. These regulations are extensive, cross-referenced documents that require significant human effort to parse, leading to high compliance costs for manufacturers. We propose a novel, semantically rich framework that extracts regulatory knowledge from FDA documents and translates it into a machine-processable format. Our system encodes regulatory knowledge into an OWL/RDF-based knowledge graph and uses the Mistral 7B Instruct model to dynamically generate SPARQL queries, perform compliance reasoning, and produce structured reports. This enables automated device classification (Class I, II, or III) and real-time regulatory evaluation. Validated through real-world use cases, our framework significantly reduces manual review effort, enhances interpretability, and accelerates time-to-market. The proposed approach integrates AI reasoning and semantic technologies to achieve scalable, transparent, and automated regulatory compliance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an LLM-based framework that extracts knowledge from FDA CFR Title 21 documents into an OWL/RDF knowledge graph, employs the Mistral 7B Instruct model to dynamically generate SPARQL queries for compliance reasoning and device classification (Class I/II/III), and produces structured reports. It claims this enables automated, real-time regulatory evaluation and has been validated on real-world use cases with significant reduction in manual effort.
Significance. If the extraction and query-generation steps prove reliable on cross-referenced regulations, the approach could reduce compliance costs for medical-device manufacturers and improve interpretability via semantic technologies; however, the absence of any reported metrics or error analysis leaves the practical impact unquantified.
major comments (2)
- [Abstract] Abstract: the claim that the system was 'validated through real-world use cases' and 'significantly reduces manual review effort' is unsupported; the manuscript supplies no description of the use cases, extraction pipeline, inter-annotator agreement for KG construction, SPARQL query success rate, or any quantitative comparison against manual review.
- [Approach description (implied in abstract)] The central assumption that regulatory text is losslessly translated into the OWL/RDF graph and that Mistral 7B Instruct produces semantically correct SPARQL for dense cross-references is stated without evidence; no counter-example handling, query validation procedure, or fidelity metrics are provided.
minor comments (1)
- The manuscript should include a dedicated Methods or Evaluation section with explicit metrics (e.g., precision/recall on classification, query executability rate) before the validation claim can be assessed.
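The metrics named in this comment could be computed along the following lines. This is a minimal sketch with made-up labels and outcomes; the function names are hypothetical and nothing here reflects numbers reported by the authors.

```python
def precision_recall(
    reference: list[str], predicted: list[str], label: str
) -> tuple[float, float]:
    """Per-class precision and recall for one device class label."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == label and p == label)
    fp = sum(1 for r, p in pairs if r != label and p == label)
    fn = sum(1 for r, p in pairs if r == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def executability_rate(outcomes: list[bool]) -> float:
    """Fraction of generated queries that ran without error."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Invented data: reference classes, system predictions, and per-query run outcomes.
reference = ["I", "II", "II", "III", "II"]
predicted = ["I", "II", "III", "III", "II"]
query_ran = [True, True, False, True]
```

Note that executability is a weaker property than correctness: a query can run cleanly and still return the wrong regulation, so the two metrics would need to be reported separately.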
Simulated Author's Rebuttal
We thank the referee for the constructive comments identifying areas where additional detail and evidence would strengthen the manuscript. We address each point below and will revise accordingly.
- Referee: [Abstract] Abstract: the claim that the system was 'validated through real-world use cases' and 'significantly reduces manual review effort' is unsupported; the manuscript supplies no description of the use cases, extraction pipeline, inter-annotator agreement for KG construction, SPARQL query success rate, or any quantitative comparison against manual review.
Authors: We agree the abstract claims require supporting detail. The manuscript will be revised to include an expanded description of the real-world use cases (currently summarized in Section 4), the extraction pipeline for KG construction from CFR Title 21, and any internal metrics collected on SPARQL query success. Inter-annotator agreement was not performed because KG population combined automated LLM extraction with targeted human review rather than multiple independent annotators; this will be clarified. Quantitative effort-reduction comparisons will be added based on the logged manual review times from the use cases. revision: yes
- Referee: [Approach description (implied in abstract)] The central assumption that regulatory text is losslessly translated into the OWL/RDF graph and that Mistral 7B Instruct produces semantically correct SPARQL for dense cross-references is stated without evidence; no counter-example handling, query validation procedure, or fidelity metrics are provided.
Authors: The current text presents the framework but does not supply explicit fidelity metrics or counter-example analysis. We will add a dedicated subsection on query generation that includes (1) the prompt template and few-shot examples used with Mistral 7B Instruct, (2) the post-generation validation steps (syntax checking plus manual spot-checks on a subset of queries), and (3) representative counter-examples where the generated SPARQL required manual correction, together with how those cases were handled. Comprehensive end-to-end fidelity metrics across the full regulation set are not available from the original experiments and would require new annotation effort. revision: partial
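To make the first two promised items concrete, the sketch below assembles a few-shot prompt and runs a cheap structural pre-check on a generated query. It is not the authors' prompt or validation code: the instruction text, the few-shot pair, the `ex:` vocabulary, and the function names are all assumptions, the model-specific chat wrapping is left out, and the pre-check is a crude filter that would sit in front of a real SPARQL parser rather than replace one.

```python
import re

# Hypothetical instruction text and few-shot pair; not taken from the paper.
INSTRUCTIONS = "Translate the question into a SPARQL query over the regulatory graph."
FEW_SHOT = [
    (
        "Which class is ex:DeviceTypeA?",
        "PREFIX ex: <http://example.org/reg#>\n"
        "SELECT ?cls WHERE { ex:DeviceTypeA ex:hasDeviceClass ?cls . }",
    ),
]


def build_prompt(question: str) -> str:
    """Assemble instructions, worked examples, and the new question."""
    examples = "\n\n".join(f"Question: {q}\nSPARQL:\n{s}" for q, s in FEW_SHOT)
    return f"{INSTRUCTIONS}\n\n{examples}\n\nQuestion: {question}\nSPARQL:\n"


def precheck(query: str) -> list[str]:
    """Return a list of structural problems; an empty list means 'worth parsing'."""
    problems = []
    if not re.search(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", query, re.IGNORECASE):
        problems.append("no query form keyword")
    if query.count("{") != query.count("}"):
        problems.append("unbalanced braces")
    # Prefixes declared with PREFIX versus prefixes used in prefixed names.
    declared = set(re.findall(r"PREFIX\s+([A-Za-z][\w-]*):", query, re.IGNORECASE))
    used = set(re.findall(r"(?<![\w<])([A-Za-z][\w-]*):[A-Za-z_]", query))
    for name in sorted(used - declared):
        problems.append(f"undeclared prefix: {name}")
    return problems


good_query = FEW_SHOT[0][1]
bad_query = "SELECT ?cls WHERE { reg:DeviceTypeA ex:hasDeviceClass ?cls ."
```

A query that passes `precheck` is only syntactically plausible. Whether it asks the right question of the graph is the semantic-correctness issue the referee raises, and it still needs the manual spot-checks the authors describe.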
Circularity Check
No circularity: the system description relies on external FDA sources and an off-the-shelf LLM, with no self-referential reductions.
The paper describes a framework that extracts regulatory knowledge from external FDA CFR documents into an OWL/RDF knowledge graph and employs the off-the-shelf Mistral 7B Instruct model to generate SPARQL queries for compliance reasoning and device classification. No equations, derivations, fitted parameters, or predictions are present. No self-citations are used to justify uniqueness or load-bearing premises. Validation is asserted via real-world use cases without any internal fitting or renaming of results. The derivation chain is therefore self-contained against external benchmarks and exhibits no reduction of outputs to inputs by construction.