Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence

Houman Kazemzadeh; Kamyar Naderi

arXiv: 2605.25120 · v1 · pith:3WJDXRTKnew · submitted 2026-05-24 · 💻 cs.CL · cs.AI · cs.HC

Pith reviewed 2026-06-30 11:51 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.HC

keywords: radiology reporting, structured data, DICOM, HL7 FHIR, human-supervised AI, evidence-linked reporting, medical imaging interoperability, RadLex

The pith

A human-supervised reference architecture structures radiology reports by linking findings to image evidence and medical standards for integration and reuse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Radiology reports often leave measurements, image evidence, and terminology trapped in free text or spread across disconnected systems. The paper proposes a framework that combines exam-specific templates, speech-to-structure conversion, measurement and segmentation capture, controlled AI drafting, and interoperability standards including DICOM, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM. This setup is presented as a structured intelligence layer rather than an autonomous generator, supporting reviewed reporting, longitudinal comparisons, data reuse, governance, and connections to PACS, RIS, EHR, analytics, and registries. A reader would care if this approach makes imaging data reliably reusable without removing radiologist oversight.

Core claim

The paper claims that a human-supervised, evidence-linked reference architecture integrating exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability with DICOM, DICOM Structured Reporting, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM forms a structured intelligence layer for enterprise imaging that enables reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS, RIS, EHR, analytics, and registry workflows.

What carries the argument

The evidence-linked reference architecture, which ties report elements directly to image evidence through templates and standards while enforcing human supervision over AI assistance.
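One way to picture the evidence link and the supervision rule is as a small data structure. The sketch below is illustrative only and is not taken from the paper: the class names `ImageEvidence` and `Finding`, the `sign_off` method, and the choice to block export of unreviewed findings are all assumptions. The output dictionary follows the general shape of a FHIR Observation (a `valueQuantity` with the UCUM system URI, and a `derivedFrom` logical reference to the imaging study by its DICOM UID), but it leaves out coded terminology on purpose, since no specific RadLex, SNOMED CT, or LOINC codes are given in the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UCUM_SYSTEM = "http://unitsofmeasure.org"  # FHIR system URI for UCUM units


@dataclass(frozen=True)
class ImageEvidence:
    """Hypothetical pointer to the DICOM instance that supports a finding."""
    study_uid: str
    series_uid: str
    sop_instance_uid: str


@dataclass
class Finding:
    """Hypothetical structured finding; field names are illustrative."""
    finding_id: str
    label: str                      # free text here; a coded term would replace it
    value: float
    ucum_unit: str                  # e.g. "mm"
    evidence: List[ImageEvidence] = field(default_factory=list)
    drafted_by_ai: bool = False
    reviewed_by: Optional[str] = None

    def sign_off(self, radiologist: str) -> None:
        # Assumed rule: a finding without linked image evidence cannot be signed.
        if not self.evidence:
            raise ValueError("finding has no linked image evidence")
        self.reviewed_by = radiologist

    def to_fhir_observation(self) -> dict:
        # Assumed rule: only human-reviewed findings leave the reporting layer.
        if self.reviewed_by is None:
            raise PermissionError("unreviewed findings are not exported")
        return {
            "resourceType": "Observation",
            "status": "final",
            "code": {"text": self.label},
            "valueQuantity": {
                "value": self.value,
                "unit": self.ucum_unit,
                "system": UCUM_SYSTEM,
                "code": self.ucum_unit,
            },
            # Logical reference to the study by its DICOM Study Instance UID.
            "derivedFrom": [
                {
                    "type": "ImagingStudy",
                    "identifier": {
                        "system": "urn:dicom:uid",
                        "value": f"urn:oid:{e.study_uid}",
                    },
                }
                for e in self.evidence
            ],
            "performer": [{"display": self.reviewed_by}],
        }
```

In this sketch an AI-drafted finding can exist in the system, but calling `to_fhir_observation()` before `sign_off()` raises an error, which is one possible way to encode "human-supervised" as a hard constraint rather than a convention.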

If this is right

Enables consistent longitudinal comparison of lesions and measurements across multiple exams.
Allows imaging data to be reused directly in analytics, registries, and clinical decision support.
Supports governance and quality management for AI-assisted reporting workflows.
Facilitates modality-specific adaptations while maintaining standards compliance.
Integrates with enterprise systems without requiring full replacement of current infrastructure.
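The longitudinal comparison point can be made concrete with a short sketch. It assumes, hypothetically, that each measurement carries a stable `lesion_id` that persists across exams and that all values are already in millimetres; neither the `Measurement` type nor the `percent_change_by_lesion` function comes from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Measurement:
    """Hypothetical structured measurement exported from one exam."""
    lesion_id: str      # assumed stable identifier across exams
    exam_date: date
    value_mm: float


def percent_change_by_lesion(measurements: List[Measurement]) -> Dict[str, float]:
    """Percent change from the earliest to the latest exam for each lesion."""
    by_lesion: Dict[str, List[Measurement]] = {}
    for m in measurements:
        by_lesion.setdefault(m.lesion_id, []).append(m)

    changes: Dict[str, float] = {}
    for lesion_id, series in by_lesion.items():
        ordered = sorted(series, key=lambda m: m.exam_date)
        first, last = ordered[0], ordered[-1]
        # Skip lesions seen only once or with a zero baseline.
        if len(ordered) < 2 or first.value_mm == 0:
            continue
        changes[lesion_id] = 100.0 * (last.value_mm - first.value_mm) / first.value_mm
    return changes
```

The comparison is only this simple when lesion identity and units are captured as structured data at reporting time; with free-text reports, the grouping step would first require extracting and matching lesions across exams.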

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The structured output could serve as higher-quality labeled data for training future radiology AI models.
Regulatory pathways might emphasize validation of the human review step rather than the AI components alone.
Similar architectures could extend to other diagnostic domains that rely on free-text reports.
Widespread adoption would require new validation protocols focused on end-to-end data flow rather than isolated report accuracy.

Load-bearing premise

The listed standards and processing components can be combined into one clinically usable system that achieves the promised integration and reuse benefits without major unresolved technical, safety, or regulatory problems.

What would settle it

A real-world pilot deployment that cannot achieve reliable data exchange between the proposed components and existing PACS, RIS, and EHR systems would show the architecture does not deliver the claimed interoperability.

Figures

Figures reproduced from arXiv: 2605.25120 by Houman Kazemzadeh, Kamyar Naderi.

**Figure 1.** Conceptual evidence-to-report architecture for human-supervised structured radiology reporting. The architecture separates imaging and order ingestion, AI orchestration, structured reporting, human review, narrative report generation, and downstream EHR/analytics export.
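The stages named in the Figure 1 caption can be sketched as an ordered flow. This is a minimal illustration under assumptions: the strict linear ordering, the `Stage` enum, and the `ReportFlow` class are hypothetical and are not described in the paper, which presents the stages only as separate architectural layers.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Stages listed in the Figure 1 caption; the numbering is an assumption."""
    INGESTION = 1
    AI_ORCHESTRATION = 2
    STRUCTURED_REPORTING = 3
    HUMAN_REVIEW = 4
    NARRATIVE_GENERATION = 5
    EXPORT = 6


class ReportFlow:
    """Tracks one report and refuses to skip or reorder stages."""

    def __init__(self) -> None:
        self.completed: List[Stage] = []

    def advance(self, stage: Stage) -> None:
        if len(self.completed) == len(Stage):
            raise RuntimeError("flow already complete")
        expected = Stage(len(self.completed) + 1)
        if stage != expected:
            raise RuntimeError(f"expected {expected.name}, got {stage.name}")
        self.completed.append(stage)
```

Under this ordering, an attempt to reach `EXPORT` without passing through `HUMAN_REVIEW` fails, which mirrors the paper's positioning of the system as supervised rather than autonomous.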

Abstract

Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structured information behind these reports, including measurements, image evidence, prior comparisons, lesion identity, uncertainty, and terminology, often remains trapped in free text or fragmented across picture archiving and communication systems, radiology information systems, reporting workstations, worksheets, advanced visualization tools, and electronic health records. This paper proposes a human-supervised, evidence-linked reference architecture for structured radiology reporting. The framework combines exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability using DICOM, DICOM Structured Reporting, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM. The system is positioned not as an autonomous report generator, but as a structured intelligence layer for enterprise imaging that supports reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS, RIS, EHR, analytics, and registry workflows. The paper also discusses modality-specific deployment considerations, clinical safety risks, validation requirements, cybersecurity, privacy, quality management, and regulatory boundaries for AI-assisted radiology reporting systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a high-level reference architecture proposal that maps existing radiology standards into a supervised system, but supplies no implementation or validation.

The paper's core is a blueprint for evidence-linked radiology reporting that combines exam templates, speech-to-structure tools, AI drafting, and standards like DICOM SR, FHIR, RadLex, and SNOMED. It positions the whole thing as a human-supervised layer rather than an autonomous generator, aimed at better data reuse across PACS, EHR, and registries.

It does a solid job listing the relevant components and calling out practical issues such as modality-specific deployment, safety risks, validation needs, cybersecurity, and regulatory limits. Those sections read like a useful checklist for anyone trying to specify an enterprise imaging system.

The main limitation is that the claims about improved interoperability and longitudinal comparison rest entirely on untested assumptions. There is no prototype, no pilot data, no integration test, and no discussion of how the pieces would actually mesh without major friction. The paper acknowledges these gaps at a high level but does not move beyond description.

This is for radiology informatics teams or standards groups who want a consolidated view of current options. It will not change research directions or clinical practice on its own.

I would send it to peer review in a medical informatics journal, with the expectation that referees will push for more concrete feasibility work.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a human-supervised reference architecture for evidence-linked structured radiology reporting. It integrates exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability (DICOM, DICOM SR, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, UCUM). The system is framed as a structured intelligence layer supporting reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS/RIS/EHR/analytics/registry workflows, while addressing modality-specific deployment, safety risks, validation, cybersecurity, privacy, quality management, and regulatory boundaries.

Significance. If the described integration of existing standards and processing components can be realized, the architecture could advance structured data capture in radiology, enabling improved longitudinal analysis, secondary data use, and workflow interoperability without replacing radiologist oversight. The explicit human-supervised positioning and reliance on established terminologies and formats are constructive strengths for a reference architecture paper.

major comments (1)

[Abstract] Abstract and deployment considerations section: the central positioning that the listed components 'can function as a structured intelligence layer' for the claimed benefits assumes seamless integration and clinical deployability, yet the manuscript supplies no data-flow diagrams, interface specifications, or analysis of interoperability gaps between DICOM SR, FHIR, and PACS/RIS systems.

minor comments (2)

Add explicit citations to prior structured reporting initiatives (e.g., RSNA RadReport templates, IHE profiles) to situate the proposal within existing efforts.
The regulatory boundaries discussion would benefit from concrete references to FDA AI/ML guidance or EU MDR classification for software as medical device.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. The feedback highlights an opportunity to strengthen the presentation of the reference architecture. We address the comment below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract and deployment considerations section: the central positioning that the listed components 'can function as a structured intelligence layer' for the claimed benefits assumes seamless integration and clinical deployability, yet the manuscript supplies no data-flow diagrams, interface specifications, or analysis of interoperability gaps between DICOM SR, FHIR, and PACS/RIS systems.

Authors: We agree that the manuscript would benefit from additional clarity on integration. As a reference architecture paper, the focus is on the conceptual framework and component roles rather than a full implementation specification. However, we will add a high-level data-flow diagram in the deployment considerations section and include a concise discussion of known interoperability considerations and potential gaps between DICOM SR, FHIR, and typical PACS/RIS/EHR interfaces. This will better ground the positioning without overstating seamlessness. revision: yes

Circularity Check

0 steps flagged

No significant circularity: descriptive reference architecture proposal

The paper is a high-level proposal for a human-supervised radiology reporting architecture that integrates existing standards (DICOM, HL7 FHIR, RadLex, etc.) and processing components. It contains no equations, no fitted parameters, no predictions derived from data, and no self-citations used as load-bearing justification for any derivation. The central claim is architectural positioning rather than a derived result, so no step reduces to its own inputs by construction. This matches the default expectation for non-empirical descriptive work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a conceptual architecture proposal paper. No mathematical models, free parameters, axioms, or new postulated entities are present.

pith-pipeline@v0.9.1-grok · 5747 in / 1131 out tokens · 28631 ms · 2026-06-30T11:51:44.256047+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 10 canonical work pages · 2 internal anchors

[1] Radiological Society of North America. (n.d.). RadReport reporting templates. RSNA. https://www.rsna.org/practice-tools/data-tools-and-standards/radreport-reporting-templates
[2] Radiological Society of North America. (n.d.). RadLex radiology lexicon. RSNA. https://www.rsna.org/practice-tools/data-tools-and-standards/radlex-radiology-lexicon
[3] Radiological Society of North America. (n.d.). RadLex term browser. https://radlex.org/
[4] National Electrical Manufacturers Association. (n.d.). DICOM PS3.16: Content mapping resource. DICOM Standard. https://dicom.nema.org/medical/dicom/current/output/html/part16.html
[5] National Electrical Manufacturers Association. (n.d.). DICOM PS3.3: Information object definitions. DICOM Standard. https://dicom.nema.org/medical/dicom/current/output/html/part03.html
[6] National Electrical Manufacturers Association. (n.d.). DICOM PS3.3: Information object definitions: Segmentation IOD. DICOM Standard. https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_A.51.html
[7] HL7 International. (n.d.). FHIR DiagnosticReport resource. HL7 FHIR. https://hl7.org/fhir/diagnosticreport.html
[8] HL7 International. (n.d.). FHIR Observation resource. HL7 FHIR. https://hl7.org/fhir/observation.html
[9] HL7 International. (n.d.). FHIR ImagingStudy resource. HL7 FHIR. https://hl7.org/fhir/imagingstudy.html
[10] Integrating the Healthcare Enterprise. (2018). Management of Radiology Report Templates (MRRT): Trial implementation supplement. IHE Radiology Technical Framework. https://www.ihe.net/uploadedFiles/Documents/Radiology/IHE_RAD_Suppl_MRRT.pdf
[11] Kahn, C. E., Jr., Genereaux, B. W., & Langlotz, C. P. (2015). Conversion of radiology reporting templates to the MRRT standard. Journal of Digital Imaging, 28(5), 528–536. https://doi.org/10.1007/s10278-015-9785-3
[12] Regenstrief Institute. (n.d.). LOINC. https://loinc.org/
[13] Regenstrief Institute. (n.d.). Unified Code for Units of Measure (UCUM). https://unitsofmeasure.org/
[14] SNOMED International. (n.d.). SNOMED CT. https://www.snomed.org/snomed-ct
[15] U.S. Food and Drug Administration. (2022). Clinical decision support software: Guidance for industry and Food and Drug Administration staff. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software
[16] U.S. Food and Drug Administration. (2025). Marketing submission recommendations for a predetermined change control plan for artificial intelligence-enabled device software functions: Guidance for industry and Food and Drug Administration staff. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-p...
[17] U.S. Food and Drug Administration. (2025). Artificial intelligence-enabled device software functions: Lifecycle management and marketing submission recommendations: Draft guidance for industry and Food and Drug Administration staff. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/artificial-intelligence-enabled-device-software-fu...
[18] International Medical Device Regulators Forum. (2014). Software as a Medical Device: Possible framework for risk categorization and corresponding considerations (IMDRF/SaMD WG/N12FINAL:2014). https://www.imdrf.org/documents/software-medical-device-possible-framework-risk-categorization-and-corresponding-considerations
[19] International Medical Device Regulators Forum. (2025). Characterization considerations for medical device software and software-specific risk (IMDRF/SaMD WG/N81FINAL:2025). https://www.imdrf.org/sites/default/files/2025-01/IMDRF_SaMD%20WG_Software-Specific%20Risk_N81%20Final_0.pdf
[20] National Institute of Standards and Technology. (2023). Artificial intelligence risk management framework (AI RMF 1.0) (NIST AI 100-1). https://doi.org/10.6028/NIST.AI.100-1
[21] International Organization for Standardization. (2016). ISO 13485:2016: Medical devices—Quality management systems—Requirements for regulatory purposes. https://www.iso.org/standard/59752.html
[22] International Organization for Standardization. (2019). ISO 14971:2019: Medical devices—Application of risk management to medical devices. https://www.iso.org/standard/72704.html
[23] International Electrotechnical Commission. (2006). IEC 62304:2006: Medical device software—Software life cycle processes. https://webstore.iec.ch/en/publication/6792
[24] International Electrotechnical Commission. (2015). IEC 62366-1:2015: Medical devices—Part 1: Application of usability engineering to medical devices. https://www.iso.org/standard/63179.html
[25] Jain, S., Agrawal, A., Saporta, A., Truong, S. Q. H., Duong, D. N., Bui, T., Chambon, P., Zhang, Y., Lungren, M. P., Ng, A. Y., Langlotz, C. P., & Rajpurkar, P. (2021). RadGraph: Extracting clinical entities and relations from radiology reports. arXiv. https://arxiv.org/abs/2106.14463
[26] Delbrouck, J.-B., Chambon, P., Chen, Z., Varma, M., Johnston, A., Blankemeier, L., Van Veen, D., Bui, T., Truong, S., & Langlotz, C. (2024). RadGraph-XL: A large-scale expert-annotated dataset for entity and relation extraction from radiology reports. In Findings of the Association for Computational Linguistics: ACL 2024 (pp. 12902–12915). Association for... doi:10.18653/v1/2024.findings-acl.765
[27] Delbrouck, J.-B. (2025). RadGraph-XL: A large-scale expert-annotated dataset for entity and relation extraction from radiology reports (Version 1.0.0). PhysioNet. https://doi.org/10.13026/j8e7-pr22
[28] Reichenpfader, D., Knupp, J., Sander, A., & Denecke, K. (2024). RadEx: A framework for structured information extraction from radiology reports based on large language models. arXiv. https://arxiv.org/abs/2406.15465
[29] Lekadir, K., Frangi, A. F., Porras, A. R., Glocker, B., Cintas, C., Langlotz, C. P., et al. (2025). FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ, 388, e081554. https://doi.org/10.1136/bmj-2024-081554
[30] Miao, B. Y., Chen, I. Y., Williams, C. Y. K., Davidson, J., Garcia-Agundez, A., Sun, S., Zack, T., Saria, S., Arnaout, R., Quer, G., Sadaei, H. J., Torkamani, A., Beaulieu-Jones, B., Yu, B., Gianfrancesco, M., Butte, A. J., Norgeot, B., & Sushil, M. (2025). The MI-CLAIM-GEN checklist for generative artificial intelligence in health. Nature Medicine, 31(5)... doi:10.1038/s41591-024-03470-0
[31] Norgeot, B., Quer, G., Beaulieu-Jones, B. K., Torkamani, A., Dias, R., Gianfrancesco, M., Arnaout, R., Kohane, I. S., Saria, S., Topol, E., Obermeyer, Z., Yu, B., & Butte, A. J. (2020). Minimum information about clinical artificial intelligence modeling: The MI-CLAIM checklist. Nature Medicine, 26(9), 1320–1324. https://doi.org/10.1038/s41591-020-1041-y
[32] Arora, R. K., Wei, J., Soskin Hicks, R., Bowman, P., Quiñonero-Candela, J., Tsimpourlas, F., Sharman, M., Shah, M., Vallone, A., Beutel, A., Heidecke, J., & Singhal, K. (2025). HealthBench: Evaluating large language models towards improved human health. arXiv. https://arxiv.org/abs/2505.08775
