Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation

arXiv: 2604.20869 · v1 · submitted 2026-03-27 · cs.CY · cs.AI · cs.HC · cs.IR · cs.LG

Philippe E. Spiess, Md Muntasir Zitu, Alison Walker, Daniel A. Anaya, Robert M. Wenham, Michael Vogelbaum, Daniel Grass, Ali-Musa Jaffer, Amod Sarnaik, Caitlin McMullen, Christine Sam, John V. Kiluk, Tianshi Liu, Tiago Biachi, Julio Powsang, Jing-Yi Chern, Roger Li, Seth Felder, Samuel Reynolds, Michael Shafique, Alison Sheehan, Ashley Layman, Cydney A. Warfield, Derrick Legoas, Jaclyn Parrinello, Jena Schmitz, Kevin Eaton, Mark Honor, Luis Felipe, Issam ElNaqa, Elier Delgado, Talia Berler, Rachael V. Phillips, Frantz Francisque, Carlos Garcia Fernandez, Gilmer Valdes

Pith reviewed 2026-05-14 23:36 UTC · model grok-4.3

Keywords: oncology AI, treatment planning, clinical reasoning, guideline concordance, vignette evaluation, community cancer care, safety layer, multi-specialty

The pith

An AI platform for oncology treatment planning produces outputs rated guideline-concordant and safe by clinicians across multiple specialties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests OncoBrain, an AI system built to generate oncology treatment plans by combining general-purpose large language models with a cancer-specific graph retrieval layer, a stored corpus of gold-standard plans, and a safety module called CHECK that detects and suppresses hallucinations. In a vignette study of 173 cases spanning gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic cancers, three clinician groups (subspecialist oncologists, physician reviewers, and advanced practice providers) scored the plans on a shared 16-item instrument. Ratings were highest for scientific accuracy, evidence alignment, and safety: on a 5-point scale, group means ranged from 4.56 to 4.70 for guideline concordance and from 4.40 to 4.80 for safety. Workflow fit and time savings scored lower but still favorably, and varied more by group (3.94 to 4.50 and 3.60 to 5.00, respectively). The authors conclude that the platform shows promise for reducing cognitive load in community oncology and merits real-world testing.

Core claim

OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise by subspecialists, physicians, and advanced practice providers in a multi-specialty vignette evaluation.

What carries the argument

OncoBrain architecture: general-purpose LLMs plus graph retrieval-augmented generation over cancer knowledge, long-term memory from a gold-standard treatment-plan corpus, and the CHECK model-agnostic safety layer for hallucination detection and suppression.
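To make the division of labor among these components concrete, here is a minimal sketch. It is not OncoBrain's implementation: the paper as summarized here names the components but not their interfaces, so every identifier below (`PlanPipeline`, `retrieve`, `recall`, `generate`, `check`) is hypothetical, and the order of the steps is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlanPipeline:
    """Hypothetical wiring of the four components named in the summary."""
    retrieve: Callable[[str], List[str]]      # stands in for graph RAG over cancer knowledge
    recall: Callable[[str], List[str]]        # stands in for the gold-standard plan memory
    generate: Callable[[str], str]            # stands in for a general-purpose LLM call
    check: Callable[[str, List[str]], bool]   # stands in for the safety layer; True = supported

    def plan(self, case_summary: str) -> str:
        # Gather evidence from both sources before generation (assumed order).
        evidence = self.retrieve(case_summary) + self.recall(case_summary)
        prompt = case_summary + "\n\nEvidence:\n" + "\n".join(evidence)
        draft = self.generate(prompt)
        # Withhold the draft rather than return content the check rejects.
        if not self.check(draft, evidence):
            return "WITHHELD: draft failed safety check"
        return draft


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; none reflect real behavior.
    demo = PlanPipeline(
        retrieve=lambda case: ["guideline fact A"],
        recall=lambda case: ["prior plan B"],
        generate=lambda prompt: "Plan citing guideline fact A",
        check=lambda draft, ev: any(e in draft for e in ev),
    )
    print(demo.plan("example case summary"))
```

Passing the components in as plain callables keeps the safety step independent of the generator, which is one way to read "model-agnostic" in the description above.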

If this is right

Plans receive high marks for evidence alignment, with mean scores of 4.56–4.70 across clinician groups.
Safety and misinformation concerns remain low, with mean scores of 4.40–4.80.
Workflow integration and perceived time savings receive favorable though slightly lower ratings that vary by clinician type.
Positive results hold across five major cancer categories and three reviewer cohorts totaling 173 cases.
Findings justify moving to prospective real-world trials in community settings where most U.S. cancer care occurs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Successful deployment could narrow survival gaps between community and academic cancer centers by easing data-integration demands.
The dedicated safety layer may become a template for other medical AI tools seeking regulatory clearance.
Different practice settings may need tailored workflow adjustments to realize the reported time savings.
Direct comparison of patient-level outcomes such as recurrence rates or toxicity would provide the clearest next test of value.

Load-bearing premise

That clinician ratings of AI-generated plans on structured vignette summaries will accurately predict performance and patient outcomes in everyday community oncology practice.

What would settle it

A prospective community-based study in which OncoBrain-assisted plans produce more guideline deviations, adverse events, or lower survival than plans made without the system.

Figures

Figures reproduced from arXiv: 2604.20869 by Ali-Musa Jaffer, Alison Sheehan, Alison Walker, Amod Sarnaik, Ashley Layman, Caitlin McMullen, Carlos Garcia Fernandez, Christine Sam, Cydney A. Warfield, Daniel A. Anaya, Daniel Grass, Derrick Legoas, Elier Delgado, Frantz Francisque, Gilmer Valdes, Issam ElNaqa, Jaclyn Parrinello, Jena Schmitz, Jing-Yi Chern, John V. Kiluk, Julio Powsang, Kevin Eaton, Luis Felipe, Mark Honor, Md Muntasir Zitu, Michael Shafique, Michael Vogelbaum, Philippe E. Spiess, Rachael V. Phillips, Robert M. Wenham, Roger Li, Samuel Reynolds, Seth Felder, Talia Berler, Tiago Biachi, Tianshi Liu.

**Figure 1.** OncoBrain evaluation workflow for treatment plan generation. Synthetic case summaries are first reviewed …

Original abstract

Background: More than 80% of U.S. cancer care is delivered in community settings, where survival remains worse than at academic centers. Clinicians must integrate genomics, staging, radiology, pathology, and changing guidelines, creating cognitive burden. We evaluated OncoBrain, an AI clinical reasoning platform for oncology treatment-plan generation, as an early step toward OGI. Methods: OncoBrain combines general-purpose LLMs with a cancer-specific graph retrieval-augmented generation layer, a gold-standard treatment-plan corpus as long-term memory, and a model-agnostic safety layer (CHECK) for hallucination detection and suppression. We evaluated clinician-enriched case summaries across gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic malignancies. Three clinician groups completed structured evaluations of 173 cases using a common 16-item instrument: subspecialist oncologists reviewed 50 cases, physician reviewers 78, and advanced practice providers 45. Results: Ratings were highest for scientific accuracy, evidence support, and safety, with lower but favorable scores for workflow integration and time savings. On a 5-point scale, mean alignment with evidence and guidelines was 4.60, 4.56, and 4.70 across subspecialists, physician reviewers, and advanced practice providers. Mean scores for absence of safety or misinformation concerns were 4.80, 4.40, and 4.60. Workflow integration averaged 4.50, 3.94, and 4.00; perceived time savings averaged 5.00, 3.89, and 3.60. Conclusions: In this multi-specialty vignette-based evaluation, OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise. These findings support the potential of a carefully engineered AI reasoning platform to assist oncology treatment planning and justify prospective real-world evaluation in community settings.
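As a reading aid, the per-group means in the abstract can be combined into one case-weighted figure per item. The sketch below rests on an assumption the abstract does not state: that each group's mean is taken over all of that group's cases, so weighting by 50, 78, and 45 is appropriate. The pooled values are illustrative arithmetic, not statistics reported by the authors, and the names `CASES`, `REPORTED_MEANS`, and `pooled_mean` are introduced here.

```python
# Cases reviewed per group and per-group means, copied from the abstract.
CASES = {"subspecialist": 50, "physician": 78, "app": 45}

REPORTED_MEANS = {
    "guideline_alignment": {"subspecialist": 4.60, "physician": 4.56, "app": 4.70},
    "safety": {"subspecialist": 4.80, "physician": 4.40, "app": 4.60},
    "workflow_integration": {"subspecialist": 4.50, "physician": 3.94, "app": 4.00},
    "time_savings": {"subspecialist": 5.00, "physician": 3.89, "app": 3.60},
}


def pooled_mean(group_means, cases=CASES):
    """Case-weighted mean across reviewer groups (an assumed pooling rule)."""
    total_cases = sum(cases.values())
    return sum(group_means[g] * cases[g] for g in cases) / total_cases


if __name__ == "__main__":
    for item, means in REPORTED_MEANS.items():
        spread = max(means.values()) - min(means.values())
        print(f"{item}: pooled={pooled_mean(means):.2f} group spread={spread:.2f}")
```

The group spread is printed alongside the pooled mean because pooling hides the largest between-group gap in the data: perceived time savings runs from 5.00 among subspecialists to 3.60 among advanced practice providers.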

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Clinician ratings for OncoBrain look decent on clean vignettes but the design leaves real-world generalization untested.

The main thing to know is that this paper reports clinician ratings of an oncology AI system on 173 vignettes, with mean scores of 4.40 to 4.80 out of 5 for guideline alignment and safety across subspecialists, physicians, and advanced practice providers. The multi-specialty evaluation using a shared 16-item instrument is the concrete new element, and the breakdown by reviewer group shows some expected variation in workflow scores that feels honest. The engineering choices (graph RAG layered on general LLMs, a treatment-plan memory corpus, and the CHECK safety filter) are described plainly and match the domain's need to reduce hallucinations in cancer planning. That part is useful as a documented baseline for similar efforts.

The soft spots sit in the evaluation format. The cases are clinician-enriched summaries rather than raw charts, so they skip the incomplete labs, ambiguous imaging, and time pressure that define community oncology. No inter-rater reliability numbers or blinding procedure appear in the reported methods, which makes it hard to separate model performance from the tidy input conditions. There are also no links to actual patient outcomes, only ratings.

This paper is for researchers and clinicians tracking early AI tools for oncology decision support, especially anyone interested in how structured feedback looks across specialties. Readers who want a clear snapshot of clinician impressions on controlled cases will get value from the numbers. It deserves peer review because the data collection is a real step forward even with the usual vignette limitations, and the concerns about generalization are fixable in follow-up work.

Referee Report

3 major / 2 minor

Summary. The paper evaluates OncoBrain, an AI clinical reasoning platform that integrates general-purpose LLMs with cancer-specific graph RAG, a gold-standard treatment-plan corpus, and a model-agnostic safety layer (CHECK). It reports results from a vignette-based study in which three clinician groups (subspecialist oncologists reviewing 50 cases, physician reviewers 78 cases, and advanced practice providers 45 cases) used a common 16-item instrument to rate 173 multi-specialty oncology cases, yielding group mean scores of 4.40–4.80 on a 5-point scale for guideline alignment and safety, with lower scores for workflow integration and time savings.

Significance. If the reported ratings prove robust, the work would provide useful early evidence that carefully engineered LLM-based systems with retrieval and safety layers can produce oncology plans judged guideline-concordant and clinically acceptable by practicing clinicians across five malignancy types. The multi-specialty design and explicit safety checks are positive features. However, the vignette-only format and absence of real-world outcome data or reliability metrics limit the strength of claims about community-oncology utility.

major comments (3)

[Methods] Methods (evaluation protocol): No inter-rater reliability statistics (e.g., Fleiss’ kappa or intraclass correlation) are reported for the 16-item instrument across the 173 cases or within reviewer groups. Without these metrics the mean scores (e.g., 4.60 for evidence alignment) cannot be interpreted as stable indicators of AI quality.
[Methods] Methods and Results: The manuscript provides no quantitative comparison of vignette completeness versus actual community-oncology charts (missing labs, ambiguous imaging, evolving preferences). This omission is load-bearing for the central claim that high ratings (4.40–4.80 for guideline alignment and safety) demonstrate clinical acceptability, because pre-digested summaries omit the data incompleteness and time pressure typical of real cases.
[Results] Results: Blinding procedures for the clinician reviewers are not described. Absence of blinding raises the possibility that ratings partly reflect knowledge of the AI source rather than intrinsic output quality, weakening the interpretation of the safety and guideline-concordance findings.

minor comments (2)

[Abstract] The abstract and results tables would benefit from explicit reporting of confidence intervals or standard deviations around the reported means to allow readers to assess precision.
[Methods] Notation for the 16-item instrument is not fully defined in the main text; a supplementary table listing each item and its exact wording would improve reproducibility.
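The first minor comment asks for a measure of precision around each mean. A minimal sketch of what that reporting could look like follows, assuming access to the raw item-level ratings, which the paper as summarized here does not provide. The ratings in the example are hypothetical, the function name `mean_with_ci` is introduced here, and the interval uses a normal approximation; a t-based interval would be the more careful choice for small groups.

```python
import math
import statistics


def mean_with_ci(ratings, z=1.96):
    """Return (mean, sd, lower, upper) with a normal-approximation interval."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)
    # Clip to the 1-5 scale, since a Likert mean cannot fall outside it.
    return mean, sd, max(1.0, mean - half_width), min(5.0, mean + half_width)


if __name__ == "__main__":
    # Hypothetical ratings for a single item; NOT data from the paper.
    example = [5, 5, 4, 5, 4, 5, 3, 5, 4, 5]
    mean, sd, lower, upper = mean_with_ci(example)
    print(f"mean={mean:.2f} sd={sd:.2f} 95% CI=({lower:.2f}, {upper:.2f})")
```

With scores clustered near the top of the scale, the interval is asymmetric once clipped, which is one reason some reports add the share of ratings at 4 or above alongside the mean.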

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments on our manuscript evaluating OncoBrain. We address each major point below, clarifying the study design where needed and indicating revisions to strengthen the presentation of limitations.

Referee: [Methods] Methods (evaluation protocol): No inter-rater reliability statistics (e.g., Fleiss’ kappa or intraclass correlation) are reported for the 16-item instrument across the 173 cases or within reviewer groups. Without these metrics the mean scores (e.g., 4.60 for evidence alignment) cannot be interpreted as stable indicators of AI quality.

Authors: We agree that inter-rater reliability metrics would aid interpretation of rating stability. However, the study assigned each of the 173 cases to a single reviewer within one of the three groups, with no overlapping ratings of the same case. Consequently, statistics such as Fleiss’ kappa or intraclass correlation cannot be computed from the data. We will revise the Methods to describe this single-rater structure explicitly and add it as a limitation in the Discussion, while noting that multi-rater designs in future work could address this. (Revision: partial.)

Referee: [Methods] Methods and Results: The manuscript provides no quantitative comparison of vignette completeness versus actual community-oncology charts (missing labs, ambiguous imaging, evolving preferences). This omission is load-bearing for the central claim that high ratings (4.40–4.80 for guideline alignment and safety) demonstrate clinical acceptability, because pre-digested summaries omit the data incompleteness and time pressure typical of real cases.

Authors: We concur that vignette-based evaluations differ from real-world charts in completeness and contextual pressures. Our study employed standardized, clinician-enriched vignettes to enable consistent multi-specialty assessment, which is standard for initial AI evaluations. A direct quantitative comparison was outside this study's scope. We will expand the Discussion to address these differences, their potential impact on ratings, and the justification for prospective real-world studies to evaluate performance under actual clinical conditions. (Revision: yes.)

Referee: [Results] Results: Blinding procedures for the clinician reviewers are not described. Absence of blinding raises the possibility that ratings partly reflect knowledge of the AI source rather than intrinsic output quality, weakening the interpretation of the safety and guideline-concordance findings.

Authors: We acknowledge the importance of describing blinding. Reviewers were informed they were evaluating AI-generated plans, as the study objective was to assess clinician acceptance and perceived quality of such outputs. Full blinding was not implemented to maintain a realistic evaluation context. We will update the Methods to detail the evaluation procedures and reviewer awareness, and we will discuss the implications for potential bias as a limitation. (Revision: yes.)

standing simulated objections not resolved

Inter-rater reliability statistics cannot be reported because each case was assessed by only one reviewer with no overlaps.
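The reason a single-rater design blocks this statistic can be seen in the formula itself. The sketch below is a plain-Python implementation of Fleiss' kappa, written for illustration and not taken from the paper; the function name `fleiss_kappa` and the toy count tables are assumptions. The per-subject agreement term divides by n(n - 1), where n is the number of raters per subject, so n = 1 leaves it undefined.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who put subject i in category j. Each subject needs the same number of
    raters n, and n must be at least 2."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")
    if n_raters < 2:
        raise ValueError("kappa is undefined with one rater per subject")

    # Observed agreement: mean of the per-subject agreement terms P_i.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement P_e from the marginal category proportions.
    n_categories = len(counts[0])
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy table: 2 subjects, 3 raters each, perfect agreement.
    print(fleiss_kappa([[3, 0], [0, 3]]))
    # One rater per case, as in the design described in the rebuttal.
    try:
        fleiss_kappa([[1, 0], [0, 1]])
    except ValueError as err:
        print("single-rater design:", err)
```

A follow-up study would therefore need at least two reviewers on an overlapping subset of cases before any agreement statistic of this kind could be reported.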

Circularity Check

0 steps flagged

No circularity: evaluation rests on independent clinician ratings of AI outputs

The paper reports an empirical multi-specialty vignette evaluation of OncoBrain using a 16-item clinician instrument across 173 cases. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the reported methods or results. Central claims (guideline concordance, clinical acceptability) are grounded in external clinician judgments rather than any reduction to the system's own inputs or prior author work. This is the expected non-finding for a straightforward evaluation study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The evaluation assumes that structured clinician ratings of clinician-enriched vignettes can stand in for prospective clinical utility; no free parameters or invented physical entities are introduced.

axioms (1)

Domain assumption: clinician ratings on a 16-item instrument provide a valid proxy for treatment-plan quality and safety.
Invoked in the methods and conclusions sections when interpreting the high mean scores (4.40–4.80 for guideline alignment and safety) as evidence of clinical acceptability.

invented entities (1)

OncoBrain platform (no independent evidence)
Purpose: AI clinical reasoning system for oncology treatment planning.
The paper introduces and evaluates this specific engineered system combining LLMs, graph RAG, corpus memory, and the CHECK safety layer.
