From Fuzzy to Formal: Scaling Hospital Quality Improvement with AI

Patrick Vossler; Jean Feng; Venkat Sivaraman; Robert Gallo; Hemal Kanzaria; Dana Freiser; Christopher Ross; Amy Ou; James Marks; Susan Ehrlich; Christopher Peabody; Lucas Zier

arxiv: 2604.20055 · v1 · submitted 2026-04-21 · 💻 cs.AI · cs.HC

Pith reviewed 2026-05-10 01:40 UTC · model grok-4.3

keywords: hospital quality improvement; AI for healthcare; factor discovery; human-AI co-optimization; modifiable factors; Lean healthcare; readmissions; length of stay

The pith

AI pipelines for hospital quality improvement reach at least 70 percent concordance with expert annotations by iteratively co-optimizing natural-language specifications and models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to turn the traditionally fuzzy process of discovering modifiable factors in hospital quality improvement into a more formal, scalable AI-supported workflow. Experts and AI agents jointly refine both high-level natural language goals and the specific AI pipelines until the outputs align closely with expert judgments on clinical problems such as prolonged length of stay and readmissions. This approach recovered prior manual findings, identified additional actionable factors, ran much faster than classic Lean methods, and left behind traceable reasoning steps. A sympathetic reader would care because current QI work is slow, hard to reproduce across sites, and limited in how many cases it can examine.

Core claim

The authors map QI factor discovery to the classical AI/ML steps of problem formalization, model learning, and validation, treating the overarching natural-language specifications as tunable hyperparameters. Domain experts and AI agents iteratively adjust both the specifications and the pipeline until AI extractions reach at least 70 percent concordance with expert annotations while remaining aligned with clinical objectives. When applied at an urban safety-net hospital, the resulting pipelines recovered findings from earlier manual Lean analyses, surfaced new modifiable factors, ran with far greater efficiency, and generated auditable reasoning traces.

What carries the argument

The Human-AI Spec-Solution Co-optimization framework, which treats natural-language specifications as adjustable hyperparameters and iteratively refines both the specifications and the AI pipeline until outputs match expert annotations on exploratory clinical tasks.
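The refinement loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Candidate` class, the `co_optimize` function, the toy pipeline, and the toy revision step are all hypothetical, and concordance is assumed here to be a simple exact-match rate. Only the 70 percent target comes from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    spec: str    # natural-language specification, treated as a tunable hyperparameter
    prompt: str  # pipeline prompt derived from the specification


def co_optimize(
    initial: Candidate,
    run_pipeline: Callable[[Candidate], Sequence[int]],
    expert_labels: Sequence[int],
    revise: Callable[[Candidate, float], Candidate],
    target: float = 0.70,
    max_rounds: int = 10,
) -> tuple[Candidate, float, int]:
    """Revise spec and pipeline until exact-match concordance reaches the target."""
    candidate = initial
    concordance = 0.0
    for round_idx in range(1, max_rounds + 1):
        outputs = run_pipeline(candidate)
        matches = sum(a == e for a, e in zip(outputs, expert_labels))
        concordance = matches / len(expert_labels)
        if concordance >= target:
            return candidate, concordance, round_idx
        candidate = revise(candidate, concordance)
    return candidate, concordance, max_rounds


# Synthetic expert labels (1 = modifiable factor, 0 = not); not data from the paper.
EXPERT = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]


def toy_pipeline(c: Candidate) -> list[int]:
    # Hypothetical stand-in for an LLM call: each revision fixes two more cases.
    n_wrong = max(0, 5 - 2 * c.spec.count("[revised]"))
    return [1 - y if i < n_wrong else y for i, y in enumerate(EXPERT)]


def toy_revise(c: Candidate, concordance: float) -> Candidate:
    # Hypothetical stand-in for the expert/agent revision step.
    return Candidate(spec=c.spec + " [revised]", prompt=c.prompt)


best, score, rounds = co_optimize(
    Candidate("Find modifiable factors", "prompt v0"), toy_pipeline, EXPERT, toy_revise
)
print(best.spec, score, rounds)
```

Note that in this sketch the stopping criterion and the reported score are computed on the same labels, which is the situation the circularity discussion further down is concerned with.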

If this is right

The AI pipeline recovers prior manual Lean findings while running substantially faster.
Additional modifiable factors can be identified that were not found in earlier analyses.
The process generates auditable reasoning traces for every extracted factor.
High concordance with experts supports applying the same workflow to other hospital conditions or sites.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The co-optimization loop could be tested on other exploratory clinical tasks such as root-cause analysis for adverse events.
If the method generalizes, hospitals might run QI reviews on larger patient cohorts without proportional increases in expert time.
Auditable traces might increase clinician trust in AI-assisted QI compared with opaque black-box outputs.

Load-bearing premise

That repeatedly adjusting natural-language specifications and AI pipelines to match expert annotations will identify the true modifiable clinical factors without bias from the co-optimization loop or the particular experts chosen.

What would settle it

The claim would be undermined if an independent panel of clinicians not involved in the refinement process rated the AI-surfaced factors as substantially less actionable, or if the pipeline missed known key drivers that traditional chart reviews had identified.

Figures

Figures reproduced from arXiv: 2604.20055 by Amy Ou, Christopher Peabody, Christopher Ross, Dana Freiser, Hemal Kanzaria, James Marks, Jean Feng, Lucas Zier, Patrick Vossler, Robert Gallo, Susan Ehrlich, Venkat Sivaraman.

**Figure 1.** Scaling up QI factor discovery using AI requires acknowledging that it is inherently complex, exploratory, ...

**Figure 2.** To facilitate validation collaboration between QI team and AI agent as part of Human-AI Spec-Solution ...

**Figure 3.** Example of an online implementation of Human-AI Spec-Solution Co-Optimization, which is more time ...

**Figure 5.** AI pipeline evaluation and calibration results for LOS (left) and 30-day unplanned readmission (right).

**Figure 6.** Comparison of manual Lean analysis and AI pipeline results for prolonged LOS. Left: the six categories ...

**Figure 8.** Comparison of manual Lean A3 analysis and AI pipeline results for 30-day unplanned readmission.

Original abstract

Hospital Quality Improvement (QI) plays a critical role in optimizing healthcare delivery by translating high-level hospital goals into actionable solutions. A critical step of QI is to identify the key modifiable contributing factors, a process we call QI factor discovery, typically through expert-driven semi-structured qualitative tools like fishbone diagrams, chart reviews, and Lean Healthcare methods. AI has the potential to transform and accelerate QI factor discovery, which is traditionally time- and resource-intensive and limited in reproducibility and auditability. Nevertheless, current AI alignment methods assume the task is well-defined, whereas QI factor discovery is an exploratory, fuzzy, and iterative sense-making process that relies on complex implicit expert judgments. To design an AI pipeline that formalizes the QI process while preserving its exploratory components, we propose viewing the task as learning not only LLM prompts but also the overarching natural-language specifications. In particular, we map QI factor discovery to steps of the classical AI/ML development process (problem formalization, model learning, and model validation) where the specifications are tunable hyperparameters. Domain experts and AI agents iteratively refine both the overarching specifications and AI pipeline until AI extractions are concordant with expert annotations and aligned with clinical objectives. We applied this "Human-AI Spec-Solution Co-optimization" framework at an urban safety-net hospital to identify factors driving prolonged length of stay and unplanned 30-day readmissions. The resulting AI-for-QI pipelines achieved $\ge 70\%$ concordance with expert annotations. Compared to prior manual Lean analyses, the AI pipeline was substantially more efficient, recovered previous findings, surfaced new modifiable factors, and produced auditable reasoning traces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The co-optimization framework for hospital QI is a practical step but validation remains circular.

The paper introduces a framework called Human-AI Spec-Solution Co-optimization for turning the vague task of finding quality improvement factors in hospitals into a more structured AI process. Experts and the model refine both the high-level specifications and the specific prompts together until the AI matches what the experts think. This is new in how it explicitly links QI steps to ML pipeline stages and makes the specifications adjustable like hyperparameters. In their hospital case, the approach was quicker than traditional methods, found the expected factors plus extras, and left behind traceable outputs that show how decisions were made.

What works is the practical focus and the emphasis on keeping human judgment in the loop while adding auditability. It shows one way to scale expert-driven analysis without losing the exploratory side.

The weak point is the validation. The 70% concordance is measured against annotations that come from the same co-optimization process, so it does not test against an independent standard. No information appears on held-out data, multiple expert raters, or whether the factors actually led to better patient results. This makes the efficiency and discovery claims harder to evaluate fully.

The paper is for people in healthcare AI and quality improvement who are looking for ways to automate parts of their workflow. It offers a template that could be tried in other settings. It deserves peer review to sort out the validation issues and see if the full methods section provides stronger support.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a 'Human-AI Spec-Solution Co-optimization' framework that treats QI factor discovery as an iterative process of jointly refining natural-language specifications and LLM pipelines, mapped to classical ML development stages. Applied to prolonged length of stay and 30-day readmissions at an urban safety-net hospital, it reports that the resulting pipelines achieve ≥70% concordance with expert annotations, recover prior manual Lean findings, identify additional modifiable factors, and generate auditable traces while being substantially more efficient than traditional expert-driven methods.

Significance. If the validation methodology can be strengthened with independent benchmarks, the framework offers a principled way to scale exploratory QI processes while retaining expert oversight and auditability. This could meaningfully accelerate identification of actionable clinical factors in resource-constrained settings and provide a template for applying LLMs to other ill-defined, iterative sense-making tasks in healthcare and beyond.

major comments (2)

[Abstract] Abstract: The central empirical claim of ≥70% concordance with expert annotations is stated without any accompanying details on annotation sample size, how concordance was operationalized (e.g., exact vs. partial match, per-factor vs. per-case), inter-rater reliability among the experts, exclusion criteria, or the existence of a held-out validation set separate from the iterative co-optimization loop. These omissions make it impossible to assess whether the reported figure supports the claim of successful formalization.
[Abstract] Abstract and methods description of Human-AI Spec-Solution Co-optimization: The iterative refinement of specifications and pipelines continues until AI outputs match the same expert annotations used as the target. This setup creates a circularity risk in which concordance is achieved by construction rather than by independent discovery; no pre-specified validation protocol, blinded external expert panel, or correlation with downstream clinical outcomes is described to separate genuine factor identification from overfitting to the co-optimization process or the particular experts involved.

minor comments (2)

[Abstract] The abstract would be strengthened by a single sentence specifying the clinical department or patient cohort studied and the total number of cases or factors annotated.
The mapping of QI steps to ML development stages (problem formalization, model learning, model validation) is conceptually clear but would benefit from an explicit diagram or table showing which tunable natural-language elements correspond to each stage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. These have prompted us to strengthen the clarity of our validation methodology and address potential concerns about circularity. We provide point-by-point responses below and have made revisions to the abstract and methods sections accordingly.

Referee: [Abstract] Abstract: The central empirical claim of ≥70% concordance with expert annotations is stated without any accompanying details on annotation sample size, how concordance was operationalized (e.g., exact vs. partial match, per-factor vs. per-case), inter-rater reliability among the experts, exclusion criteria, or the existence of a held-out validation set separate from the iterative co-optimization loop. These omissions make it impossible to assess whether the reported figure supports the claim of successful formalization.

Authors: We agree that the abstract would benefit from additional methodological context to allow independent evaluation of the concordance claim. Although the Methods section of the manuscript describes the annotation process, sample characteristics, exact per-factor matching for concordance, inter-rater reliability, and the separation of cases for validation, we have revised the abstract to concisely incorporate these elements. The updated abstract now summarizes the key validation details to improve transparency and standalone readability. Revision: yes.
Referee: [Abstract] Abstract and methods description of Human-AI Spec-Solution Co-optimization: The iterative refinement of specifications and pipelines continues until AI outputs match the same expert annotations used as the target. This setup creates a circularity risk in which concordance is achieved by construction rather than by independent discovery; no pre-specified validation protocol, blinded external expert panel, or correlation with downstream clinical outcomes is described to separate genuine factor identification from overfitting to the co-optimization process or the particular experts involved.

Authors: We appreciate this important observation about the risk of circularity. The framework intentionally uses iterative refinement, but the manuscript already employs a pre-specified protocol that partitions cases into those used for co-optimization and a separate held-out set for final concordance evaluation, along with independent comparison against prior manual Lean findings. To make this separation explicit and address the referee's concern, we have revised the methods description to detail the data partitioning, validation protocol, and evaluation criteria. We have also added a limitations paragraph acknowledging that a fully blinded external expert panel and direct correlation with clinical outcomes would provide further safeguards against overfitting and are planned for future extensions of this work. Revision: yes.
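The partition the simulated rebuttal refers to could look like the following sketch. It is an illustration under stated assumptions, not the protocol from the manuscript: the function name `split_cases`, the 30 percent held-out fraction, and the fixed seed are hypothetical choices. The cohort size of 52 is the patient count quoted in the extracted specifications further down the page.

```python
import random


def split_cases(case_ids, held_out_fraction=0.3, seed=0):
    """Fix a held-out set once, before any spec or prompt refinement starts."""
    ids = sorted(case_ids)      # sort first so the split does not depend on input order
    rng = random.Random(seed)   # local RNG keeps the split reproducible
    rng.shuffle(ids)
    n_held_out = max(1, round(len(ids) * held_out_fraction))
    return ids[n_held_out:], ids[:n_held_out]


# Hypothetical case identifiers 0..51 standing in for 52 annotated patients.
tuning_ids, held_out_ids = split_cases(range(52))
print(len(tuning_ids), len(held_out_ids))
```

The point of the design is that concordance reported on `held_out_ids` is not a stopping criterion of the refinement loop, so it can serve as a separate check.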

Circularity Check

1 step flagged

Co-optimization of specs and pipelines risks circular concordance without independent validation

specific steps

fitted input called prediction [Abstract (Human-AI Spec-Solution Co-optimization description)]
"Domain experts and AI agents iteratively refine both the overarching specifications and AI pipeline until AI extractions are concordant with expert annotations and aligned with clinical objectives. We applied this 'Human-AI Spec-Solution Co-optimization' framework ... The resulting AI-for-QI pipelines achieved ≥70% concordance with expert annotations."

The ≥70% concordance is not measured on a fixed, pre-specified test set or external benchmark; it is the explicit stopping criterion of the joint refinement process. Specifications and pipelines are tuned until the AI outputs match the expert annotations, after which the match rate is reported as evidence of effectiveness. This renders the numerical result tautological with the optimization procedure rather than an independent validation of discovered modifiable factors.

full rationale

The paper's central performance claim (≥70% concordance) is obtained by iteratively refining natural-language specifications and LLM pipelines against the same expert annotations used as the target, with refinement continuing until concordance is reached. This makes the reported metric equivalent to the tuning procedure by construction rather than an independent test. No held-out annotation set, pre-specified validation split, inter-rater reliability metrics, or external clinical outcome benchmark is described. The approach may still be practically useful for exploration, but the derivation of 'success' reduces to the co-optimization loop itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; the framework implicitly assumes expert annotations serve as reliable ground truth and that iterative refinement converges to clinically valid factors without further specification.

pith-pipeline@v0.9.0 · 5624 in / 1128 out tokens · 59162 ms · 2026-05-10T01:40:04.353842+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

contributing operational factors that, if modified, would likely shorten hospital stay

Problem Formalization. Objective: Identify operational factors that would shorten hospital length of stay: “contributing operational factors that, if modified, would likely shorten hospital stay.” Population: Adult inpatients in the top five DRGs at ZSFG (Sepsis, Skin and Soft Tissue Infection, Ischemic Stroke, Blunt Head Injury, Alcohol Use Disorder) with LOS...

[2]

Model Learning. Estimator inputs: All clinical notes and order events for the inpatient encounter, concatenated into a single text block. Estimator output: Single JSON object: (1) Gantt chart with timestamped events, (2) list of contributing factors, each with a combined explanation field, relevant quotes, and confidence (0–3 Likert). Model family: GPT-5 Mini (gpt-5-...

[3]

How output is validated: Low-cost single reader (data scientist), 2 patients

Model Validation. What gets validated: Gantt chart, extracted factors, explanation, quotes, confidence scores. How output is validated: Low-cost single reader (data scientist), 2 patients. G.1.2 Final Specifications. Prompt author: Jean. Reviewed by: Group review (Luke, Ross, Hemal, Toff, Rob, Dana); 52 patients, 4 reviewed by all annotators. LLM: Claude Opus 4.5

[4]

This contributing factor is a modifiable gap that if improved would streamline patient flow

Problem Formalization. Objective: “This contributing factor is a modifiable gap that if improved would streamline patient flow.” Bed capacity explicitly excluded as a directly modifiable factor. Population: Adult inpatients in the top five DRGs at ZSFG with LOS between 4 and 20 days. (Unchanged from v1.) Label definition: Expert annotation: 1–5 Likert scale. AI ...

[5]

Model Learning. Estimator inputs: All clinical notes and order events for the inpatient encounter. (Unchanged from v1.) Estimator output: Three-stage: (1) Gantt chart JSON with timestamped events; (2) per factor: reason, explanation support, explanation contrary, relevant quotes, process improvement; (3) confidence (0–100%) in separate LLM call. Three clinician-ann...
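The three-stage output listed in this entry can be pictured as a small typed record. The sketch below is an assumption-laden illustration: the per-factor field names follow the list in the entry, but the class names, the `GanttEvent` fields, the timestamp format, and the `attach_confidence` helper are hypothetical, and the example record is entirely synthetic.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GanttEvent:
    category: str              # e.g. "admission"; the category set is open-ended
    start: str                 # timestamp string (ISO-8601 format is an assumption)
    end: str
    approximate: bool = False  # True when the timestamp was estimated


@dataclass
class Factor:
    reason: str
    explanation_support: str
    explanation_contrary: str
    relevant_quotes: List[str]
    process_improvement: str
    confidence: Optional[float] = None  # filled in by the separate stage-3 call


@dataclass
class EncounterOutput:
    gantt: List[GanttEvent] = field(default_factory=list)  # stage 1
    factors: List[Factor] = field(default_factory=list)    # stage 2


def attach_confidence(factor: Factor, percent: float) -> Factor:
    """Stage 3: record a 0-100% confidence, rejecting out-of-range values."""
    if not 0 <= percent <= 100:
        raise ValueError(f"confidence must be in [0, 100], got {percent}")
    factor.confidence = float(percent)
    return factor


# Entirely synthetic example record.
output = EncounterOutput(
    gantt=[GanttEvent("admission", "2025-01-01T08:00", "2025-01-01T12:00")],
    factors=[Factor(
        reason="Placeholder factor",
        explanation_support="Placeholder supporting reasoning",
        explanation_contrary="Placeholder contrary reasoning",
        relevant_quotes=["placeholder quote"],
        process_improvement="Placeholder process change",
    )],
)
attach_confidence(output.factors[0], 80)
serialized = json.dumps(asdict(output), indent=2)
print(serialized)
```

Keeping the confidence field empty until a separate step fills it mirrors the entry's statement that confidence is produced in its own LLM call.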

[6]

How output is validated: High-cost multi-reader (six clinical experts)

Model Validation. What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores. How output is validated: High-cost multi-reader (six clinical experts). 1–5 Likert scale via validation UI. 52 patients; 4 reviewed by all annotators. Inter-rater exact agreement: 31.5%, within-one-point: 72.6%. LLM-human...
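The two inter-rater figures quoted in this entry differ only in how much disagreement is tolerated on the 1–5 Likert scale. A minimal sketch of both measures for a pair of raters follows. The function name `agreement` and the ratings are hypothetical, and the page does not say how the paper pooled agreement across six raters, so this covers the two-rater case only.

```python
def agreement(ratings_a, ratings_b, tolerance=0):
    """Share of items on which two raters differ by at most `tolerance` points."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return hits / len(ratings_a)


# Synthetic 1-5 Likert ratings for six factors; not data from the paper.
rater_a = [5, 4, 2, 3, 1, 4]
rater_b = [4, 4, 3, 5, 1, 2]

exact = agreement(rater_a, rater_b)                    # identical scores only
within_one = agreement(rater_a, rater_b, tolerance=1)  # scores at most one point apart
print(f"exact={exact:.3f} within_one={within_one:.3f}")
```

In the toy data, two of six items match exactly and four of six fall within one point, which shows how the within-one-point figure can be much higher than the exact figure on a five-point scale.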

[7]

This contributing factor is a modifiable gap that if improved would likely have prevented this readmission

Problem Formalization. Objective: “This contributing factor is a modifiable gap that if improved would likely have prevented this readmission.” Factors must be both modifiable and causal. Population: All adult patients with 30-day unplanned readmissions at ZSFG. No diagnosis group filtering. Label definition: 0–100% probability, rounded to nearest decile

[8]

Readmission: ED note, admission note, H&P

Model Learning. Estimator inputs: Index admission: admission notes. Readmission: ED note, admission note, H&P. No outpatient, consult, or discharge instruction notes. Estimator output: Two-stage: (1) Gantt chart + factor extraction in one call (with readmission summary), each factor including reason, explanation support, explanation contrary, relevant quotes, pr...

[9]

How output is validated: Low-cost multi-reader (data scientist + clinicians), 4 patients

Model Validation. What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores. How output is validated: Low-cost multi-reader (data scientist + clinicians), 4 patients. Only first reason reviewed per patient. G.2.2 Final Specifications. Prompt author: Jean. Reviewed by: Group review (Luke, Ross, Hemal, ...

[10]

This contributing factor is a modifiable gap that if improved would reduce readmission risk

Problem Formalization. Objective: “This contributing factor is a modifiable gap that if improved would reduce readmission risk.” The AND requirement made explicit: “we are looking for factors that are both modifiable AND causal.” Population: CMS readmission diagnosis groups: COPD, Heart Failure, AMI, Pneumonia. (Unchanged from v5.) Label definition: Expert annot...

[11]

Post-Discharge Care Coordination

Model Learning. Estimator inputs: Index admission: consult notes, discharge summary, discharge instructions. Intervening outpatient notes. Readmission: ED provider note, H&P, discharge summary. Excluded: care plan notes, readmission consult notes. Estimator output: Three-stage: (1) Gantt chart spanning index admission through readmission; (2) per factor: reas...

[12]

How output is validated: High-cost multi-reader (six clinical experts)

Model Validation. What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores. How output is validated: High-cost multi-reader (six clinical experts). 1–5 Likert scale via validation UI. 52 patients; 4 reviewed by all annotators. Inter-rater exact agreement: 23.0%, within-one-point: 72.5%. LLM-human...

[13]

**Map the patient journey, emphasizing events that extended LOS**: Capture essential care phases, major treatments, and delays that extended the hospital stay

[14]

actual discharge

**Include entire hospital timeline**: Cover admission through discharge, noting when the patient was medically ready for discharge vs. actual discharge

[15]

**Identify bottlenecks**: Note waiting periods, care coordination delays, resource availability issues, etc. (if any)

[16]

index_admission_summary

**Assign event timings**: Assign event timing. If exact timestamps aren't available, provide reasonable estimates and mark them as approximate. If there are important events that extend beyond discharge, set the end timestamp to the time of discharge. Event categories to consider, though you can introduce others: - **admission**: Initial care phases...

[17]

Going through the Gantt chart, identify opportunities where there was excessive delay, suboptimal coordination/processes, or prolonged duration, which likely led to LOS being lengthened by 12+ hours. Only list contributors that are actionable, such as resource availability, guideline-directed medical therapy, care coordination issues; avoid listing a pati...

[18]

Explanation Support: Provide detailed step-by-step reasoning for why this represents a contributing factor that led to prolonged LOS or suboptimal patient flow, referencing both the Gantt chart timeline and clinical notes when applicable.

[19]

Explanation Contrary: Provide explanations for why this factor may not need to be or cannot be optimized further.

[20]

Quotes should support all components of your explanation.

Relevant Quotes: For each identified contributing factor, provide EXACT quotes (word-for-word) from the note. Quotes should support all components of your explanation.

[21]

reasons": [ {

Process Improvement: For each factor, describe what specific process change could be implemented, which may ultimately shorten LOS. Focus on timing and workflow changes within the hospital's control. Example categories of factors to consider: HIGHLY ACTIONABLE factors (should be assigned high confidence >= 90- Lack of weekend hospital services => Add...

[22]

Explanation Support: A detailed step-by-step reasoning for why this represents a contributing factor that is a modifiable gap and, if improved, would decrease inpatient length of stay or streamline patient flow.

[23]

Explanation Contrary: Explanations for why this factor may not need to be or cannot be optimized further.

[24]

Relevant Quotes: For each identified contributing factor, quotes from the note supporting the explanations.

[25]

This contributing factor is a modifiable gap that if improved would streamline patient flow

Process Improvement: For each factor, specific process changes that could be implemented, which may ultimately shorten LOS. Here is the list of operational factors you and the team listed: <EXTRACTED FACTORS JSON FROM STAGE 2 INSERTED HERE> YOUR TASK: Assign a confidence probability (0-100) for the following statement: "This contributing factor is a mod...

[28]

Focus on the discharge process, any transitions of care, post-discharge period, any intervening outside hospital readmissions, and events leading to the readmission

Notes from the READMISSION, including ED provider note, H&P, and Discharge Summary (when patient returned within 30 days) <CLINICAL NOTES INSERTED HERE> === STEP 1: VALUE STREAM MAPPING - PATIENT JOURNEY GANTT CHART === Create a Gantt chart that maps the key phases of the patient's journey from the index admission through readmission at the same hosp...

[29]

**Map the patient journey from index discharge to readmission**: Capture key events during the index admission that relate to discharge planning, the post-discharge period, and all unplanned hospital readmissions

[30]

**Include the full timeline**: Cover the index admission discharge planning through the readmission, noting key transitions

[31]

(if any)

**Identify potential gaps**: Note missed follow-up appointments, medication issues, inadequate discharge planning, premature discharge, all unplanned readmissions, etc. (if any)

[32]

index_admission_summary

**Assign event timings**: Assign event timing. If exact timestamps aren't available, provide reasonable estimates. Events to consider extracting: - **index_admission**: Index admission event - **ED/readmission**: Subsequent ED visits or readmissions - **treatment**: treatments given during index admission or readmission - **procedure**: procedures given...

[33]

Consult notes, discharge summary, and discharge instructions from the INDEX admission (the initial hospitalization)

[34]

Intervening outpatient notes

[35]

This is an AND statement -- we are looking for factors that are both modifiable AND causal

Notes from the READMISSION, including ED provider note, H&P, and Discharge Summary (when patient returned within 30 days) <CLINICAL NOTES INSERTED HERE> === STEP 1: VALUE STREAM MAPPING - PATIENT JOURNEY GANTT CHART === The following Gantt chart has already been created mapping the key phases of the patient's journey: <GANTT CHART JSON FROM STAGE 1 INSE...

[36]

List at most 5 factors

For each potential modifiable factor, identify likely causal chains: - What specific decision, action, or omission occurred during the index admission or post-discharge period? - How did this directly lead to the clinical state that required readmission? - If this had been different, would the readmission likely have been prevented? Prioritize MODIFIABL...

[37]

Show how this factor led to readmission.

Explanation Support: Provide the causal reasoning, referencing specific events from the Gantt chart and exact details from the clinical notes. Show how this factor led to readmission.

[38]

Explanation Contrary: Provide explanations for why this factor may not have been a cause or contributor to the readmission outcome.

[39]

Relevant Quotes: Provide EXACT quotes (word-for-word) from the notes that support the causal chain

[40]

reasons": [ {

Process Improvement: Describe specific process changes that could have been implemented to address this factor and reduced readmission risk. Provide evidence-based recommendations. If the hospital is already following these evidence-based practices but doing so incompletely, emphasize the specific aspects that need improvement. Example categor...

[41]

The discharge summary from the INDEX admission (the initial hospitalization)

[42]

The admission note from the READMISSION (when patient returned within 30 days)

[43]

It focuses on the transition of care, post-discharge period, and events leading to the readmission

The discharge summary from the READMISSION <CLINICAL NOTES INSERTED HERE> === STEP 1: VALUE STREAM MAPPING - PATIENT JOURNEY GANTT CHART === This is the Gantt chart that you and the team created, which maps the key phases of the patient's journey from the index admission through the readmission. It focuses on the transition of care, post-discharge period,...

[44]

Explanation Support: A detailed step-by-step reasoning for why this represents a contributing factor that is a modifiable gap and, if improved, would likely have prevented this readmission.

[45]

Explanation Contrary: Explanations for why this factor may not have been preventable or may not have changed the outcome.

[46]

Relevant Quotes: For each identified contributing factor, quotes from the notes supporting the explanations.

[47]

This contributing factor is a modifiable gap that if improved would reduce readmission risk

Process Improvement: For each factor, specific process changes that could be implemented, which may ultimately prevent similar readmissions. Here is the list of factors you and the team listed: <EXTRACTED FACTORS JSON FROM STAGE 2 INSERTED HERE> YOUR TASK: Assign a confidence probability (0-100) for the following statement: "This contributing factor is ...


[1] [1]

contributing operational factors that, if modified, would likely shorten hospital stay

Problem Formalization ObjectiveIdentify operational factors that would shorten hospital length of stay: “contributing operational factors that, if modified, would likely shorten hospital stay.” PopulationAdult inpatients in the top five DRGs at ZSFG (Sepsis, Skin and Soft Tissue Infection, Ischemic Stroke, Blunt Head Injury, Alcohol Use Disorder) with LOS...

work page

[2] [2]

Model Learning
Estimator inputs: All clinical notes and order events for the inpatient encounter, concatenated into a single text block.
Estimator output: Single JSON object: (1) Gantt chart with timestamped events, (2) list of contributing factors, each with a combined explanation field, relevant quotes, and confidence (0–3 Likert).
Model family: GPT-5 Mini (gpt-5-...

[3] [3]

How output is validated: Low-cost single reader (data scientist), 2 patients

Model Validation
What gets validated: Gantt chart, extracted factors, explanation, quotes, confidence scores.
How output is validated: Low-cost single reader (data scientist), 2 patients.

G.1.2 Final Specifications
Prompt author: Jean. Reviewed by: Group review (Luke, Ross, Hemal, Toff, Rob, Dana), 52 patients, 4 reviewed by all annotators. LLM: Claude Opus 4.5

[4] [4]

This contributing factor is a modifiable gap that if improved would streamline patient flow

Problem Formalization
Objective: “This contributing factor is a modifiable gap that if improved would streamline patient flow.” Bed capacity explicitly excluded as a directly modifiable factor.
Population: Adult inpatients in the top five DRGs at ZSFG with LOS between 4 and 20 days. (Unchanged from v1.)
Label definition: Expert annotation: 1–5 Likert scale. AI ...

[5] [5]

Model Learning
Estimator inputs: All clinical notes and order events for the inpatient encounter. (Unchanged from v1.)
Estimator output: Three-stage: (1) Gantt chart JSON with timestamped events; (2) per factor: reason, explanation support, explanation contrary, relevant quotes, process improvement; (3) confidence (0–100%) in separate LLM call.
Three clinician-ann...
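As a rough illustration of the per-factor record described above, the sketch below parses a stage-2 JSON payload and attaches stage-3 confidences afterwards. The underscore field names, the top-level `reasons` key, and the helper names `parse_factors` and `attach_confidence` are assumptions made for this example; the paper's actual schema may differ.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Factor:
    # Field names are assumed from the per-factor outputs listed in the text.
    reason: str
    explanation_support: str
    explanation_contrary: str
    relevant_quotes: List[str]
    process_improvement: str
    # Left empty at stage 2; filled by the separate stage-3 confidence call.
    confidence: Optional[float] = None


def parse_factors(stage2_json: str) -> List[Factor]:
    """Parse a stage-2 payload of the assumed form {"reasons": [ {...}, ... ]}."""
    payload = json.loads(stage2_json)
    return [Factor(**item) for item in payload["reasons"]]


def attach_confidence(factors: List[Factor], confidences: List[float]) -> List[Factor]:
    """Attach one 0-100 confidence per factor, in order."""
    if len(factors) != len(confidences):
        raise ValueError("expected exactly one confidence per factor")
    for factor, value in zip(factors, confidences):
        if not 0 <= value <= 100:
            raise ValueError(f"confidence out of range: {value}")
        factor.confidence = float(value)
    return factors
```

Keeping `confidence` optional mirrors the separation between factor extraction and the later confidence call, so a record stays valid before stage 3 has run.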

[6] [6]

How output is validated: High-cost multi-reader (six clinical experts)

Model Validation
What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores.
How output is validated: High-cost multi-reader (six clinical experts). 1–5 Likert scale via validation UI. 52 patients; 4 reviewed by all annotators. Inter-rater exact agreement: 31.5%, within-one-point: 72.6%. LLM-human...
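The exact and within-one-point agreement figures can be read as simple proportions over paired ratings. The sketch below computes both for two raters on a 1–5 Likert scale. The function name `agreement_rates` is hypothetical, and how the paper pools pairs across its six annotators is not stated here, so this covers only the two-rater case.

```python
from typing import Sequence, Tuple


def agreement_rates(rater_a: Sequence[int], rater_b: Sequence[int]) -> Tuple[float, float]:
    """Return (exact agreement, within-one-point agreement) as fractions in [0, 1]."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one
```

For example, `agreement_rates([1, 3, 5, 2], [1, 4, 2, 2])` counts two exact matches and three pairs within one point out of four.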

[7] [7]

This contributing factor is a modifiable gap that if improved would likely have prevented this readmission

Problem Formalization
Objective: “This contributing factor is a modifiable gap that if improved would likely have prevented this readmission.” Factors must be both modifiable and causal.
Population: All adult patients with 30-day unplanned readmissions at ZSFG. No diagnosis group filtering.
Label definition: 0–100% probability, rounded to nearest decile
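A minimal sketch of the decile rounding in the label definition follows. The helper name `round_to_decile` and the half-up tie rule (45 becomes 50) are assumptions; the source does not say how ties are handled. Python's built-in `round` uses round-half-to-even, so the sketch avoids it.

```python
import math


def round_to_decile(probability_pct: float) -> int:
    """Round a 0-100 probability to the nearest multiple of 10, ties rounding up."""
    if not 0 <= probability_pct <= 100:
        raise ValueError(f"probability out of range: {probability_pct}")
    # floor(x + 0.5) gives half-up rounding, unlike round(), which rounds 4.5 to 4.
    return int(math.floor(probability_pct / 10 + 0.5)) * 10
```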

[8] [8]

Readmission: ED note, admission note, H&P

Model Learning
Estimator inputs: Index admission: admission notes. Readmission: ED note, admission note, H&P. No outpatient, consult, or discharge instruction notes.
Estimator output: Two-stage: (1) Gantt chart + factor extraction in one call (with readmission summary), each factor including reason, explanation support, explanation contrary, relevant quotes, pr...

[9] [9]

How output is validated: Low-cost multi-reader (data scientist + clinicians), 4 patients

Model Validation
What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores.
How output is validated: Low-cost multi-reader (data scientist + clinicians), 4 patients. Only first reason reviewed per patient.

G.2.2 Final Specifications
Prompt author: Jean. Reviewed by: Group review (Luke, Ross, Hemal, ...

[10] [10]

This contributing factor is a modifiable gap that if improved would reduce readmission risk

Problem Formalization
Objective: “This contributing factor is a modifiable gap that if improved would reduce readmission risk.” The AND requirement made explicit: “we are looking for factors that are both modifiable AND causal.”
Population: CMS readmission diagnosis groups: COPD, Heart Failure, AMI, Pneumonia. (Unchanged from v5.)
Label definition: Expert annot...

[11] [11]

Post-Discharge Care Coordination

Model Learning
Estimator inputs: Index admission: consult notes, discharge summary, discharge instructions. Intervening outpatient notes. Readmission: ED provider note, H&P, discharge summary. Excluded: care plan notes, readmission consult notes.
Estimator output: Three-stage: (1) Gantt chart spanning index admission through readmission; (2) per factor: reas...

[12] [12]

How output is validated: High-cost multi-reader (six clinical experts)

Model Validation
What gets validated: Gantt chart, factors, supportive/contrary reasoning, quotes, process improvements, confidence scores.
How output is validated: High-cost multi-reader (six clinical experts). 1–5 Likert scale via validation UI. 52 patients; 4 reviewed by all annotators. Inter-rater exact agreement: 23.0%, within-one-point: 72.5%. LLM-human...

[13] [13]

**Map the patient journey, emphasizing events that extended LOS**: Capture essential care phases, major treatments, and delays that extended the hospital stay

[14] [14]

actual discharge

**Include entire hospital timeline**: Cover admission through discharge, noting when the patient was medically ready for discharge vs. actual discharge

[15] [15]

**Identify bottlenecks**: Note waiting periods, care coordination delays, resource availability issues, etc. (if any)

[16] [16]

index_admission_summary

**Assign event timings**: Assign event timing. If exact timestamps aren't available, provide reasonable estimates and mark them as approximate. If there are important events that extend beyond discharge, set the end timestamp to the time of discharge.
Event categories to consider, though you can introduce others:
- **admission**: Initial care phases...

[17] [17]

Going through the Gantt chart, identify opportunities where there was excessive delay, suboptimal coordination/processes, or prolonged duration, which likely led to LOS being lengthened by 12+ hours. Only list contributors that are actionable, such as resource availability, guideline-directed medical therapy, care coordination issues; avoid listing a pati...

[18] [18]

Explanation Support: Provide detailed step-by-step reasoning for why this represents a contributing factor that led to prolonged LOS or suboptimal patient flow, referencing both the Gantt chart timeline and clinical notes when applicable.

[19] [19]

Explanation Contrary: Provide explanations for why this factor may not need to be or cannot be optimized further.

[20] [20]

Quotes should support all components of your explanation.

Relevant Quotes: For each identified contributing factor, provide EXACT quotes (word-for-word) from the note. Quotes should support all components of your explanation.
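Because the prompt asks for word-for-word quotes, one natural downstream check is whether each returned quote actually appears in the source note. The sketch below is an illustration only, not a step the paper is known to perform. The function name `find_unsupported_quotes` is hypothetical, and collapsing whitespace before matching is an assumption made so that line wraps in the note do not cause false alarms.

```python
import re
from typing import List


def _normalize_ws(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_unsupported_quotes(quotes: List[str], note: str) -> List[str]:
    """Return the quotes that do not occur verbatim (up to whitespace) in the note."""
    haystack = _normalize_ws(note)
    unsupported = []
    for quote in quotes:
        needle = _normalize_ws(quote)
        # Empty quotes are flagged too, since they support nothing.
        if not needle or needle not in haystack:
            unsupported.append(quote)
    return unsupported
```

An empty result means every quote was found; any returned quote may be a paraphrase or a hallucinated passage and could be routed to a human reviewer.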

[21] [21]

reasons": [ {

Process Improvement: For each factor, describe what specific process change could be implemented, which may ultimately shorten LOS. Focus on timing and workflow changes within the hospital's control.
Example categories of factors to consider: HIGHLY ACTIONABLE factors (should be assigned high confidence >= 90- Lack of weekend hospital services => Add...

[22] [22]

Explanation Support: A detailed step-by-step reasoning for why this represents a contributing factor that is a modifiable gap and, if improved, would decrease inpatient length of stay or streamline patient flow.

[23] [23]

Explanation Contrary: Explanations for why this factor may not need to be or cannot be optimized further.

[24] [24]

Relevant Quotes: For each identified contributing factor, quotes from the note supporting the explanations.

[25] [25]

This contributing factor is a modifiable gap that if improved would streamline patient flow

Process Improvement: For each factor, specific process changes that could be implemented, which may ultimately shorten LOS.
Here is the list of operational factors you and the team listed:
<EXTRACTED FACTORS JSON FROM STAGE 2 INSERTED HERE>
YOUR TASK: Assign a confidence probability (0-100) for the following statement: "This contributing factor is a mod...

[26] [28]

Focus on the discharge process, any transitions of care, post-discharge period, any intervening outside hospital readmissions, and events leading to the readmission

Notes from the READMISSION, including ED provider note, H&P, and Discharge Summary (when patient returned within 30 days)
<CLINICAL NOTES INSERTED HERE>
=== STEP 1: VALUE STREAM MAPPING - PATIENT JOURNEY GANTT CHART ===
Create a Gantt chart that maps the key phases of the patient's journey from the index admission through readmission at the same hosp...

[27] [29]

**Map the patient journey from index discharge to readmission**: Capture key events during the index admission that relate to discharge planning, the post-discharge period, and all unplanned hospital readmissions

[28] [30]

**Include the full timeline**: Cover the index admission discharge planning through the readmission, noting key transitions

[29] [31]

(if any)

**Identify potential gaps**: Note missed follow-up appointments, medication issues, inadequate discharge planning, premature discharge, all unplanned readmissions, etc. (if any)

[30] [32]

index_admission_summary

**Assign event timings**: Assign event timing. If exact timestamps aren't available, provide reasonable estimates.
Events to consider extracting:
- **index_admission**: Index admission event
- **ED/readmission**: Subsequent ED visits or readmissions
- **treatment**: treatments given during index admission or readmission
- **procedure**: procedures given...

[31] [33]

Consult notes, discharge summary, and discharge instructions from the INDEX admission (the initial hospitalization)

[32] [34]

Intervening outpatient notes

[33] [35]

This is an AND statement -- we are looking for factors that are both modifiable AND causal

Notes from the READMISSION, including ED provider note, H&P, and Discharge Summary (when patient returned within 30 days)
<CLINICAL NOTES INSERTED HERE>
=== STEP 1: VALUE STREAM MAPPING - PATIENT JOURNEY GANTT CHART ===
The following Gantt chart has already been created mapping the key phases of the patient's journey: <GANTT CHART JSON FROM STAGE 1 INSE...

[34] [36]

List at most 5 factors

For each potential modifiable factor, identify likely causal chains:
- What specific decision, action, or omission occurred during the index admission or post-discharge period?
- How did this directly lead to the clinical state that required readmission?
- If this had been different, would the readmission likely have been prevented?
Prioritize MODIFIABL...

[35] [37]

Show how this factor led to readmission.

Explanation Support: Provide the causal reasoning, referencing specific events from the Gantt chart and exact details from the clinical notes. Show how this factor led to readmission.

[36] [38]

Explanation Contrary: Provide explanations for why this factor may not have been a cause or contributor to the readmission outcome.

[37] [39]

Relevant Quotes: Provide EXACT quotes (word-for-word) from the notes that support the causal chain

[38] [40]

reasons": [ {

Process Improvement: Describe specific process changes that could have been implemented to address this factor and reduced readmission risk. Provide evidence-based recommendations. If the hospital is already following these evidence-based practices but doing so incompletely, emphasize the specific aspects that need improvement.
Example categor...

[39] [41]

The discharge summary from the INDEX admission (the initial hospitalization)

[40] [42]

The admission note from the READMISSION (when patient returned within 30 days)
