arxiv: 2604.23059 · v1 · submitted 2026-04-24 · 💻 cs.CL

Implicit Framing in Obstetric Counseling Notes: A Grounded LLM Pipeline on a VBAC-Eligible Cohort

Pith reviewed 2026-05-08 11:30 UTC · model grok-4.3

classification 💻 cs.CL
keywords obstetric counseling · VBAC · repeat cesarean · clinical framing · LLM pipeline · medical notes analysis · delivery mode decisions · healthcare NLP

The pith

Physicians use more risk-focused language when documenting repeat cesarean sections than VBAC options in notes for patients eligible for both.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a controlled cohort covering 2,024 obstetric history and physical narratives, drawn from patients for whom both vaginal birth after cesarean (VBAC) and repeat cesarean section (RCS) are medically viable options. It does so by supplementing structured clinical data with a large language model pipeline that pulls contraindications only from verbatim text in the records. A zero-shot LLM step then labels segments of counseling language into predefined framing categories such as risk emphasis. The resulting counts show that risk-focused language makes up a substantially larger share of counseling segments in the RCS notes than in the VBAC notes, and statistical tests confirm the category distributions differ. A sympathetic reader would care because the way options are worded in the medical record can shape what patients hear and ultimately choose.

Core claim

In the set of 2,024 notes from patients eligible for both delivery modes, risk-focused language accounts for a substantially larger share of counseling segments in repeat cesarean documentation than in VBAC documentation, with category-level differences confirmed by statistical testing.

What carries the argument

A two-stage grounded LLM pipeline that first extracts contraindications verbatim to form the VBAC-eligible cohort and then applies zero-shot categorization of counseling segments into predefined framing categories.
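
As a rough illustration of what "grounded" could mean in the first stage, the sketch below keeps a model-proposed finding only if its quoted evidence occurs verbatim in the note. The function names, the candidate structure, and the whitespace and case normalization are all assumptions made for illustration; the paper's actual prompts and checks are not described on this page.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not defeat matching (an assumption)."""
    return re.sub(r"\s+", " ", text).strip().lower()

def keep_grounded(note: str, candidates: list[dict]) -> list[dict]:
    """Return only candidates whose 'evidence' string occurs verbatim in the note.

    `candidates` is a hypothetical structure: [{"finding": ..., "evidence": ...}, ...].
    """
    haystack = normalize(note)
    grounded = []
    for cand in candidates:
        evidence = normalize(cand.get("evidence", ""))
        # Empty evidence is rejected: a finding with no quote is not grounded.
        if evidence and evidence in haystack:
            grounded.append(cand)
    return grounded

# Invented example text, not taken from any clinical record.
note = "Hx of one prior low transverse cesarean.\nNo prior classical incision."
candidates = [
    {"finding": "prior low transverse cesarean", "evidence": "one prior low transverse cesarean"},
    {"finding": "placenta previa", "evidence": "placenta previa noted on ultrasound"},
]
print(keep_grounded(note, candidates))
```

A substring check of this kind only guards against invented quotes. It does not handle negation: the phrase "prior classical incision" would match the invented note above even though the note denies it, so a real pipeline would need an additional step for that.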

If this is right

  • The observed framing difference occurs even after medical contraindications are controlled for, isolating a linguistic pattern in documentation.
  • Risk language may tilt patient perception toward one delivery mode over the other in otherwise equivalent clinical situations.
  • Zero-shot LLM categorization can be applied at scale to existing clinical notes without task-specific training data.
  • Statistical confirmation of distribution shifts provides a quantitative basis for comparing communication practices across patient groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar framing imbalances could appear in other high-stakes decisions such as cancer treatment or surgical consent.
  • Feeding framing summaries back into electronic records might prompt physicians to balance language in real time.
  • Linking note-level framing scores to downstream patient choice and outcome data would test whether the linguistic pattern affects actual decisions.
  • Extending the pipeline to spoken counseling transcripts could reveal whether the written pattern matches what patients hear.

Load-bearing premise

The LLM extraction pipeline correctly pulls every relevant contraindication from the text and the zero-shot framing labels accurately reflect physician intent without adding model-specific bias.

What would settle it

A blinded human re-annotation of a random sample of counseling segments that produces framing distributions whose category differences between VBAC and RCS notes fall below statistical significance.

Figures

Figures reproduced from arXiv: 2604.23059 by Barbara Di Eugenio, Baris Karacan, Joanna Tess, Patrick Thornton, Subhash Kumar Kolar.

Figure 1: Distribution of VBAC eligibility categories among …
Figure 2: Prompt configurations for eligibility evidence extrac…
Figure 5: Distribution of output categories across models for the …
Figure 4: Distribution of output categories across models for the …
Figure 6: Comparison of framing category proportions between …

Original abstract

Clinical framing -- the linguistic manner in which clinical information is presented -- can influence patient understanding and decision-making, with important implications for healthcare outcomes. Obstetrics is a high-stakes domain in which physicians counsel patients on delivery mode choices such as vaginal birth after cesarean (VBAC) and repeat cesarean section (RCS), yet counseling language remains underexplored in large-scale clinical text analysis. In this work, we analyze physician counseling language in 2,024 obstetric history and physical narratives for a rigorously defined cohort of patients for whom both VBAC and RCS were clinically viable options. To control for confounding due to medical contraindications, we first construct a VBAC-eligible cohort using structured clinical data supplemented by a large language model (LLM)-based extraction pipeline constrained to grounded, verbatim evidence from free-text narratives. We then apply a zero-shot LLM framework to categorize counseling segments into predefined framing categories capturing how physicians linguistically present delivery options. Our analysis reveals a significant difference in counseling framing distributions between VBAC and RCS notes; risk-focused language accounts for a substantially larger share of counseling segments in RCS documentation than in VBAC, with category-level differences confirmed by statistical testing, highlighting the value of controlled LLM-based framing analysis in obstetric care.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to analyze implicit framing in obstetric counseling notes using a two-stage grounded LLM pipeline on 2,024 notes from a cohort of VBAC-eligible patients. Structured data is supplemented by an LLM extraction step (constrained to verbatim spans) to exclude contraindications and define the cohort; zero-shot LLM prompting then categorizes counseling segments into framing categories. The central result is a statistically significant difference in framing distributions, with risk-focused language comprising a substantially larger share of segments in RCS notes than in VBAC notes.

Significance. If the pipeline proves unbiased and the framing differences reflect genuine physician language rather than LLM artifacts, the work would be significant for clinical NLP and obstetrics. It would provide empirical evidence that documentation framing varies systematically with delivery-mode recommendation, with potential implications for patient counseling, shared decision-making, and bias-mitigation training. The constrained, verbatim-grounded LLM approach also offers a reusable template for high-stakes clinical text analysis.

major comments (2)
  1. [Methods] Methods (LLM extraction and cohort construction): No precision, recall, Cohen's kappa, or error analysis is reported for the LLM-based contraindication extraction or the zero-shot framing categorization. Because the central claim rests on an unbiased VBAC-eligible cohort and on framing labels that accurately capture physician intent, the absence of any validation against clinician annotations leaves open the possibility that extraction errors or LLM biases are correlated with note type (RCS vs. VBAC).
  2. [Results] Results (statistical comparison): The abstract states that category-level differences are 'confirmed by statistical testing,' yet provides no description of the test(s), sample sizes per category, effect sizes, or multiple-comparison correction. Without these details it is impossible to assess whether the reported significance is robust or driven by a few high-frequency categories.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'rigorously defined cohort' is used without enumerating the exact structured-data criteria or the precise verbatim constraints applied to the LLM step.
  2. [Methods] Notation: The framing categories are described as 'predefined' but their exact definitions, number, and any overlap rules are not stated in the visible summary.
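
The validation metrics named in major comment 1 are simple to compute once clinician labels exist. The following is a minimal sketch, assuming two parallel label lists over the same segments. The label names and the toy data are invented for illustration; they are not the paper's categories or results.

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def precision_recall_f1(gold: list[str], pred: list[str], positive: str):
    """Precision, recall, and F1 for one label treated as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: clinician annotations versus model output on six segments.
clinician = ["risk", "risk", "neutral", "benefit", "risk", "neutral"]
model = ["risk", "neutral", "neutral", "benefit", "risk", "risk"]
print(cohens_kappa(clinician, model))
print(precision_recall_f1(clinician, model, positive="risk"))
```

Running the same functions separately on VBAC and RCS notes would speak to the referee's concern that errors might correlate with note type.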

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important opportunities to improve the transparency and rigor of our methods and results. We address each major comment below and have revised the manuscript to incorporate the requested details.

point-by-point responses
  1. Referee: [Methods] Methods (LLM extraction and cohort construction): No precision, recall, Cohen's kappa, or error analysis is reported for the LLM-based contraindication extraction or the zero-shot framing categorization. Because the central claim rests on an unbiased VBAC-eligible cohort and on framing labels that accurately capture physician intent, the absence of any validation against clinician annotations leaves open the possibility that extraction errors or LLM biases are correlated with note type (RCS vs. VBAC).

    Authors: We agree that the absence of quantitative validation metrics represents a gap in the original submission. In the revised manuscript we will add a dedicated Validation subsection to Methods. Two obstetricians will independently review a stratified random sample of 200 notes (100 VBAC, 100 RCS) for both contraindication extraction and framing labels. We will report precision, recall, F1-score, and Cohen's kappa for each task, along with an error analysis stratified by note type to directly address the concern about correlated biases. These additions will allow readers to evaluate the reliability of the pipeline. revision: yes

  2. Referee: [Results] Results (statistical comparison): The abstract states that category-level differences are 'confirmed by statistical testing,' yet provides no description of the test(s), sample sizes per category, effect sizes, or multiple-comparison correction. Without these details it is impossible to assess whether the reported significance is robust or driven by a few high-frequency categories.

    Authors: We acknowledge that the statistical reporting was incomplete. The revised Results section will explicitly state that a chi-squared test of independence was applied to the 2-by-K contingency table of note type by framing category. We will report the total number of segments per note type, per-category counts, Cramér's V effect sizes, and Bonferroni-adjusted p-values for the six framing categories. The abstract will be updated to reference the chi-squared test and correction. These changes ensure the significance claims can be fully evaluated. revision: yes
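
The analysis promised in the second response can be sketched with the Python standard library alone. The counts and category names below are invented placeholders, not the paper's data, and the one-vs-rest tests are only one possible reading of "Bonferroni-adjusted p-values for the six framing categories".

```python
import math

def chi2_statistic(table: list[list[float]]) -> float:
    """Pearson chi-squared statistic for an R-by-C table of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table: list[list[float]]) -> float:
    """Cramér's V effect size: sqrt(chi2 / (n * (min(R, C) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2_statistic(table) / (n * k))

def per_category_tests(table: list[list[float]], names: list[str]) -> dict:
    """One-vs-rest 2x2 tests for a two-row table, Bonferroni-adjusted.

    With 1 degree of freedom the chi-squared survival function is
    erfc(sqrt(x / 2)), so no external library is needed.
    """
    vbac, rcs = table
    n_vbac, n_rcs = sum(vbac), sum(rcs)
    adjusted = {}
    for j, name in enumerate(names):
        sub = [[vbac[j], n_vbac - vbac[j]], [rcs[j], n_rcs - rcs[j]]]
        p = math.erfc(math.sqrt(chi2_statistic(sub) / 2))
        adjusted[name] = min(1.0, p * len(names))  # Bonferroni adjustment
    return adjusted

# Placeholder category names and counts (rows: VBAC notes, RCS notes).
names = ["risk", "benefit", "neutral", "preference", "recommendation", "other"]
counts = [[40, 60, 80, 30, 50, 40],
          [90, 40, 70, 30, 40, 30]]
print(chi2_statistic(counts), cramers_v(counts))
print(per_category_tests(counts, names))
```

The omnibus p-value for the full 2-by-K table needs the chi-squared distribution with K - 1 degrees of freedom, which a statistics library supplies. The 2x2 tests here apply no continuity correction, whereas SciPy's `chi2_contingency` applies Yates' correction to 2x2 tables by default, so the numbers would differ slightly.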

Circularity Check

0 steps flagged

No circularity: empirical observational pipeline with external data grounding

The paper describes an empirical workflow: structured data plus LLM extraction (constrained to verbatim spans) to build a VBAC-eligible cohort, followed by zero-shot LLM framing categorization and statistical comparison of distributions. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear. The central claim (framing differences) is produced by applying external models to independent clinical notes rather than reducing to its own inputs by construction. Minor risk of LLM bias exists but is a validity concern, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on unvalidated assumptions about LLM performance in clinical text rather than on external benchmarks or formal proofs. No free parameters or invented entities are introduced.

axioms (2)
  • [domain assumption] Zero-shot LLM classification can reliably assign counseling segments to predefined framing categories without examples or fine-tuning.
    Invoked in the second stage of the pipeline for categorizing physician language.
  • [domain assumption] LLM extraction from free-text narratives, when restricted to verbatim quotes, accurately captures all medical contraindications relevant to VBAC eligibility.
    Central to the first stage that defines the controlled cohort.

pith-pipeline@v0.9.0 · 5534 in / 1495 out tokens · 67970 ms · 2026-05-08T11:30:34.802469+00:00 · methodology
