A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models
Pith reviewed 2026-05-10 19:05 UTC · model grok-4.3
The pith
A multi-stage validation process allows large language models to extract substance use disorder diagnoses reliably from nearly a million clinical notes without exhaustive manual labeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework integrates six stages: prompt calibration, rule-based plausibility filtering, semantic grounding assessment, targeted confirmatory evaluation by an independent higher-capacity judge LLM, selective expert review, and external predictive validity analysis. Together these quantify uncertainty and characterize error modes without exhaustive manual annotation. The paper supports the claim with substantial judge-expert agreement (Gwet's AC1 = 0.80) and predictive performance above structured-data baselines (AUC = 0.80).
What carries the argument
The multi-stage validation framework that combines automated filters with selective review by a higher-capacity judge model and limited experts to assess LLM outputs under weak supervision.
If this is right
- Rule-based filtering and semantic grounding can remove about 15% of LLM-positive extractions (14.59% in this study) as unsupported, irrelevant, or structurally implausible.
- Judge LLM assessments can serve as scalable references: on high-uncertainty cases they agreed substantially with expert review (Gwet's AC1 = 0.80).
- LLM-extracted SUD diagnoses can predict subsequent engagement in SUD specialty care more accurately than structured-data baselines (AUC = 0.80).
- Population-scale clinical extraction becomes feasible without annotation-intensive reference standards.
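The filtering step in the first bullet can be illustrated with a minimal sketch. It assumes each LLM-positive extraction carries a substance category and an evidence span quoted from the note. The category list, the negation cues, the token-overlap score, and the 0.6 threshold are all hypothetical placeholders, since the paper's actual rules and thresholds are not stated on this page. Token overlap stands in for whatever semantic similarity measure the authors use, only to keep the example dependency-free.

```python
import re

# Hypothetical category list; the paper covers 11 categories that are not named here.
ALLOWED_CATEGORIES = {"alcohol", "opioid", "cannabis"}
# Hypothetical negation cues used by the plausibility rule.
NEGATION_CUES = ("denies", "no history of", "negative for")


def tokens(text):
    """Return lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_plausible(extraction):
    """Rule-based check: known category, non-empty evidence, no negation cue."""
    evidence = extraction["evidence"].lower()
    if extraction["category"] not in ALLOWED_CATEGORIES:
        return False
    if not evidence.strip():
        return False
    return not any(cue in evidence for cue in NEGATION_CUES)


def grounding_score(evidence, note):
    """Fraction of evidence tokens that also occur in the note (lexical stand-in)."""
    ev = tokens(evidence)
    if not ev:
        return 0.0
    note_vocab = set(tokens(note))
    return sum(t in note_vocab for t in ev) / len(ev)


def keep(extraction, note, threshold=0.6):
    """Keep an LLM-positive extraction only if it passes both filters."""
    grounded = grounding_score(extraction["evidence"], note) >= threshold
    return is_plausible(extraction) and grounded


# Made-up note and candidate extractions, for illustration only.
note = "Patient reports daily alcohol use and meets criteria for alcohol use disorder."
candidates = [
    {"category": "alcohol", "evidence": "meets criteria for alcohol use disorder"},
    {"category": "opioid", "evidence": "opioid dependence on methadone"},  # not in the note
    {"category": "alcohol", "evidence": "denies alcohol use"},  # negated mention
]
kept = [c for c in candidates if keep(c, note)]
removed_fraction = 1 - len(kept) / len(candidates)
```

In this toy run the ungrounded and the negated candidates are dropped. Reporting `removed_fraction` per rule, rather than in aggregate, would show which filter accounts for most removals.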
Where Pith is reading between the lines
- The same staged approach could be adapted to extract other clinical entities such as medications or procedures from notes.
- Lower annotation costs might allow smaller health systems to adopt LLM tools for record review.
- Combining LLMs with predictive validity checks against outcomes offers one route to building trust in deployed models.
Load-bearing premise
The higher-capacity judge LLM supplies reliable confirmatory labels for uncertain cases and the rule-based plus semantic filters capture most errors without introducing new biases.
What would settle it
Substantial disagreement between the judge LLM and independent expert review on a new set of high-uncertainty extractions would undermine the claim of trustworthy validation.
Original abstract
Large language models (LLMs) show promise for extracting clinically meaningful information from unstructured health records, yet their translation into real-world settings is constrained by the lack of scalable and trustworthy validation approaches. Conventional evaluation methods rely heavily on annotation-intensive reference standards or incomplete structured data, limiting feasibility at population scale. We propose a multi-stage validation framework for LLM-based clinical information extraction that enables rigorous assessment under weak supervision. The framework integrates prompt calibration, rule-based plausibility filtering, semantic grounding assessment, targeted confirmatory evaluation using an independent higher-capacity judge LLM, selective expert review, and external predictive validity analysis to quantify uncertainty and characterize error modes without exhaustive manual annotation. We applied this framework to extraction of substance use disorder (SUD) diagnoses across 11 substance categories from 919,783 clinical notes. Rule-based filtering and semantic grounding removed 14.59% of LLM-positive extractions that were unsupported, irrelevant, or structurally implausible. For high-uncertainty cases, the judge LLM's assessments showed substantial agreement with subject matter expert review (Gwet's AC1=0.80). Using judge-evaluated outputs as references, the primary LLM achieved an F1 score of 0.80 under relaxed matching criteria. LLM-extracted SUD diagnoses also predicted subsequent engagement in SUD specialty care more accurately than structured-data baselines (AUC=0.80). These findings demonstrate that scalable, trustworthy deployment of LLM-based clinical information extraction is feasible without annotation-intensive evaluation.
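The agreement statistic quoted in the abstract, Gwet's AC1, can be computed as sketched below for two raters. With observed agreement p_a and category prevalences pi_k averaged over both raters, chance agreement is p_e = sum_k pi_k (1 - pi_k) / (q - 1), and AC1 = (p_a - p_e) / (1 - p_e). The label vectors are invented for illustration, and taking the number of categories q from the labels actually observed is an assumption of this sketch, not something the paper specifies.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters labelling the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)  # assumption: q is taken from the observed labels
    if q < 2:
        return 1.0  # a single category used throughout means perfect agreement

    # Observed agreement: share of items given the same label by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement based on average category prevalence across raters.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Made-up labels: 1 = extraction judged correct, 0 = judged incorrect.
judge_labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
expert_labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
ac1 = gwet_ac1(judge_labels, expert_labels)
```

AC1 is often preferred over Cohen's kappa when one label dominates, because its chance-agreement term does not grow as prevalence becomes skewed.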
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a multi-stage validation framework for trustworthy large-scale clinical information extraction using LLMs. The framework combines prompt calibration, rule-based plausibility filtering, semantic grounding, confirmatory evaluation by a higher-capacity judge LLM, selective expert review, and external predictive validity analysis. It is demonstrated on extracting SUD diagnoses from 919,783 clinical notes across 11 substance categories, reporting that 14.59% of LLM-positive extractions were filtered, substantial agreement (Gwet's AC1 = 0.80) with experts on high-uncertainty cases, F1 = 0.80 for the primary LLM, and superior predictive validity (AUC = 0.80) for care engagement compared to structured baselines. The authors conclude that scalable, trustworthy LLM-based clinical IE is feasible without annotation-intensive evaluation.
Significance. If the reported metrics are robust, this work is significant for enabling population-scale clinical NLP applications by reducing the need for exhaustive manual annotations. The application to a very large corpus (nearly 1 million notes) and the inclusion of external validation against real-world care engagement records provide concrete evidence of feasibility and utility. Strengths include the integration of multiple validation stages and the focus on error mode characterization.
major comments (3)
- [Abstract] The claim of 'trustworthy' extraction depends on the judge LLM's reliability for high-uncertainty cases, yet only selective expert review is reported (Gwet AC1=0.80); there is no validation reported for cases where the primary and judge LLMs agree, which constitute the bulk of outputs and leave open the possibility of shared biases.
- [Methods (framework)] Exact thresholds for rule-based plausibility filtering and semantic grounding assessment are not detailed, nor are the criteria for identifying 'high-uncertainty cases' for judge LLM review; these are load-bearing for evaluating whether the 14.59% filtering introduces selection bias or misses error modes.
- [Results (predictive validity)] While AUC=0.80 for predicting SUD specialty care engagement is promising, this external validity does not directly confirm the correctness of the extracted diagnoses, as the association could arise from correlated but inaccurate signals; additional analyses to rule this out are needed to support the trustworthiness claim.
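The last comment turns on what AUC actually measures. A minimal sketch, assuming binary engagement outcomes and one score per patient: AUC is the probability that a randomly chosen patient who engaged receives a higher score than one who did not, with ties counted as half. The toy vectors below are invented and do not come from the paper.

```python
def auc(scores, labels):
    """AUC as the pairwise win rate of positives over negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Quadratic in the number of patients; fine for a sketch, not for large cohorts.
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = later engaged in SUD specialty care.
engaged = [1, 1, 0, 0, 1, 0]
llm_flag = [1, 1, 0, 0, 0, 1]         # hypothetical LLM-extracted diagnosis flag
structured_flag = [1, 0, 0, 0, 0, 1]  # hypothetical structured-code flag

auc_llm = auc(llm_flag, engaged)
auc_structured = auc(structured_flag, engaged)
```

Because the statistic depends only on how scores rank patients by outcome, any signal correlated with engagement can raise it, which is why a high AUC alone does not establish that each extracted diagnosis is correct.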
minor comments (2)
- [Abstract] The 'relaxed matching criteria' for the F1 score of 0.80 should be explicitly defined in the methods section for reproducibility.
- Consider adding a table summarizing the multi-stage framework components and their roles to improve clarity.
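To show why the matching criteria need an explicit definition, the sketch below computes F1 under two hypothetical rules: strict matching on (note, category, severity) and relaxed matching on (note, category) only. The paper's own relaxed criteria are not stated on this page, so this definition and the example tuples are assumptions.

```python
def f1_score(predicted, reference, key=lambda item: item):
    """Set-based F1 between predictions and references after applying `key`."""
    pred = {key(p) for p in predicted}
    ref = {key(r) for r in reference}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up extractions as (note_id, category, severity) tuples.
primary_llm = [
    ("n1", "alcohol", "severe"),
    ("n2", "opioid", "mild"),
    ("n3", "cannabis", "moderate"),
]
judge_reference = [
    ("n1", "alcohol", "moderate"),  # same diagnosis, different severity
    ("n2", "opioid", "mild"),
    ("n4", "alcohol", "mild"),
]

strict_f1 = f1_score(primary_llm, judge_reference)
# Hypothetical relaxed rule: ignore severity, compare note and category only.
relaxed_f1 = f1_score(primary_llm, judge_reference, key=lambda item: item[:2])
```

Here the same outputs score 1/3 under strict matching and 2/3 under relaxed matching, so a reported F1 is hard to interpret unless the matching rule is given alongside it.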
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which identifies key areas to strengthen the claims around trustworthiness in our multi-stage LLM validation framework. We respond point-by-point to the major comments below, committing to revisions that enhance transparency and address the concerns raised.
- Referee: [Abstract] The claim of 'trustworthy' extraction depends on the judge LLM's reliability for high-uncertainty cases, yet only selective expert review is reported (Gwet AC1=0.80); there is no validation reported for cases where the primary and judge LLMs agree, which constitute the bulk of outputs and leave open the possibility of shared biases.
  Authors: We acknowledge this as a valid limitation: while the judge LLM provides confirmatory evaluation on high-uncertainty cases and expert review on a selective subset yields substantial agreement, the bulk of outputs (where primary and judge LLMs concur) lack direct expert validation, leaving room for shared biases. To address this, the revised manuscript will include a post-hoc expert review on a random sample of agreed cases, with results reported in the Results section to quantify agreement and characterize potential biases.
  revision: yes
- Referee: [Methods (framework)] Exact thresholds for rule-based plausibility filtering and semantic grounding assessment are not detailed, nor are the criteria for identifying 'high-uncertainty cases' for judge LLM review; these are load-bearing for evaluating whether the 14.59% filtering introduces selection bias or misses error modes.
  Authors: We agree that the absence of exact thresholds and criteria in the Methods section hinders reproducibility and evaluation of selection bias from the 14.59% filtering. In the revised manuscript, we will expand the Methods to detail the specific thresholds for rule-based plausibility filtering and semantic grounding assessment, along with the precise criteria (e.g., confidence scores or disagreement flags) used to identify high-uncertainty cases for judge LLM review. We will also add a sensitivity analysis on these parameters.
  revision: yes
- Referee: [Results (predictive validity)] While AUC=0.80 for predicting SUD specialty care engagement is promising, this external validity does not directly confirm the correctness of the extracted diagnoses, as the association could arise from correlated but inaccurate signals; additional analyses to rule this out are needed to support the trustworthiness claim.
  Authors: This is a substantive point: the predictive validity analysis demonstrates utility for downstream tasks but remains indirect and could reflect correlated signals rather than diagnostic accuracy. In the revised manuscript, we will add analyses comparing LLM-extracted diagnoses to overlapping structured ICD codes and explicitly discuss this limitation in the Discussion, while maintaining that the multi-stage internal validations combined with external utility provide supportive evidence for scalable trustworthiness.
  revision: partial
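The post-hoc review of agreed cases promised in the first response could be sized and reported along these lines. This is a minimal sketch, assuming a simple random sample and a Wilson score interval for the proportion of sampled cases that experts confirm. The pool size, sample size, and confirmation count are invented, and the authors may choose a different design.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def draw_audit_sample(case_ids, k, seed=0):
    """Draw k distinct case ids uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(case_ids), k)


# Hypothetical pool of cases where primary and judge LLMs agreed.
agreed_case_ids = range(10_000)
audit_sample = draw_audit_sample(agreed_case_ids, k=200)

# Hypothetical outcome: experts confirm 188 of the 200 sampled cases.
low, high = wilson_interval(successes=188, n=200)
```

Reporting the interval rather than the point estimate makes clear how much a sample of that size can say about the unreviewed bulk of outputs.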
Circularity Check
No significant circularity detected in the multi-stage validation framework
The paper proposes and applies a multi-stage framework consisting of prompt calibration, rule-based plausibility filtering, semantic grounding assessment, confirmatory evaluation by an independent higher-capacity judge LLM on high-uncertainty cases, selective expert review (with Gwet's AC1=0.80 agreement), and external predictive validity against care engagement records (AUC=0.80). The primary LLM's F1=0.80 is computed using judge outputs as references, but this is an explicit component of the framework and is supported by the reported expert agreement on the judge assessments rather than reducing to a self-referential fit or definition. No equations, self-citations, or ansatzes are invoked in a load-bearing way that collapses the central claim to its inputs by construction. The derivation remains self-contained against the described external benchmarks and selective human validation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Higher-capacity judge LLMs can serve as reliable proxies for expert review on uncertain cases
- domain assumption: Rule-based plausibility filters and semantic grounding capture the majority of LLM error modes
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: The framework integrates prompt calibration, rule-based plausibility filtering, semantic grounding assessment, targeted confirmatory evaluation using an independent higher-capacity judge LLM, selective expert review, and external predictive validity analysis
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Rule-based filtering and semantic grounding removed 14.59% of LLM-positive extractions... Gwet's AC1=0.80... AUC=0.80
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Acknowledgment
This initiative is sponsored by the U.S. Department of Veterans Affairs (VA) and utilizes VA-fun...