pith. machine review for the scientific record.

arxiv: 2604.12258 · v2 · submitted 2026-04-14 · 💻 cs.CL · cs.AI

Recognition: unknown

Coding-Free and Privacy-Preserving Agentic Framework for Data-Driven Clinical Research

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:05 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords clinical research automation · agentic AI · large language models · IRB documentation · privacy-preserving analysis · cohort construction · human-in-the-loop · data-driven medicine

The pith

CARIS uses language models to automate clinical research from planning to reports without any coding or direct data access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CARIS as a system that lets clinicians drive data research through natural language alone. It automates the full pipeline of planning, literature search, cohort building, IRB paperwork, machine learning runs, and report writing, while routing all outputs through human review and exposing only derived outputs, never raw patient records, to users. An evaluation on three heterogeneous clinical datasets showed the system completing planning and IRB steps in four or fewer rounds and reaching 96 percent completeness under automated LLM-based checks and 82 percent under human review. This approach aims to lower the technical and documentation hurdles that currently limit who can perform rigorous clinical studies on real data.

Core claim

CARIS integrates large language models with modular tools via the Model Context Protocol to execute end-to-end clinical research workflows, completing research planning and IRB documentation within four iterations, supporting Vibe ML, and producing reports with 96 percent LLM-evaluated completeness and 82 percent human-evaluated completeness across three heterogeneous datasets, all while preserving privacy by exposing only outputs to users.

What carries the argument

The Model Context Protocol (MCP) that links LLMs to specialized tools for tasks such as cohort construction and IRB drafting, allowing the entire workflow to be driven by natural language while keeping patient data private.
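The mechanics of such a loop can be sketched in outline. This is a hypothetical illustration, not CARIS's actual implementation: the tool names, the `run_tool` dispatcher, and the returned fields are all invented for the example. The property the paper relies on is that tools execute inside the secure environment and hand back only derived outputs.

```python
# Hypothetical sketch of an MCP-style tool loop: the model requests a tool
# by name, the tool runs inside the secure data environment, and only the
# derived output (never patient-level rows) is returned to the model and user.

def build_cohort(criteria: dict) -> dict:
    # Placeholder: a real system would query the clinical database here.
    # It returns only aggregate counts, not patient-level records.
    return {"n_matched": 412, "criteria": criteria}

def draft_irb_section(topic: str) -> dict:
    # Placeholder: would call an LLM with dataset-specific context.
    return {"section": f"Draft IRB text for: {topic}"}

TOOLS = {
    "build_cohort": build_cohort,
    "draft_irb_section": draft_irb_section,
}

def run_tool(name: str, args: dict) -> dict:
    """Dispatch a model-issued tool call; unknown tools are refused."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**args)

# One agent step: the model emits a tool call, the host executes it.
result = run_tool("build_cohort", {"criteria": {"age_min": 65}})
```

In this shape, the natural-language interface sits entirely on the model side of `run_tool`; nothing the user types ever touches the data store directly.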

If this is right

  • Clinicians without programming skills can generate IRB-ready documentation and cohort definitions directly from study questions.
  • Research teams can iterate on analysis plans while keeping raw patient records inside secure environments.
  • Report generation becomes a repeatable output of the same agent loop rather than a separate manual step.
  • The same framework can be applied to both public and private datasets without changing the user interface.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the four-iteration bound holds across more institutions, the time from question to IRB submission could drop from weeks to days for many observational studies.
  • The privacy design opens the possibility of running the agent on federated hospital data without moving records to a central server.
  • Extending the tool set to include statistical power calculations or regulatory checklist completion could further compress the pre-study phase.
  • Human oversight remains essential, so the framework may be most useful as a co-pilot rather than a fully autonomous researcher.

Load-bearing premise

Large language models connected through MCP can reliably carry out clinical tasks such as cohort construction and IRB documentation with only the small number of human corrections reported.

What would settle it

Running CARIS on a fresh clinical dataset where planning or IRB documents require more than four rounds of correction or contain critical factual errors that human reviewers must rewrite.

Figures

Figures reproduced from arXiv: 2604.12258 by Hyeonhoon Lee, Hyeryun Park, Hyung-Chul Lee, Kyungsang Kim, Taehun Kim, Yushin Lee.

Figure 1. Overview of the Clinical Agentic Research Intelligence System (CARIS).
Figure 2. Overview of the clinical research workflow.
Figure 3. Revision patterns in IRB document generation.
Figure 4. Radar chart of IRB document evaluation results across four criteria. Each axis represents the pass rate (%) for each criterion, assessed across three tasks.
Figure 5. Checklist coverage of the final report across nine criteria.
Original abstract

Clinical data-driven research requires clinical expertise, programming skills, access to patient data, and extensive documentation, creating barriers and slowing the pace for clinicians and external researchers. To address this, we developed the Clinical Agentic Research Intelligence System (CARIS) that automates the workflow: research planning, literature search, cohort construction, Institutional Review Board (IRB) documentation, Vibe Machine Learning (ML), and report generation, with human-in-the-loop refinement. CARIS integrates Large Language Models (LLMs) with modular tools through the Model Context Protocol (MCP), enabling natural language-driven research without coding while allowing users to access only outputs. We evaluated CARIS on three heterogeneous datasets with distinct clinical tasks, where it completed planning and IRB documentation within four iterations, supported Vibe ML, and generated reports, achieving 96% completeness in LLM-based evaluation and 82% in human evaluation. CARIS demonstrates potential to reduce documentation burden and technical barriers, accelerating data-driven clinical research across public and private data environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CARIS (Clinical Agentic Research Intelligence System), an LLM-based agentic framework using the Model Context Protocol (MCP) to automate end-to-end data-driven clinical research workflows—research planning, literature search, cohort construction, IRB documentation, Vibe ML, and report generation—without requiring user coding while preserving privacy by exposing only outputs. Evaluated on three heterogeneous datasets with distinct clinical tasks, the system is reported to complete planning and IRB documentation in four human-in-the-loop iterations, support Vibe ML, and generate reports, achieving 96% completeness under LLM-based evaluation and 82% under human evaluation.

Significance. If the performance and safety claims are substantiated, CARIS could meaningfully reduce technical and documentation barriers for clinicians and external researchers working with sensitive clinical data, enabling faster iteration on cohort definition and regulatory documentation across public and private environments. The modular MCP integration for tool use without code exposure offers a practical template for privacy-preserving agentic systems in regulated domains.

major comments (3)
  1. [Abstract / Evaluation] Evaluation methodology (abstract and results): The central performance claims rest on 96% LLM-based and 82% human completeness scores, yet no definition of 'completeness,' error typology (e.g., factual inaccuracies in IRB text or incorrect cohort logic), baseline comparisons, or inter-rater reliability is provided. This leaves the metrics vulnerable to superficial coverage rather than verified clinical accuracy or safety.
  2. [Results] Human evaluation protocol: The 82% human score is reported without specifying the number or expertise of raters, blinding procedures, ground-truth references for the three datasets, or breakdown of error types (e.g., medical logic errors vs. formatting issues), which is required to support the claim that outputs are usable after only four iterations.
  3. [Evaluation] Generalizability across datasets: While three heterogeneous datasets are mentioned, no per-dataset or per-task performance breakdown is given, nor any analysis of failure modes on varying data schemas or clinical domains, undermining the assertion of broad applicability.
minor comments (2)
  1. [Abstract] The term 'Vibe ML' is used without an explicit expansion or reference on first use in the abstract.
  2. [Methods] The description of MCP integration would benefit from a brief diagram or pseudocode showing the tool-calling loop to clarify how privacy is enforced at the protocol level.
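The pseudocode the referee asks for does not exist in the paper as provided. Purely as an illustration of what protocol-level enforcement could look like, one mechanism is an output allow-list applied at the tool boundary; the field names and allow-list below are invented for the sketch.

```python
# Illustrative only: one way a host could enforce output-only access at the
# protocol boundary. Every tool result passes through this gate before the
# model or user sees it; field names here are invented examples.

ALLOWED_OUTPUT_KEYS = {"summary", "table", "figure_path", "n", "metrics"}

def sanitize(tool_output: dict) -> dict:
    """Drop any key not on the allow-list before crossing the boundary."""
    return {k: v for k, v in tool_output.items() if k in ALLOWED_OUTPUT_KEYS}

raw = {
    "n": 412,
    "metrics": {"auroc": 0.81},
    "patient_rows": ["row-1", "row-2"],  # must never leave the enclave
}
safe = sanitize(raw)  # "patient_rows" is stripped; only aggregates remain
```

Whether CARIS enforces privacy this way, via query restrictions, or elsewhere in the stack is exactly what the requested diagram would clarify.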

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. The comments highlight important opportunities to strengthen the transparency and rigor of the evaluation methodology, and we address each major comment point by point below. We will incorporate revisions to address the identified gaps.

Point-by-point responses
  1. Referee: [Abstract / Evaluation] Evaluation methodology (abstract and results): The central performance claims rest on 96% LLM-based and 82% human completeness scores, yet no definition of 'completeness,' error typology (e.g., factual inaccuracies in IRB text or incorrect cohort logic), baseline comparisons, or inter-rater reliability is provided. This leaves the metrics vulnerable to superficial coverage rather than verified clinical accuracy or safety.

    Authors: We agree that the evaluation methodology requires greater specificity. The submitted manuscript reports the aggregate completeness scores but does not define 'completeness,' provide an error typology, include baseline comparisons, or report inter-rater reliability. In the revised version we will add an explicit definition of completeness (proportion of workflow components generated without critical factual or logical errors), a categorized error typology, a discussion of baseline limitations given the novelty of the end-to-end task, and inter-rater reliability statistics to better substantiate claims regarding clinical accuracy and safety. revision: yes
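The definitions promised here are simple to state concretely. As a hedged sketch (the ratings and counts below are invented, not from the paper), completeness reduces to a proportion and inter-rater reliability to Cohen's kappa for two raters:

```python
# Illustrative metric definitions matching the rebuttal's promises.
# Inputs are invented examples, not data reported in the paper.

def completeness(flags):
    """Proportion of workflow components with no critical error
    (True = component is error-free)."""
    return sum(flags) / len(flags)

def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters over the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n                    # per-rater positive rates
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)              # chance agreement
    return (p_o - p_e) / (1 - p_e)

# e.g. 82 of 100 components pass -> completeness 0.82
flags = [True] * 82 + [False] * 18
```

Reporting kappa alongside the raw percentage would distinguish genuine rater consensus from agreement expected by chance.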

  2. Referee: [Results] Human evaluation protocol: The 82% human score is reported without specifying the number or expertise of raters, blinding procedures, ground-truth references for the three datasets, or breakdown of error types (e.g., medical logic errors vs. formatting issues), which is required to support the claim that outputs are usable after only four iterations.

    Authors: We acknowledge that the human evaluation protocol details are not described in the current manuscript. We will revise the Results section to specify the number and expertise of raters, blinding procedures, how ground-truth references were constructed for each dataset, and a breakdown of error types. These additions will provide the necessary context to evaluate the 82% score and the usability claim after four iterations. revision: yes

  3. Referee: [Evaluation] Generalizability across datasets: While three heterogeneous datasets are mentioned, no per-dataset or per-task performance breakdown is given, nor any analysis of failure modes on varying data schemas or clinical domains, undermining the assertion of broad applicability.

    Authors: We agree that the absence of per-dataset and per-task breakdowns, together with failure-mode analysis, limits the strength of the generalizability claim. The manuscript currently reports only aggregate results across the three datasets. In the revision we will add a table with performance metrics disaggregated by dataset and task, plus a dedicated analysis of observed failure modes related to data schemas and clinical domains. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system description without derivations or self-referential predictions

full rationale

The paper presents CARIS as an LLM-integrated agentic framework for clinical research workflows and reports empirical completeness metrics (96% LLM-based, 82% human) on three datasets. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text or abstract. Claims rest on iterative human-in-the-loop execution and external evaluations rather than any chain that reduces to its own inputs by construction. Self-citations, if present, are not load-bearing for any mathematical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the unverified assumption that current LLMs can be reliably orchestrated for high-stakes clinical documentation and analysis tasks; no independent evidence or formal verification of this capability is provided.

axioms (1)
  • domain assumption Large language models can be safely and accurately guided through clinical research workflows using modular tools and human-in-the-loop refinement.
    This assumption underpins the entire automation claim and is not independently tested in the provided abstract.
invented entities (2)
  • CARIS no independent evidence
    purpose: End-to-end automation of clinical research tasks
    Newly introduced system name and architecture.
  • Vibe ML no independent evidence
    purpose: Machine learning component within the agentic workflow
    Mentioned as a supported capability but not defined or evidenced.

pith-pipeline@v0.9.0 · 5496 in / 1309 out tokens · 79745 ms · 2026-05-10T16:05:35.731871+00:00 · methodology


Reference graph

Works this paper leans on

162 extracted references · 22 canonical work pages · 2 internal anchors

