pith. machine review for the scientific record. sign in

arxiv: 2604.23674 · v1 · submitted 2026-04-26 · 💻 cs.AI

Recognition: unknown

Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work

Authors on Pith no claims yet

Pith reviewed 2026-05-08 06:12 UTC · model grok-4.3

classification 💻 cs.AI
keywords Vibe Medicinehuman-AI co-workAI agentsbiomedical workflowslarge language modelsmedical skills collectiondrug repurposingrare disease diagnosis
0
0 comments X

The pith

Clinicians direct AI agents via natural language to run complex biomedical workflows while retaining oversight and decision authority.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Vibe Medicine as a human-AI co-work model for biomedical research. Researchers describe objectives in everyday language, and skill-augmented AI agents carry out the detailed multi-step work using large language models and a library of over one thousand curated medical skills. Humans keep the director role by setting goals, inspecting intermediate outputs, and applying domain judgment to final choices. Case studies cover rare disease diagnosis, drug repurposing, and clinical trial design to show complete pipelines in action. The stated aim is to lower barriers for independent and low-resource researchers while noting risks such as hallucination and privacy.

Core claim

Vibe Medicine is a co-work paradigm in which clinicians and researchers direct skill-augmented AI agents through natural language to execute complex, multi-step biomedical workflows. The enabling layers are capable LLMs, agent frameworks such as OpenClaw and Hermes Agent, and the OpenClaw medical skills collection of more than 1,000 curated skills drawn from open-source repositories across ten domains. Humans retain the role of research director by specifying objectives, reviewing results, and making domain-informed decisions. Demonstrations in rare disease diagnosis, drug repurposing, and clinical trial design illustrate end-to-end execution, with the broader goal of advancing research and,

What carries the argument

The OpenClaw medical skills collection of more than 1,000 curated skills across ten biomedical domains, which augments agent frameworks so that natural-language instructions can trigger reliable multi-step analysis on heterogeneous data.

If this is right

  • End-to-end biomedical workflows become executable from natural language instructions without requiring the user to write code.
  • Applications extend to rare disease diagnosis, drug repurposing, and clinical trial design through demonstrated case studies.
  • Specialized labor demands decrease, enabling more independent researchers and those in low-resource settings to conduct advanced work.
  • Risks of hallucination, privacy breaches, and over-reliance must be addressed to reach clinically trustworthy integration.
  • Future development focuses on more reliable and equitable agent-assisted biomedical research systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar skill-augmented agent collections could be built for non-biomedical domains to support analogous co-work models.
  • Training for biomedical researchers may shift emphasis toward task specification and result interpretation rather than tool implementation details.
  • Real-world deployment would require quantitative benchmarks of agent accuracy against expert performance on representative pipelines.
  • Wider use could shorten the time from hypothesis to analyzed results by enabling rapid iteration on analytical choices.

Load-bearing premise

Current large language models together with existing agent frameworks and the medical skills collection can handle mixed biomedical data types and multi-step pipelines at acceptably low rates of hallucination or error.

What would settle it

A head-to-head test on a multi-step task such as drug repurposing where an expert performs the analysis manually and an agent performs it under Vibe Medicine direction, then measures the frequency of factual errors or invalid conclusions in the agent output.

Figures

Figures reproduced from arXiv: 2604.23674 by Bowen Chen, Dajiang Zhu, Lin Zhao, Shaowen Wan, Siyuan Li, Steven Xu, Tianming Liu, Wei Ruan, Yanjun Lyu, Yiwei Li, Zihao Wu.

Figure 1
Figure 1. Figure 1: Conceptual overview of Vibe Medicine. The figure contrasts traditional research with Vibe Medicine and shows how LLMs, agent frameworks, medical skills, and biomedical tools and databases work together in a human-in-the-loop process, while also highlighting key applications and challenges. OpenClaw Medical Skills Scientific Literature Clinical Documentation Drug & Protein Design Genomics & Bioinformatics M… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the OpenClaw medical skills collection, which spans the full stack of biomedical research workflows that Vibe Medicine is designed to support. The left panel organizes biomedical capabilities across major domains, covering a diverse range of biomedical data modalities and tasks. The right panel lists representative skills for each domain. require programming skills, database access, statistical… view at source ↗
Figure 3
Figure 3. Figure 3: Diagnostic workflow for FBN1-related Marfan syndrome. The pipeline integrates Human Phenotype Ontology (HPO) phenotype mapping with Orphanet/OMIM searches and ClinVar variant interpretation to confirm clinical findings and prioritize genetic testing. linked to automated experimental execution, bringing anal￾ysis, decision-making, and laboratory action into a more complete closed loop. 5. Case Studies: Vibe… view at source ↗
Figure 4
Figure 4. Figure 4: Drug repurposing pipeline for Myasthenia Gravis. The execution highlights disease resolution via OpenTargets, survey of active trials for zilucoplan, and safety label interrogation for eculizumab using the OpenFDA tool branch. 5.2. Case Study 2: Drug Repurposing Pipeline This case study followed the medical-research-toolki t, tooluniverse-gwas-drug-discovery, and tooluniverse-d rug-drug-interaction skills.… view at source ↗
Figure 5
Figure 5. Figure 5: Clinical trial design workflow for EGFR-mutant NSCLC. The schematic shows the synthesis of biomarker prevalence and precedent trial benchmarking to generate an osimertinib-centered Phase II concept with stratified endpoints. and clinical-trial-protocol-skill workflows. As shown in view at source ↗
read the original abstract

With the emergence of large language models (LLMs) and AI agent frameworks, the human-AI co-work paradigm known as Vibe Coding is changing how people code, making it more accessible and productive. In scientific research, where workflows are more complex and the burden of specialized labor limits independent researchers and those in low-resource areas, the potential impact is even greater, particularly in biomedicine, which involves heterogeneous data modalities and multi-step analytical pipelines. In this paper, we introduce Vibe Medicine, a co-work paradigm in which clinicians and researchers direct skill-augmented AI agents through natural language to execute complex, multi-step biomedical workflows, while retaining the role of research director who specifies objectives, reviews intermediate results, and makes domain-informed decisions. The enabling infrastructure consists of three layers: capable LLMs, agent frameworks such as OpenClaw and Hermes Agent, and the OpenClaw medical skills collection, which includes more than 1,000 curated skills from multiple open-source repositories. We analyze the architecture and skill categories of this collection across ten biomedical domains, and present case studies covering rare disease diagnosis, drug repurposing, and clinical trial design that demonstrate end-to-end workflows in practice. We also identify the principal risks, such as hallucination, data privacy, and over-reliance, and outline directions toward more reliable, trustworthy, and clinically integrated agent-assisted research that advances research and technological equity and reduces health care resource disparities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes 'Vibe Medicine' as a human-AI co-work paradigm in which clinicians and researchers direct skill-augmented AI agents (via LLMs, frameworks such as OpenClaw and Hermes, and a collection of >1,000 curated medical skills) through natural language to execute complex multi-step biomedical workflows, while humans retain oversight as research directors. It describes the three-layer enabling infrastructure, analyzes skill categories across ten biomedical domains, presents three illustrative case studies (rare disease diagnosis, drug repurposing, and clinical trial design) as end-to-end demonstrations, and discusses risks including hallucination, data privacy, and over-reliance along with future directions for trustworthy integration.

Significance. If the core assumption holds, the paradigm could meaningfully lower barriers to advanced biomedical research for independent investigators and low-resource settings by shifting specialized labor to AI agents, thereby supporting greater equity in research output and healthcare innovation. The structured analysis of the skills collection across domains provides a useful inventory of current agent capabilities, and the explicit treatment of risks contributes to responsible framing of AI-assisted science.

major comments (3)
  1. [Case Studies] The case studies (rare disease diagnosis, drug repurposing, clinical trial design) are presented as demonstrations of end-to-end workflows, yet they consist solely of qualitative prompt-and-response traces with no reported quantitative metrics such as success rates, step-wise error rates, hallucination frequency, or expert-validated accuracy. This absence directly undermines the central claim that current LLMs combined with the OpenClaw/Hermes frameworks and skills collection can reliably handle heterogeneous data modalities and multi-step pipelines.
  2. [Infrastructure Description] The infrastructure section asserts that the OpenClaw medical skills collection (>1,000 curated skills from open-source repositories) enables the described co-work, but provides no details on curation criteria, accuracy validation of individual skills, or empirical testing against multi-modal biomedical data; without such grounding, the feasibility of the 'skill-augmented agents' premise remains untested.
  3. [Case Studies] The paper states that humans retain the role of research director who reviews results and makes domain-informed decisions, but the case studies supply no data on intervention frequency, error propagation, or how often AI outputs require correction, leaving the practicality of this oversight model unsupported.
minor comments (2)
  1. [Abstract] The abstract claims the case studies 'demonstrate end-to-end workflows in practice,' but this phrasing should be qualified to reflect their illustrative rather than evaluative nature.
  2. [Introduction] The distinction between 'Vibe Medicine' and the referenced 'Vibe Coding' paradigm could be clarified with a brief explicit comparison to avoid potential reader confusion.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments identify important gaps in the empirical support for our claims, and we have revised the manuscript to clarify the illustrative nature of the case studies, expand details on the skills collection, and add explicit discussion of limitations and future validation needs.

read point-by-point responses
  1. Referee: [Case Studies] The case studies (rare disease diagnosis, drug repurposing, clinical trial design) are presented as demonstrations of end-to-end workflows, yet they consist solely of qualitative prompt-and-response traces with no reported quantitative metrics such as success rates, step-wise error rates, hallucination frequency, or expert-validated accuracy. This absence directly undermines the central claim that current LLMs combined with the OpenClaw/Hermes frameworks and skills collection can reliably handle heterogeneous data modalities and multi-step pipelines.

    Authors: We agree that the case studies are qualitative demonstrations rather than quantitative benchmarks and do not include metrics such as success rates or hallucination frequency. The manuscript frames them as illustrations of the Vibe Medicine paradigm in practice. In the revised version, we have added explicit statements in the case studies section and a new 'Limitations and Future Directions' subsection clarifying their illustrative purpose and outlining plans for systematic quantitative evaluation, including error rates and expert validation in follow-up studies. This addresses the concern by tempering the claims accordingly. revision: partial

  2. Referee: [Infrastructure Description] The infrastructure section asserts that the OpenClaw medical skills collection (>1,000 curated skills from open-source repositories) enables the described co-work, but provides no details on curation criteria, accuracy validation of individual skills, or empirical testing against multi-modal biomedical data; without such grounding, the feasibility of the 'skill-augmented agents' premise remains untested.

    Authors: The referee is correct that the original infrastructure section lacked sufficient detail on curation and validation. We have revised this section to describe the curation criteria, which prioritized skills from established open-source biomedical repositories based on task relevance, documentation quality, and existing community adoption. We also note that comprehensive per-skill accuracy validation and large-scale empirical testing on multi-modal data remain ongoing community efforts rather than completed work. The updated text now presents the collection as an enabling starting point while acknowledging the need for further grounding. revision: partial

  3. Referee: [Case Studies] The paper states that humans retain the role of research director who reviews results and makes domain-informed decisions, but the case studies supply no data on intervention frequency, error propagation, or how often AI outputs require correction, leaving the practicality of this oversight model unsupported.

    Authors: We acknowledge that the case studies are narrative and do not quantify human interventions, error propagation, or correction frequency. In the revision, we have added text in the case studies and limitations sections recognizing this gap and proposing that future agent frameworks incorporate logging to enable such measurements. This supports the conceptual oversight model while making clear that empirical data on its practicality will require dedicated follow-on studies. revision: partial

Circularity Check

0 steps flagged

No derivation chain or fitted results; purely descriptive proposal

full rationale

The paper introduces the Vibe Medicine paradigm as a human-AI co-work model enabled by existing LLMs, agent frameworks (OpenClaw, Hermes), and a curated skills collection. It analyzes skill categories across domains and presents three illustrative case studies as qualitative demonstrations. No equations, predictions, first-principles derivations, or parameter fittings appear anywhere in the text. The central claim does not reduce any result to quantities defined by its own inputs, nor does it rely on self-citation chains for uniqueness theorems or ansatzes. The proposal remains self-contained as a descriptive framework without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central proposal rests on untested assumptions about LLM reliability in biomedical domains and introduces a conceptual framework without new empirical grounding or formal definitions.

axioms (1)
  • domain assumption LLMs and agent frameworks can be reliably augmented with curated domain skills to execute multi-step biomedical workflows under human direction
    Invoked when describing the enabling infrastructure and case study success.
invented entities (1)
  • Vibe Medicine paradigm no independent evidence
    purpose: To organize human-AI co-work specifically for biomedical research
    New named framework introduced to encompass the described layers and workflows.

pith-pipeline@v0.9.0 · 5592 in / 1381 out tokens · 62833 ms · 2026-05-08T06:12:27.334302+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

129 extracted references · 26 canonical work pages · 5 internal anchors

  1. [1]

    Abramson,J.,Adler,J.,Dunger,J.,Evans,R.,Green,T.,Pritzel,A., Ronneberger, O., Willmore, L., Ballard, A.J., Bambrick, J., et al.,

  2. [2]

    Nature 630, 493–500

    Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500

  3. [3]

    medical-research-skills.https://github.com/ aipoch/medical-research-skills.GitHubrepository,accessed2026- 04-04

    AIPOCH, 2026. medical-research-skills.https://github.com/ aipoch/medical-research-skills.GitHubrepository,accessed2026- 04-04

  4. [4]

    The repertoire of mutational signatures in human cancer

    Alexandrov,L.B.,Kim,J.,Haradhvala,N.J.,Huang,M.N.,TianNg, A.W., Wu, Y., Boot, A., et al., 2020. The repertoire of mutational signatures in human cancer. Nature 578, 94–101

  5. [5]

    Introducing the Model Context Protocol.https: //www.anthropic.com/news/model-context-protocol

    Anthropic, 2024. Introducing the Model Context Protocol.https: //www.anthropic.com/news/model-context-protocol. Open standard for connecting AI assistants to data sources and tools

  6. [6]

    MOFA+:astatisticalframeworkfor comprehensiveintegrationofmulti-modalsingle-celldata

    Argelaguet, R., Arnol, D., Bredikhin, D., Deloro, Y., Velten, B., Marioni,J.C.,Stegle,O.,2020. MOFA+:astatisticalframeworkfor comprehensiveintegrationofmulti-modalsingle-celldata. Genome Biology 21, 111

  7. [7]

    Genomics in the cloud: Using Docker, GATK, and WDL in Terra

    Van der Auwera, G.A., O’Connor, B.D., 2020. Genomics in the cloud: Using Docker, GATK, and WDL in Terra

  8. [8]

    Integrating taxonomic, functional, and strain-level profiling of diverse microbial communi- ties with bioBakery 3

    Beghini, F., McIver, L.J., Blanco-Miguez, A., Dubois, L., Asnicar, F., Maharjan, S., Mailyan, A., et al., 2021. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communi- ties with bioBakery 3. eLife 10, e65088

  9. [9]

    Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

    Bolyen, E., Rideout, J.R., Dillon, M.R., Bokulich, N.A., Abnet, C.C., Al-Ghalith, G.A., Alexander, H., et al., 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37, 852–857

  10. [10]

    Augmenting large language models with chemistry tools

    Bran, A.M., Cox, S., Schilter, O., Baldassari, C., White, A.D., Schwaller, P., 2024. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535

  11. [11]

    Borg, omega, and kubernetes

    Burns, B., et al., 2016. Borg, omega, and kubernetes. Communica- tions of the ACM

  12. [12]

    Chai-1: Decoding the molecular interactions of life

    Chai Discovery team, Boitreaud, J., Dent, J., McPartlon, M., Meier, J., Reis, V., Rogozhonikov, A., Wu, K., 2024. Chai-1: Decoding the molecular interactions of life. bioRxiv , 2024–10

  13. [13]

    Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden

    Chalmers, Z.R., Connelly, C.F., Fabrizio, D., Gay, L., Ali, S.M., Ennis, R., Schrock, A., et al., 2017. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Medicine 9, 34

  14. [14]

    fastp:anultra-fastall-in- one FASTQ preprocessor

    Chen,S.,Zhou,Y.,Chen,Y.,Gu,J.,2018. fastp:anultra-fastall-in- one FASTQ preprocessor. Bioinformatics 34, i884–i890

  15. [15]

    Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications

    Chen,X.,Schulz-Trieglaff,O.,Shaw,R.,Barnes,B.,Schlesinger,F., Källberg,M.,Cox,A.J.,Kruglyak,S.,Saunders,C.T.,2016. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222

  16. [16]

    PRSice-2: Polygenic Risk Score software for biobank-scale data

    Choi, S.W., O’Reilly, P.F., 2019. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience 8, giz082

  17. [17]

    Integration of customised LLM for discharge summary generation in real-world clinical settings: a pilot study on RUSSELL GPT

    Chua, C.E., Clara, N.L.Y., Furqan, M.S., Kit, J.L.W., Makmur, A., Tham, Y.C., Santosa, A., Ngiam, K.Y., 2024. Integration of customised LLM for discharge summary generation in real-world clinical settings: a pilot study on RUSSELL GPT. The Lancet Regional Health–Western Pacific 51

  18. [18]

    NanoClaw: Secure AI agent for WhatsApp, Telegram and more.https://nanoclaw.dev/

    Cohen, G., 2026. NanoClaw: Secure AI agent for WhatsApp, Telegram and more.https://nanoclaw.dev/. Built on Anthropic Claude Agent SDK; container-isolated execution

  19. [19]

    Regression models and life-tables

    Cox, D.R., 1972. Regression models and life-tables. J. R. Stat. Soc. Ser. B 34, 187–202

  20. [20]

    Agentic AI in radiology: emerging potential and unresolved challenges

    Dietrich, N., 2025. Agentic AI in radiology: emerging potential and unresolved challenges. Br. J. Radiol. 98, 1582–1584. doi:10.1093/ bjr/tqaf173

  21. [21]

    STAR: ultrafast universal RNA-seq aligner

    Dobin,A.,Davis,C.A.,Schlesinger,F.,Drenkow,J.,Zaleski,C.,Jha, S., Batut, P., Chaisson, M., Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21

  22. [22]

    Regulation (EU) 2024/1689 laying down harmonised rules on ar- tificialintelligence(AIact)

    European Parliament and Council of the European Union, 2024. Regulation (EU) 2024/1689 laying down harmonised rules on ar- tificialintelligence(AIact). OfficialJournaloftheEuropeanUnion; high-risk AI medical device obligations effective August 2027

  23. [23]

    MultiQC: summarizeanalysisresultsformultipletoolsandsamplesinasingle report

    Ewels, P., Magnusson, M., Lundin, S., Käller, M., 2016. MultiQC: summarizeanalysisresultsformultipletoolsandsamplesinasingle report. Bioinformatics 32, 3047–3048

  24. [24]

    Uncover this tech term: Agentic artificial intelligence in radiology

    Faghani, S., Moassefi, M., Rouzrokh, P., Khosravi, B., Erickson, B.J., 2025. Uncover this tech term: Agentic artificial intelligence in radiology. Korean J. Radiol. 26, 888–892. doi:10.3348/kjr.2025. 0370

  25. [25]

    Towards trustworthy AI: A review of ethical and robust large language models

    Ferdaus, M.M., Abdelguerfi, M., Ioup, E., Niles, K.N., Pathak, K., Sloan, S., 2026. Towards trustworthy AI: A review of ethical and robust large language models. ACM Computing Surveys 58, 1–43. doi:10.1145/3777382

  26. [26]

    The evolving role of preprints in the disseminationofCOVID-19researchandtheirimpactonthescience Z

    Fraser, N., Brierley, L., Dey, G., Polka, J.K., Pálfy, M., Nanni, F., Coates, J.A., 2021. The evolving role of preprints in the disseminationofCOVID-19researchandtheirimpactonthescience Z. Wu et al.:Preprint submitted to ElsevierPage 17 of 20 Vibe Medicine communication landscape. PLOS Biology 19, e3000959. doi:10. 1371/journal.pbio.3000959

  27. [27]

    OpenClaw Medical Skills: The largest open-source medical AI skills library.https://github.com/ FreedomIntelligence/OpenClaw-Medical-Skills

    FreedomIntelligence, 2025. OpenClaw Medical Skills: The largest open-source medical AI skills library.https://github.com/ FreedomIntelligence/OpenClaw-Medical-Skills

  28. [28]

    The REporting of A disproportionality analysis forDrUgsafetysignaldetectionusingindividualcasesafetyreports inPharmacoVigilance(READUS-PV):explanationandelaboration

    Fusaroli, M., Salvo, F., Begaud, B., AlShammari, T.M., Bate, A., Battini, V., Brueckner, A., Candore, G., Carnovale, C., Crisafulli, S., et al., 2024. The REporting of A disproportionality analysis forDrUgsafetysignaldetectionusingindividualcasesafetyreports inPharmacoVigilance(READUS-PV):explanationandelaboration. Drug Safety 47, 585–599

  29. [29]

    Txagent: an ai agent for therapeutic reasoning across a universe of tools.arXiv preprint arXiv:2503.10970, 2025

    Gao, S., Zhu, R., Kong, Z., Noori, A., Su, X., Ginder, C., Tsiligkaridis, T., Zitnik, M., 2025a. TxAgent: An AI agent for therapeutic reasoning across a universe of tools. arXiv preprint arXiv:2503.10970

  30. [30]

    Democratizing ai scientists using tooluniverse.arXiv preprint arXiv:2509.23426, 2025

    Gao, S., Zhu, R., Sui, P., Kong, Z., Aldogom, S., Huang, Y., Noori, A.,Shamji,R.,Parvataneni,K.,Tsiligkaridis,T.,Zitnik,M.,2025b. Democratizing AI scientists using ToolUniverse. arXiv preprint arXiv:2509.23426

  31. [31]

    ChEMBL:alarge-scalebioactivitydatabasefordrug discovery

    Gaulton, A., Bellis, L.J., Bento, A.P., Chambers, J., Davies, M., Hersey,A.,Light,Y.,McGlinchey,S.,Michalovich,D.,Al-Lazikani, B.,etal.,2012. ChEMBL:alarge-scalebioactivitydatabasefordrug discovery. Nucleic Acids Research 40, D1100–D1107

  32. [32]

    arXiv preprint arXiv:2510.12399 , year=

    Ge, Y., Mei, L., Duan, Z., Li, T., Zheng, Y., Wang, Y., Wang, L., Yao, J., Liu, T., Cai, Y., et al., 2025. A survey of vibe coding with large language models. arXiv preprint arXiv:2510.12399

  33. [33]

    GRADE: an emerging consensus on rating quality of evidence and strength of recommen- dations

    Guyatt, G.H., Oxman, A.D., Vist, G.E., Kunz, R., Falck-Ytter, Y., Alonso-Coello, P., Schünemann, H.J., 2008. GRADE: an emerging consensus on rating quality of evidence and strength of recommen- dations. BMJ 336, 924–926

  34. [34]

    Integratedanalysisofmultimodal single-cell data

    Hao,Y.,Hao,S.,Andersen-Nissen,E.,MauckIII,W.M.,Zheng,S., Butler,A.,Lee,M.J.,etal.,2021. Integratedanalysisofmultimodal single-cell data. Cell 184, 3573–3587

  35. [35]

    Metagpt:Metaprogrammingfor amulti-agentcollaborativeframework

    Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., Zhou, L., Ran, C., Xiao, L.,Wu,C.,Schmidhuber,J.,2024. Metagpt:Metaprogrammingfor amulti-agentcollaborativeframework. Int.Conf.Learn.Represent

  36. [36]

    Huang, K., Zhang, S., Wang, H., Qu, Y., Lu, Y., Roohani, Y., et al.,

  37. [37]

    Biomni: A general-purpose biomedical ai agent.bioRxiv preprint, 2025

    Biomni: A general-purpose biomedical AI agent. bioRxiv doi:10.1101/2025.05.30.656746

  38. [38]

    A survey of safety and trustworthiness of large language models through the lens of verification and validation

    Huang, X., Ruan, W., Huang, W., Jin, G., Dong, Y., Wu, C., Ben- salem, S., Mu, R., Qi, Y., Zhao, X., Cai, K., Zhang, Y., Wu, S., Xu, P., Wu, D., Freitas, A., Mustafa, M.A., 2024a. A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review 57, 175. doi:10.1007/s10462-024-10824-0

  39. [39]

    Trustllm: Trustwor- thiness in large language models.arXiv preprint arXiv:2401.05561, 3, 2024

    Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al., 2024b. Trustllm: Trustwor- thiness in large language models. arXiv preprint arXiv:2401.05561

  40. [40]

    Hunter,J.D.,2007.Matplotlib:A2dgraphicsenvironment.Comput. Sci. Eng. 9, 90–95

  41. [41]

    Agentmd: Empowering language agents for risk predictionwithlarge-scaleclinicaltoollearning

    Jin, Q., Wang, Z., Yang, Y., Zhu, Q., Wright, D., Huang, T., Khan- dekar,N.,Wan,N.,Ai,X.,Wilbur,W.J.,He,Z.,Taylor,R.A.,Chen, Q., Lu, Z., 2025. Agentmd: Empowering language agents for risk predictionwithlarge-scaleclinicaltoollearning. Nat.Commun.16,

  42. [42]

    doi:10.1038/s41467-025-64430-x

  43. [43]

    Inference and analysis of cell-cell communication using CellChat

    Jin, S., Guerrero-Juarez, C.F., Zhang, L., Chang, I., Ramos, R., Kuan, C.H., Myung, P., Plikus, M.V., Nie, Q., 2021. Inference and analysis of cell-cell communication using CellChat. Nature Communications 12, 1088

  44. [44]

    Highly accurate protein structure prediction with AlphaFold

    Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ron- neberger,O.,Tunyasuvunakool,K.,Bates,R.,Žídek,A.,Potapenko, A., et al., 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589

  45. [45]

    Antibody engineering and its therapeutic applications

    Kandari, D., Bhatnagar, R., 2023. Antibody engineering and its therapeutic applications. International Reviews of Immunology 42, 156–183

  46. [46]

    Nonparametric estimation from incomplete observations

    Kaplan, E.L., Meier, P., 1958. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481

  47. [47]

    The mutational constraint spectrum quantified from variation in 141,456 humans

    Karczewski, K.J., Francioli, L.C., Tiao, G., Cummings, B.B., Alföldi, J., Wang, Q., Collins, R.L., et al., 2020. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443

  48. [48]

    Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

    Kim, D., Paggi, J.M., Park, C., Bennett, C., Salzberg, S.L., 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915

  49. [49]

    EchoFM: Foundation model for generalizable echocardiogram analysis

    Kim, S., Jin, P., Song, S., Chen, C., Li, Y., Ren, H., Li, X., Liu, T., Li, Q., 2025. EchoFM: Foundation model for generalizable echocardiogram analysis. IEEE Transactions on Medical Imaging 44, 4049–4062. doi:10.1109/TMI.2025.3580713

  50. [50]

    Strelka2: fast and accurate calling of germline and somatic variants

    Kim, S., Scheffler, K., Halpern, A.L., Bekritsky, M.A., Noh, E., Källberg, M., Chen, X., et al., 2018. Strelka2: fast and accurate calling of germline and somatic variants. Nature Methods 15, 591– 594

  51. [51]

    The PHQ- 9: Validity of a brief depression severity measure

    Kroenke, K., Spitzer, R.L., Williams, J.B.W., 2001. The PHQ- 9: Validity of a brief depression severity measure. Journal of General Internal Medicine 16, 606–613. doi:10.1046/j.1525-1497. 2001.016009606.x

  52. [52]

    ClinVar: improving access to variant interpretations and supporting evidence

    Landrum, M.J., Lee, J.M., Benson, M., Brown, G.R., Chao, C., Chitipiralla, S., Gu, B., et al., 2018. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Research 46, D1062–D1067

  53. [53]

    Langchain.https://github.com/ langchain-ai/langchain

    LangChain Team, 2023. Langchain.https://github.com/ langchain-ai/langchain

  54. [54]

    How can clinicians leverage vibe coding for machine learning and deep learning research? Endocrinol

    Lee, Y., Huh, S., 2025. How can clinicians leverage vibe coding for machine learning and deep learning research? Endocrinol. Metab. (Seoul) 40, 659–667. doi:10.3803/EnM.2025.2675

  55. [55]

    Li, G., Hammoud, H.A.A.K., Itani, H., Khizbullin, D., Ghanem, B.,

  56. [56]

    Camel:Communicativeagentsfor“mind”explorationoflarge language model society. Adv. Neural Inf. Process. Syst. 36, 51991– 52008

  57. [57]

    Deeplearningfordrug- drug interaction prediction: A comprehensive review

    Li,X.,Xiong,Z.,Zhang,W.,Liu,S.,2024a. Deeplearningfordrug- drug interaction prediction: A comprehensive review. Quantitative Biology 12, 30–52

  58. [58]

    Artificial general intelligence for medicalimaginganalysis.IEEEReviewsinBiomedicalEngineering 17, 337–358

    Li, X., Zhao, L., Zhang, L., Wu, Z., Liu, Z., Jiang, H., Cao, C., Xu, S., Li, Y., Dai, H., Yuan, Y., Liu, J., Li, G., Zhu, D., Yan, P., Li, Q., Liu, W., Liu, T., Shen, D., 2024b. Artificial general intelligence for medicalimaginganalysis.IEEEReviewsinBiomedicalEngineering 17, 337–358. doi:10.1109/RBME.2023.3339966

  59. [59]

    Deep generative modeling for single-cell transcriptomics

    Lopez, R., Regier, J., Cole, M.B., Jordan, M.I., Yosef, N., 2018. Deep generative modeling for single-cell transcriptomics. Nature Methods 15, 1053–1058

  60. [60]

    Moderated estimation of foldchangeanddispersionforRNA-seqdatawithDESeq2

    Love, M.I., Huber, W., Anders, S., 2014. Moderated estimation of foldchangeanddispersionforRNA-seqdatawithDESeq2. Genome Biology 15, 550

  61. [61]

    Nature Medicine 30(3), 863–874 (2024)

    Lu,M.Y.,Chen,B.,Williamson,D.F.K.,Chen,R.J.,Liang,I.,Ding, T.,Jaume,G.,Odintsov,I.,Le,L.P.,Gerber,G.,etal.,2024.Avisual- language foundation model for computational pathology. Nature Medicine 30, 863–874. doi:10.1038/s41591-024-02856-4

  62. [62]

    Luo, R., Sun, L., Xia, Y., Qin, T., Zhang, S., Poon, H., Liu, T.Y., 2022.Biogpt:Generativepre-trainedtransformerforbiomedicaltext generation and mining. Brief. Bioinform. 23, bbac409

  63. [63]

    The Ensembl Variant Effect Predictor

    McLaren, W., Gil, L., Hunt, S.E., Riat, H.S., Ritchie, G.R.S., Thor- mann, A., Flicek, P., Cunningham, F., 2016. The Ensembl Variant Effect Predictor. Genome Biology 17, 122

  64. [64]

    Docker: Lightweight linux containers for consis- tent development and deployment

    Merkel, D., 2014. Docker: Lightweight linux containers for consis- tent development and deployment. Linux Journal

  65. [65]

    Vibe coding: a new paradigm for biomedical software development

    Moore, J.H., Tatonetti, N., 2025a. Vibe coding: a new paradigm for biomedical software development. BioData Mining 18, 46. doi:10.1186/s13040-025-00462-9

  66. [66]

    From prompt engineering to agent engineering: expanding the ai toolbox with autonomous agentic ai collaborators for biomedical discovery

    Moore, J.H., Tatonetti, N.P., 2025b. From prompt engineering to agent engineering: expanding the ai toolbox with autonomous agentic ai collaborators for biomedical discovery. BioData Mining 18, 78. Z. Wu et al.:Preprint submitted to ElsevierPage 18 of 20 Vibe Medicine

  67. [67]

    Drug repurposing using FDA adverse event reporting system (FAERS) database

    Morris, R., Ali, R., Cheng, F., 2024. Drug repurposing using FDA adverse event reporting system (FAERS) database. Current Drug Targets 25, 454–464

  68. [68]

    New evidence pyramid

    Murad, M.H., Asi, N., Alsawas, M., Alahdab, F., 2016. New evidence pyramid. BMJ Evidence-Based Medicine 21, 125–127. doi:10.1136/ebmed-2016-110401

  69. [69]

    Artificial intelligence agents in healthcare research: A scoping review

    Njei, B., Al-Ajlouni, Y.A., Kanmounye, U.S., Boateng, S., Ngue- fang, G.L., Njei, N., Hamouri, S., Al-Ajlouni, A.F., 2026. Artificial intelligence agents in healthcare research: A scoping review. PLoS One 21, e0342182. doi:10.1371/journal.pone.0342182

  70. [70]

    Hermes Agent: The self-improving AI agent.https://github.com/NousResearch/hermes-agent

    Nous Research, 2026. Hermes Agent: The self-improving AI agent.https://github.com/NousResearch/hermes-agent. MIT Li- cense;closedlearningloopwithGEPA-basedskillevolution;Open- Claw skill migration

  71. [71]

    Development and evaluation of a clinical note sum- marization system using large language models

    Oliveira, J.D., Santos, H.D., Ulbrich, A.H.D., Couto, J.C., Arocha, M., Santos, J., Costa, M.M., Faccio, D., Tabalipa, F.O., Nogueira, R.F., 2025. Development and evaluation of a clinical note sum- marization system using large language models. Communications Medicine 5, 376

  72. [72]

    Openai function calling and tool use.https:// platform.openai.com/docs/guides/function-calling

    OpenAI, 2023. Openai function calling and tool use.https:// platform.openai.com/docs/guides/function-calling

  73. [73]

    OpenClaw: Personal AI assis- tant.https://github.com/openclaw/openclaw

    OpenClaw Community, 2025–2026. OpenClaw: Personal AI assis- tant.https://github.com/openclaw/openclaw

  74. [74]

    Opentrons python protocol API documentation

    Opentrons, 2026. Opentrons python protocol API documentation. https://docs.opentrons.com/python-api/

  75. [75]

    OWASPtop10forlargelanguagemodel applications2025.https://genai.owasp.org/

    OWASPFoundation,2025. OWASPtop10forlargelanguagemodel applications2025.https://genai.owasp.org/. LLM01:2025Prompt Injection

  76. [76]

    BindCraft: one-shot design of functional protein binders

    Pacesa, M., Nickel, L., Schellhaas, C., Schmidt, J., Pyatova, E., Kissling,L.,Barendse,P.,Choudhury,J.,Kapoor,S.,Alcaraz-Serna, A., et al., 2024. BindCraft: one-shot design of functional protein binders. bioRxiv , 2024–09

  77. [77]

    The PRISMA 2020 statement: An updated guideline for reporting systematic reviews

    Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L., Tetzlaff, J.M., Akl, E.A., Bren- nan, S.E., et al., 2021. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372, n71. doi:10. 1136/bmj.n71

  78. [78]

    Nature Methods 19, 171–178

    Palla, G., Spitzer, H., Klein, M., Fischer, D., Schaar, A.C., Kuem- merle,L.B.,Rybakov,S.,etal.,2022.Squidpy:ascalableframework for spatial omics analysis. Nature Methods 19, 171–178

  79. [79]

    Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Som- nath,V.R.,Getz,N.,Portnoi,T.,Roy,J.,Stark,H.,etal.,2025.Boltz- 2:Towardsaccurateandefficientbindingaffinityprediction.bioRxiv

  80. [80]

    Gorilla: Large Language Model Connected with Massive APIs

    Patil, S.G., et al., 2023. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334

Showing first 80 references.