Building AI-Ready Data Systems for Space Life Sciences, Aerospace Medicine, and Deep Space Exploration

Sylvain V. Costes; Sergio Garcia Busto; Ryan T. Scott; James A. Casaletto; Gautier Bardi de Fourtou; Brian M. Evarts; Amanda M. Saravia-Butler; Xavier-Lewis Palmer; Rodrigo Coutinho de Almeida; Laetitia Frost; Jelena Tešić; Afshin Beheshti; Christopher E. Mason; Peter W. Rose; Sergio E. Baranzini; Lauren M. Sanders; Stefania Giacomello; Pedro Madrigal

arxiv: 2606.28856 · v1 · pith:U6OACFYInew · submitted 2026-06-27 · 🧬 q-bio.OT · cs.AI

Pith reviewed 2026-06-30 08:41 UTC · model grok-4.3

classification 🧬 q-bio.OT cs.AI

keywords AI-ready data · FAIR principles · space life sciences · data infrastructure · deep space exploration · aerospace medicine · machine-actionable data · international governance

The pith

Spaceflight biological data requires a three-tier progression from FAIR to AI-ready to space-ready forms to become usable by AI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that open access standards like FAIR enable human reuse but fall short for AI systems because of varying demands on data structure, metadata, and interfaces. It proposes advancing through three tiers: FAIR data, then AI-ready data that is machine-actionable, and finally space-ready data tailored for deep space conditions. This restructuring would close the AI access gap for heterogeneous spaceflight datasets in life sciences and aerospace medicine. A neutral international coordinating body is put forward to provide governance and ensure trustworthy, agent-accessible infrastructure. Without these steps, AI cannot reliably support biological research needed for deep space exploration.

Core claim

The authors state that a three-tier approach proceeding from FAIR to AI-ready to space-ready data, backed by a neutral international coordinating body, is required to systematically restructure heterogeneous spaceflight biological data into machine-actionable forms that close the AI access gap and enable trustworthy, agent-accessible infrastructure for deep space biological research.

What carries the argument

The three-tier data progression from FAIR to AI-ready to space-ready, which carries the argument by defining successive levels of machine accessibility and space-specific optimization for biological datasets.
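
As a rough illustration only, the progression can be sketched as an ordered classification in which each tier presupposes the ones below it. The paper, as summarized here, does not spell out criteria for any tier, so every field name below is a hypothetical stand-in chosen for the sketch, not the authors' definition.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Ordered tiers; a higher value implies all lower tiers are satisfied."""
    NOT_FAIR = 0
    FAIR = 1
    AI_READY = 2
    SPACE_READY = 3


@dataclass
class DatasetProfile:
    # Every field is a hypothetical criterion, assumed for illustration.
    has_persistent_id: bool            # stand-in for FAIR findability
    has_open_license: bool             # stand-in for FAIR reusability
    has_machine_readable_schema: bool  # stand-in for AI-ready structure
    has_programmatic_api: bool         # stand-in for AI-ready access interface
    usable_offline: bool               # stand-in for space-ready operation


def classify(p: DatasetProfile) -> Tier:
    """Return the highest tier whose checks, and all lower checks, pass."""
    if not (p.has_persistent_id and p.has_open_license):
        return Tier.NOT_FAIR
    if not (p.has_machine_readable_schema and p.has_programmatic_api):
        return Tier.FAIR
    if not p.usable_offline:
        return Tier.AI_READY
    return Tier.SPACE_READY
```

The point of the sketch is the ordering: a dataset that fails a FAIR-level check cannot be counted as AI-ready or space-ready, whatever else it offers.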

If this is right

Existing infrastructures can be improved to support AI access to diverse spaceflight datasets.
AI systems gain the capacity to access and analyze heterogeneous scientific data from space missions.
A trustworthy, agent-accessible infrastructure becomes available for deep space biological research.
Systematic restructuring of data into machine-actionable forms is needed beyond current open-access practices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized data handling could emerge across international space agencies to support shared AI tools.
Integrated datasets might allow AI to model combined effects of space environment and biology in real time.
The approach could extend to other high-stakes domains like climate or medical research needing agent-accessible data.

Load-bearing premise

That existing open-access infrastructures cannot meet the distinct demands of growing AI approaches on data structure, metadata, and access interfaces.

What would settle it

Showing that multiple current FAIR-compliant space biology databases can be queried and analyzed accurately by diverse AI models with no added restructuring or new governance.

Original abstract

While AI holds the potential to revolutionize space life sciences, realizing this promise is contingent upon the systematic restructuring of heterogeneous spaceflight biological data into machine-actionable, AI-ready forms. Even though open access principles support human reuse and scientific reproducibility, this does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets. In addition, the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces. In order to respond to such growing changes we propose a three-tier approach, proceeding from FAIR to AI-ready to space-ready data. We discuss existing infrastructures and how they can be improved to close the AI access gap. We conclude by proposing a neutral international coordinating body as the governance backbone for the trustworthy, agent-accessible space biology infrastructure that deep space biological research will require.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes a three-tier data framework for space biology AI plus a new coordinating body, but the case for needing it over existing FAIR systems stays unshown.

The core pitch is that spaceflight biological data needs to progress from FAIR to AI-ready to space-ready, with a neutral international body handling governance so AI systems can actually use it for deep space work. That is the main thing to take away.

It does a reasonable job naming the practical issue: open data helps people but may not give AI the structure, metadata, or interfaces it needs, especially with heterogeneous space datasets. The authors tie this to real infrastructures like NASA GeneLab and ESA archives, and they extend the FAIR conversation into this applied domain without claiming a brand-new theory. That part is straightforward and useful for anyone managing space life sciences data.

The soft spot is the missing evidence for the central claim. The text asserts that current open-access setups do not enable AI and that a systematic three-tier shift plus new governance is required, yet it gives no examples of specific AI methods (transformers on omics, agents, etc.) failing on existing data, no documented failure modes, and no implementation details on what the tiers would change in practice. Without that, the argument that incremental fixes to current systems are insufficient does not land. The stress-test note holds up here.

This is for data managers, funders, and policy people in aerospace biology rather than core method developers. A reader already working on space data infrastructure might pick up framing ideas, but it will not change experiments or models.

Send it to peer review. The topic matters for future missions and the proposal is coherent on its own terms, even if thin on evidence; referees could ask for the concrete cases that are now absent.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that realizing the potential of AI in space life sciences requires restructuring heterogeneous spaceflight biological data into machine-actionable forms via a three-tier progression from FAIR to AI-ready to space-ready data. It asserts that open access and FAIR principles do not suffice for AI systems due to their distinct demands on data structure, metadata, and access interfaces, and proposes a neutral international coordinating body to provide governance for trustworthy, agent-accessible infrastructure.

Significance. Should the proposed three-tier approach and coordinating body be adopted and validated, this work would be significant in bridging the gap between current data infrastructures and the needs of AI for deep space exploration. It identifies a potentially critical limitation in existing open-access systems for supporting advanced AI applications in biology and medicine, offering a conceptual framework that could guide future infrastructure development in the field.

major comments (2)

[Abstract] The assertion that open access 'does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets' and that 'the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces' is load-bearing for justifying the three-tier proposal and new coordinating body, yet the text supplies no concrete examples of named AI methods, specific spaceflight datasets, or documented failure modes of existing systems such as NASA GeneLab or ESA archives.
[Abstract, final paragraph] The proposal for a 'neutral international coordinating body' as governance backbone rests on the unshown premise that existing bodies cannot meet AI demands; without differentiation from or analysis of current international data-sharing mechanisms, the necessity of a new entity over incremental improvements to existing infrastructures is not secured.

minor comments (2)

The terms 'AI-ready' and 'space-ready' are introduced without explicit definitions or criteria in the abstract, which would improve clarity if provided with examples in the main text.
The discussion of how existing infrastructures can be improved would benefit from at least one illustrative case study or table comparing current metadata standards to AI-specific requirements.
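
To make the second minor comment concrete, here is a minimal sketch of the kind of comparison such a table might summarize: checking a dataset's metadata record against a list of fields an AI pipeline would need. The required field names and the example record are hypothetical; they are not taken from the manuscript, from NASA GeneLab, or from any published metadata standard.

```python
from typing import Any, Dict, List

# Hypothetical set of metadata fields an AI pipeline might require.
REQUIRED_FOR_AI = {"organism", "tissue", "assay_type", "units", "schema_version"}


def metadata_gaps(record: Dict[str, Any]) -> List[str]:
    """Return required fields that are absent or empty, in sorted order."""
    return sorted(field for field in REQUIRED_FOR_AI if not record.get(field))


# Illustrative record: present fields, one empty value, one None, one absent.
example = {
    "organism": "Mus musculus",
    "tissue": "liver",
    "assay_type": "",
    "units": None,
}
print(metadata_gaps(example))
```

A per-repository tally of such gaps would give referees the documented shortfall that the manuscript currently asserts without showing.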

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the justification for our proposed framework. We respond to each major comment below and indicate planned revisions to the manuscript.

Referee: [Abstract] The assertion that open access 'does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets' and that 'the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces' is load-bearing for justifying the three-tier proposal and new coordinating body, yet the text supplies no concrete examples of named AI methods, specific spaceflight datasets, or documented failure modes of existing systems such as NASA GeneLab or ESA archives.

Authors: We agree that the abstract would be strengthened by concrete examples to support the load-bearing claims. The full manuscript discusses limitations of current infrastructures, but does not provide named AI methods or specific failure modes in the abstract. We will revise the abstract to include brief, specific examples (e.g., transformer-based models requiring standardized metadata schemas and challenges with heterogeneous omics data in GeneLab) while maintaining length constraints.

Revision: yes

Referee: [Abstract, final paragraph] The proposal for a 'neutral international coordinating body' as governance backbone rests on the unshown premise that existing bodies cannot meet AI demands; without differentiation from or analysis of current international data-sharing mechanisms, the necessity of a new entity over incremental improvements to existing infrastructures is not secured.

Authors: The manuscript positions the new body as necessary for neutral, cross-agency coordination focused on agent-accessible infrastructure. We acknowledge that the abstract does not differentiate from existing mechanisms. In revision, we will expand the discussion section with a concise analysis of current bodies (e.g., NASA GeneLab governance, ESA data policies, and ISS international agreements) to highlight gaps in AI-specific requirements that incremental changes may not fully address.

Revision: yes

Circularity Check

0 steps flagged

No circularity: high-level conceptual proposal without derivations or reductions to inputs

The manuscript is a policy-style proposal advocating a three-tier data progression (FAIR to AI-ready to space-ready) plus a coordinating body. It contains no equations, no fitted parameters, no predictions derived from data, and no mathematical derivations. The central argument rests on the stated premise that existing open-access systems fall short for AI demands, but this premise is presented as an assumption rather than derived from or reduced to any self-referential construction within the paper. No self-citation chains, ansatzes, or renamings function as load-bearing steps that collapse the claim back onto its own inputs. The text is therefore self-contained as an advocacy document and exhibits none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on domain assumptions about data usability and AI requirements rather than new measurements or derivations; the coordinating body is an invented governance entity without independent evidence.

axioms (2)

Domain assumption: Open access principles support human reuse and scientific reproducibility but do not necessarily enable AI systems to access and analyze diverse scientific datasets.
Invoked in the abstract as the starting premise that motivates the need for AI-ready restructuring.
Domain assumption: The growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces.
Used to justify why FAIR alone is insufficient and why a new tiered approach is required.

invented entities (1)

neutral international coordinating body (no independent evidence)
purpose: governance backbone for the trustworthy, agent-accessible space biology infrastructure
Proposed as the solution for coordination and trust but introduced without details on structure, authority, or evidence of feasibility.

pith-pipeline@v0.9.1-grok · 5772 in / 1524 out tokens · 36287 ms · 2026-06-30T08:41:49.834524+00:00 · methodology


[1] [1]

Afshinnekoo, E. et al. Fundamental biological features of spaceflight: Advancing the field to enable deep-space exploration. Cell 183 , 1162–1184 (2020)

2020

[2] [2]

Gebre, S. G. et al. NASA open science data repository: open science for life in space. Nucleic Acids Res. 53 , D1697–D1710 (2025)

2025

[3] [3]

Otsuki, A. et al. ibSLS: A Biobank for Democratizing Access to Multi-Omics Data and Biospecimens from Spaceflight Research. bioRxiv (2025) doi:10.1101/2025.09.08.675003

work page doi:10.1101/2025.09.08.675003 2025

[4] [4]

Moon Base Igniting Progress

NASA. Moon Base Igniting Progress. NP-2026-04-6806-HQ https://www.nasa.gov/wp-content/uploads/2026/04/moon-base-architecture-users-guide. pdf (2026)

2026

[5] [5]

Overbey, E. G. et al. The Space Omics and Medical Atlas (SOMA) and international astronaut biobank. Nature 632 , 1145–1154 (2024)

2024

[6] [6]

Into the deep

Dolgin, E. Into the deep. Science 391 , 436–441 (2026)

2026

[7] [7]

Mason, C. E. et al. A second space age spanning omics, platforms and medicine across orbits. Nature 632 , 995–1008 (2024)

2024

[8] [8]

Sanders, L. M. et al. Biological research and self-driving labs in deep space supported by artificial intelligence. Nat. Mach. Intell. 5 , 208–219 (2023)

2023

[9] [9]

Scott, R. T. et al. Biomonitoring and precision health in deep space supported by artificial intelligence. Nat. Mach. Intell. 5 , 196–207 (2023)

2023

[10] [10]

Moon to Mars Architecture Definition Document

NASA, Exploration Systems Development Mission Directorate. Moon to Mars Architecture Definition Document . https://www.nasa.gov/wp-content/uploads/2025/12/add-revision-c-20251211.pdf?emrc= 18 02371b

2025

[11] [11]

Ilangovan, H. et al. Harmonizing heterogeneous transcriptomics datasets for machine learning-based analysis to identify spaceflown murine liver-specific changes. NPJ Microgravity 10 , 61 (2024)

2024

[12] [12]

& Cline, M

Casaletto, J., Bernier, A., McDougall, R. & Cline, M. S. Federated analysis for privacy-preserving data sharing: A technical and legal primer. Annu. Rev. Genomics Hum. Genet. 24 , 347–368 (2023)

2023

[13] [13]

Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187 , 6125–6151 (2024)

2024

[14] [14]

Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618 , 616–624 (2023)

2023

[15] [15]

Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21 , 1470–1480 (2024)

2024

[16] [16]

Hollmann, N. et al. Accurate predictions on small data with a tabular foundation model. Nature 637 , 319–326 (2025)

2025

[17] [17]

K., Hernandez, J

Li, B., Saini, A. K., Hernandez, J. G. & Moore, J. H. Agentic AI and the rise of in silico team science in biomedical research. Nat. Biotechnol. (2026) doi:10.1038/s41587-026-03035-1

work page doi:10.1038/s41587-026-03035-1 2026

[18] [18]

Soman, K. et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics 40 , btae560 (2024)

2024

[19] [19]

Caufield, H. et al. CurateGPT: A flexible language-model assisted biocuration tool. arXiv [cs.CL] (2024) doi:10.48550/arXiv.2411.00046

work page doi:10.48550/arxiv.2411.00046 2024

[20] [20]

Nelson, C. A. et al. Knowledge network embedding of transcriptomic data from spaceflown mice uncovers signs and symptoms associated with terrestrial diseases. Life 11 , 42 (2021)

2021

[21] [21]

& Zitnik, M

Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci. Data 10 , 67 (2023)

2023

[22] [22]

Huang, K. et al. A foundation model for clinician-centered drug repurposing. medRxiv 19 (2024) doi:10.1101/2023.03.19.23287458

work page doi:10.1101/2023.03.19.23287458 2024

[23] [23]

Casaletto, J. et al. Bridging Earth and space: A flexible and resilient federated learning framework deployed on the International Space Station. bioRxiv (2025) doi:10.1101/2025.01.14.633017

work page doi:10.1101/2025.01.14.633017 2025

[24] [24]

Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10 , 12598 (2020)

2020

[25]

Pati, S. et al. Federated learning enables big data for rare cancer boundary detection. Nat. Commun. 13, 7346 (2022)

[26]

Pereira, T. D. et al. SLEAP: A deep learning system for multi-animal pose tracking. Nat. Methods 19, 486–495 (2022)

[27]

Bohnslav, J. P. et al. DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels. Elife 10, (2021)

[28]

Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024)

[29]

Huang, A. S. et al. Artificial intelligence deep learning models to predict Spaceflight Associated Neuro-Ocular Syndrome. Am. J. Ophthalmol. 278, 115–123 (2025)

[30]

Casaletto, J. A. et al. Analyzing the relationship between gene expression and phenotype in space-flown mice using a causal inference machine learning ensemble. Sci. Rep. 15, 2363 (2025)

[31]

Gottesman, O. et al. Guidelines for reinforcement learning in healthcare. Nat. Med. 25, 16–18 (2019)

[32]

Hiniduma, K., Byna, S. & Bez, J. L. Data readiness for AI: A 360-degree survey. ACM Comput. Surv. 57, 1–39 (2025)

[33]

Rutter, L. et al. A New Era for Space Life Science: International Standards for Space Omics Processing. Patterns (N Y) 1, 100148 (2020)

[34]

Manzano, A. et al. Enhancing European capabilities for application of multi-omics studies in biology and biomedicine space research. iScience 26, 107289 (2023)

[35]

Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016)

[36]

Verhulst, S., Zahuranec, A. & Chafetz, H. Moving Toward the FAIR-R principles: Advancing AI-Ready Data. (2025) doi:10.2139/ssrn.5164337

[37]

Hiniduma, K., Ryan, D., Byna, S., Bez, J. L. & Madduri, R. AIDRIN 2.0: A framework to assess data readiness for AI. arXiv [cs.CY] (2025) doi:10.48550/arXiv.2505.18213

[38]

Clark, T. et al. AI-readiness for biomedical data: Bridge2AI recommendations. bioRxiv (2024) doi:10.1101/2024.10.23.619844

[39]

Rehm, H. L. et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021)

[40]

Launching the Genesis Mission. The White House https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/ (2025)

[41]

Costes, S. V., Gentemann, C. L., Platts, S. H. & Carnell, L. A. Biological horizons: pioneering open science in the cosmos. Nat. Commun. 15, 4780 (2024)

[42]

Mons, B., Schultes, E., Liu, F. & Jacobsen, A. The FAIR principles: First generation implementation choices and challenges. Data Intell. 2, 1–9 (2020)

[43]

Apache Software Foundation. Apache Parquet. Parquet https://parquet.apache.org/ (2026)

[44]

Apache Software Foundation. Apache Arrow. Apache Arrow https://arrow.apache.org/ (2026)

[45]

Sanders, L. M. et al. Batch effect correction methods for NASA GeneLab transcriptomic datasets. Frontiers in Astronomy and Space Sciences 10, (2023)

[46]

Casaletto, J. A. et al. Machine learning ensemble investigates age in the transcriptomic response to spaceflight in Murine mammary tissue: Observational study. JMIRx Bio 4, e73041–e73041 (2026)

[47]

Overbey, E. G. et al. Challenges and considerations for single-cell and spatially resolved transcriptomics sample collection during spaceflight. Cell Rep. Methods 2, 100325 (2022)

[48]

González-Beltrán, A., Maguire, E., Sansone, S.-A. & Rocca-Serra, P. linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 15 Suppl 14, S4 (2014)

[49]

Caufield, J. H. et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics 40, (2024)

[50]

Moxon, S. A. T. et al. LinkML: an open data modeling framework. GigaScience 15, (2026)

[51]

AI4Curation. (GitHub). https://github.com/ai4curation

[52]

PROV-O: The PROV Ontology. https://www.w3.org/TR/prov-o/

[53]

Wilkinson, S. R. et al. Applying the FAIR Principles to computational workflows. Sci. Data 12, 328 (2025)

[54]

Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017)

[55]

Walsh, I. et al. DOME: recommendations for supervised machine learning validation in biology. Nat. Methods 18, 1122–1127 (2021)

[56]

NASA Science Mission Directorate. SPD-41a: Scientific Information Policy for the Science Mission Directorate. https://science.nasa.gov/wp-content/uploads/2023/08/smd-information-policy-spd-41a.pdf (2022)

[57]

Putman, T. E. et al. The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species. Nucleic Acids Research 52, D938–D949 (2024)

[58]

Morris, J. H. et al. The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information. Bioinformatics 39, (2023)

[59]

Şen, B. et al. CROssBARv2: A unified computational framework for heterogeneous biomedical data representation and LLM-driven exploration. bioRxiv (2026) doi:10.64898/2026.04.12.718028

[60]

Morton, K. et al. ROBOKOP: an abstraction layer and user interface for knowledge graphs to support question answering. Bioinformatics 35, 5382–5384 (2019)

[61]

Bizon, C. et al. ROBOKOP KG and KGB: Integrated knowledge graphs from federated sources. J. Chem. Inf. Model. 59, 4968–4973 (2019)

[62]

Lobentanzer, S. et al. Democratizing knowledge representation with BioCypher. Nat. Biotechnol. 41, 1056–1059 (2023)

[63]

Kuehl, M. et al. BioContextAI is a community hub for agentic biomedical systems. Nat. Biotechnol. 43, 1755–1757 (2025)

[64]

Makarov, V. A. et al. Natural language querying of biological databases with large language models. Drug Discov. Today 31, 104654 (2026)

[65]

Edge, D. et al. From local to global: A graph RAG approach to query-focused summarization. arXiv [cs.CL] (2024) doi:10.48550/arXiv.2404.16130

[66]

Wang, F. The crisis of biomedical foundation models. J. Biomed. Inform. 171, 104917 (2025)

[67]

Huang, K. et al. Biomni: A general-purpose biomedical AI agent. bioRxiv (2025) doi:10.1101/2025.05.30.656746

[68]

Wu, J., Strangman, G., Bokhari, R. & Donoviel, D. Human and Environmental Research Matrix for Exploration of Space (HERMES) Project. in (International Astronautical Federation, 2025)

[69]

Rutter, L. A. et al. Astronaut omics and the impact of space on the human body at scale. Nat. Commun. 15, 4952 (2024)

[70]

Camera, A. et al. Aging and putative frailty biomarkers are altered by spaceflight. Sci. Rep. 14, 13098 (2024)

[71]

Li, R., Romano, J. D., Chen, Y. & Moore, J. H. Centralized and federated models for the analysis of clinical data. Annu. Rev. Biomed. Data Sci. 7, 179–199 (2024)

[72]

Casaletto, J. et al. Using federated learning to overcome data gravity in space. in 2022 ASGSR Annual Conference (2022)

[73]

Bloomfield, S. A., Dunbar, B. J., Schmit, C. D., Sawyer, A. J. & Charles, J. B. Developing an international database on long-term health effects of spaceflight. Acta Astronaut. 198, 347–353 (2022)

[74]

Shiba, D. et al. Development of new experimental platform ‘MARS’-Multiple Artificial-gravity Research System-to elucidate the impacts of micro/partial gravity on mice. Sci. Rep. 7, 10837 (2017)

[75]

Rambla, J. et al. Beacon v2 and Beacon networks: A ‘lingua franca’ for federated data discovery in biomedical genomics, and beyond. Hum. Mutat. 43, 791–799 (2022)

[76]

Akhtar, M. et al. Croissant: A Metadata Format for ML-Ready Datasets. in Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning 1–6 (ACM, New York, NY, USA, 2024)

[77]

Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 86–92 (2021)

[78]

Hespeels, B. et al. Rotifers in Space: Transcriptomic Response of the bdelloid rotifer Adineta vaga aboard the International Space Station. NASA GeneLab https://doi.org/10.26030/K36D-D232 (2025)

[79]

Moris, V. C. et al. Rotifers in space: transcriptomic response of the bdelloid rotifer Adineta vaga aboard the International Space Station. BMC Biol. 23, 182 (2025)

[80]

Qin, C. et al. SciHorizon: Benchmarking AI-for-science readiness from scientific data to large language models. in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 5754–5765 (ACM, New York, NY, USA, 2025)