From PhysioNet to Foundation Models -- A History and Potential Futures
Pith reviewed 2026-05-15 22:09 UTC · model grok-4.3
The pith
PhysioNet's history from mailing tapes to large databases shows how to build foundation models on physiological data while managing carbon footprints and reproducibility issues.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The PhysioNet Resource evolved from mailing magnetic tapes and compact discs of curated recordings to high-speed downloads of comprehensive hospital databases. The paper argues from this trajectory that open competitions and edge computing are promising ways to address the main problems of carbon footprint, incentives, and repeatability in the move to foundation models on physiological data.
What carries the argument
The PhysioNet Challenges, which operate as public competitions that drive innovation while enforcing validation and repeatability on physiological signal tasks.
If this is right
- Public challenges will raise the repeatability of machine learning results on physiological recordings.
- Tiny-ML and edge computing will cut the carbon emissions tied to training and running large physiological models.
- Revised funding and incentive structures will sustain long-term open data sharing.
- Consistent open-access rules will speed up research that uses massive physiological databases.
Where Pith is reading between the lines
- The same open-competition structure could transfer to sensor data in other medical areas outside cardiology.
- Edge models might enable real-time physiological analysis in settings without reliable internet or large servers.
- Adopting these practices early could establish norms that avoid later backlash over AI energy use in health research.
- Direct comparisons of carbon costs for models built with versus without the suggested mitigations would provide a clear test.
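The carbon-cost comparison in the last point can be sketched with simple arithmetic. The snippet below is a minimal illustration, not the paper's method: it assumes emissions are estimated as energy drawn (average power × hours × data-centre overhead, PUE) multiplied by grid carbon intensity. The function name and every numeric value are hypothetical placeholders chosen only to show the calculation.

```python
def training_emissions_kg(avg_power_kw, hours, pue, grid_kg_per_kwh):
    """Estimate CO2e in kg: facility energy (kWh) times grid carbon intensity."""
    # Energy drawn from the grid, including facility overhead (PUE >= 1.0).
    energy_kwh = avg_power_kw * hours * pue
    return energy_kwh * grid_kg_per_kwh


# Hypothetical run on a multi-GPU server in a data centre.
baseline = training_emissions_kg(
    avg_power_kw=8.0, hours=100.0, pue=1.5, grid_kg_per_kwh=0.4
)

# Hypothetical run of a much smaller model on low-power hardware.
# Same grid intensity, so only the mitigation itself differs.
mitigated = training_emissions_kg(
    avg_power_kw=0.5, hours=20.0, pue=1.0, grid_kg_per_kwh=0.4
)

# Fraction of the baseline emissions avoided by the mitigated run.
reduction = 1.0 - mitigated / baseline

print(f"baseline:  {baseline:.1f} kg CO2e")
print(f"mitigated: {mitigated:.1f} kg CO2e")
print(f"reduction: {reduction:.1%}")
```

A fair comparison would also hold task performance fixed, since a smaller model that performs worse does not test the claim.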
Load-bearing premise
The paper assumes that the rapid growth of foundation models on physiological data will deliver substantial benefits once the challenges of carbon footprint, incentives, and repeatability are addressed through open challenges, Tiny-ML, and open-access practices. It offers no new empirical validation of those fixes.
What would settle it
A foundation model trained on PhysioNet data through the challenge-based open approach that still shows high energy consumption during training or produces results that independent teams cannot reproduce would indicate the proposed directions fall short.
Original abstract
Over the last 35 years, the sharing of medical data and models for research has evolved from sneakernet to the internet - from mailing magnetic tapes and compact discs of a handful of well-curated recordings, to the high-speed download of relatively comprehensive hospital databases. More recently, the fervor around the potential for modern machine learning and 'AI' to catapult us into the next industrial revolution has led to a seemingly insatiable desire to pump almost any source of data into large models. Although this has great potential, it also presents a whole set of new challenges. In this article I examine these trends over the last 30 years, drawing on examples from cardiology, one of the oldest data-intensive fields that is undergoing a renaissance via machine learning. From the early days of computerized cardiology, the Research Resource for Complex Physiologic Signals (PhysioNet) has been at the cutting edge of this field. This article, therefore, includes much of the Resource's history and the contributions drawn from 25 years of firsthand experience of co-developing elements of the Resource with its founders. I identify the most promising future directions for the PhysioNet Resource, and more generally, the growing issues and opportunities around dissemination and use of massive physiological databases, associated open access code, and public competitions, along with potential solutions to the key issues facing our field. Topics range from how we should approach foundation models in the context of the rapidly growing AI carbon footprint, to the potential of Tiny-ML and edge computing. I also cover issues around prizes and incentives, funding models, and scientific repeatability, as well as how we might address these issues by leveraging the PhysioNet Challenges, consistent with the philosophy of open-access from the early days of the PhysioNet Resource.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript traces the evolution of physiological data sharing over the past 35 years, from physical media to internet-based resources like PhysioNet, drawing on the author's 25 years of direct involvement; it then identifies promising future directions for PhysioNet and the broader field, including foundation models on physiological data, Tiny-ML and edge computing, and approaches to challenges such as carbon footprint, incentives, prizes, funding models, and scientific repeatability via open-access competitions.
Significance. The historical synthesis grounded in firsthand experience offers a credible narrative of the field's development in cardiology and data-intensive research; the forward-looking discussion highlights timely issues around large-scale AI on physiological databases and suggests community-oriented solutions, which could inform sustainable practices if the identified directions are pursued.
minor comments (2)
- [Abstract] The references to the 'last 35 years' and the 'last 30 years' are not aligned with each other, nor with the '25 years of firsthand experience' cited in the same abstract; a single consistent timeframe or an explicit mapping between the three would improve clarity.
- [Future directions] The forward-looking sections frame potential solutions (e.g., leveraging PhysioNet Challenges for repeatability) as opportunities without citing specific prior challenge outcomes or metrics that demonstrate their effectiveness.
Simulated Author's Rebuttal
We thank the referee for the positive summary of the manuscript and the recommendation for minor revision. The referee accurately captures the paper's historical synthesis of physiological data sharing over 35 years, its grounding in firsthand experience with PhysioNet, and the discussion of future directions including foundation models, Tiny-ML, sustainability, incentives, and open competitions.
Circularity Check
No significant circularity in narrative review
The manuscript is a historical review and opinion piece synthesizing 25 years of PhysioNet experience with forward-looking discussion on data sharing, foundation models, Tiny-ML, incentives, and repeatability. It advances no equations, derivations, fitted parameters, quantitative predictions, or formal claims that reduce to self-referential inputs by construction. All statements are framed as narrative synthesis and identification of opportunities rather than load-bearing derivations or self-citation chains that substitute for independent evidence, rendering the text self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: open sharing of physiological data accelerates research and improves model development.