Heterogeneous Scientific Foundation Model Collaboration
Pith reviewed 2026-05-07 08:45 UTC · model grok-4.3
The pith
Eywa adds language-based reasoning interfaces to domain-specific foundation models so they can join agentic systems on non-linguistic data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Eywa is a heterogeneous agentic framework that augments domain-specific foundation models with a language-model-based reasoning interface. This interface lets language models guide inference over non-linguistic data modalities, so that predictive foundation models can participate in higher-level reasoning and decision-making. The framework can replace a single-agent pipeline, integrate specialized agents into multi-agent systems, or use planning-based orchestration to coordinate both kinds of agents across modalities.
What carries the argument
The language-model-based reasoning interface added to domain-specific foundation models, which converts language guidance into operations on specialized non-text data while keeping the model's original strengths intact.
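A minimal sketch can make this interface concrete, assuming a tool-style adapter design; the names (`DomainAdapter`, `encode`, `decode`) are invented for illustration and are not the paper's API.

```python
# Illustrative-only sketch: an adapter translates a language request into
# the native input of a frozen domain model, then renders the output back
# as text. Only the boundary is linguistic; the model stays untouched.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DomainAdapter:
    name: str
    description: str                 # natural-language "tool card" shown to the LLM
    encode: Callable[[str], Any]     # language request -> native model input
    model: Callable[[Any], Any]      # the unchanged domain foundation model
    decode: Callable[[Any], str]     # native output -> language summary

    def invoke(self, request: str) -> str:
        # Translation happens only at the boundary.
        return self.decode(self.model(self.encode(request)))

# Toy stand-in for a specialized predictive model.
def toy_regressor(x: float) -> float:
    return 2.0 * x + 1.0

adapter = DomainAdapter(
    name="toy_regressor",
    description="Predicts a numeric property from a numeric feature.",
    encode=lambda req: float(req.split("=")[1]),
    model=toy_regressor,
    decode=lambda y: f"predicted value: {y:.1f}",
)
print(adapter.invoke("x=3"))  # -> predicted value: 7.0
```

The design choice mirrored here is that language appears only at the input/output boundary, which is why the domain model's native accuracy can in principle be preserved.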
If this is right
- EywaAgent can replace a single language-model agent in existing pipelines.
- EywaMAS swaps in specialized agents within multi-agent systems.
- EywaOrchestra uses a planner to route tasks across language and non-language models.
- Tasks involving structured or domain-specific data show measurable accuracy gains.
- Collaboration with specialized foundation models reduces the system's overall reliance on language-only reasoning.
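The orchestration variant among these predictions can be sketched in a few lines, in the spirit of EywaOrchestra: a planner inspects each task and dispatches it to a language agent or a specialized agent. The routing rule and agent stubs below are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of planner-style orchestration: route non-linguistic
# payloads to a specialized agent, everything else to a language agent.

def language_agent(task: dict) -> str:
    return f"LLM answer to: {task['query']}"

def timeseries_agent(task: dict) -> str:
    # Stand-in for a forecasting foundation model: naive last-value forecast.
    return f"forecast: {task['series'][-1]}"

def plan(task: dict) -> str:
    # Toy planning rule: structured numeric data goes to the specialist.
    return "timeseries" if "series" in task else "language"

AGENTS = {"language": language_agent, "timeseries": timeseries_agent}

def orchestrate(tasks: list) -> list:
    return [AGENTS[plan(t)](t) for t in tasks]

results = orchestrate([
    {"query": "Summarize the experiment."},
    {"series": [1.0, 2.0, 3.0]},
])
print(results)
```

A real planner would be an LLM rather than a keyword rule, but the dispatch structure is the same.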
Where Pith is reading between the lines
- Similar interfaces could be tested on engineering or medical simulation models outside the paper's science focus.
- Dynamic planning might reduce mismatch errors when agents must choose between text and numeric tools.
- The approach could encourage developers to build lightweight adapters rather than retraining full models for each modality.
- Broader use might shift scientific AI design toward modular interfaces instead of monolithic language models.
Load-bearing premise
That attaching a language-based reasoning interface lets language models effectively direct inference inside domain-specific models without harming their specialized accuracy.
What would settle it
An experiment in which Eywa shows no performance gain or even lower accuracy than either standalone specialized models or pure language-model agents on the same structured-data scientific tasks.
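That falsification criterion can be operationalized as a three-way comparison on the same task. The toy models below are stand-ins invented for illustration; only the comparison logic matters.

```python
# Hedged sketch of the settling experiment: compare a standalone
# specialized model, a pure language-model agent, and an Eywa-style
# combination on the same structured-data task.

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# Toy structured-data task: classify whether a value exceeds 0.5.
data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

specialist = lambda x: int(x > 0.5)   # domain model: correct decision rule
llm_only   = lambda x: int(x > 0.8)   # language agent: cruder heuristic
eywa_style = lambda x: specialist(x)  # LLM delegates to the specialist

scores = {name: accuracy(fn, data)
          for name, fn in [("specialist", specialist),
                           ("llm_only", llm_only),
                           ("eywa", eywa_style)]}
print(scores)
# The claim fails if the Eywa-style score drops below both baselines.
```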
Original abstract
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Eywa, a heterogeneous agentic framework that augments domain-specific scientific foundation models with language-model-based reasoning interfaces. This enables language models to guide inference over non-linguistic data modalities, allowing specialized predictive models to participate in higher-level reasoning and decision-making within agentic systems. The framework is presented in three forms: EywaAgent as a drop-in single-agent replacement, EywaMAS for integration into multi-agent systems, and EywaOrchestra as a planning-based orchestration layer that dynamically coordinates traditional and Eywa agents. The authors evaluate the approach across physical, life, and social science domains and claim that it improves performance on structured and domain-specific data tasks while reducing reliance on language-based reasoning.
Significance. If the empirical claims hold under rigorous validation, the work could meaningfully advance integration of specialized scientific foundation models into agentic AI systems. By providing a general interface layer rather than requiring end-to-end retraining, Eywa addresses a practical gap between general-purpose language agents and high-performance domain models. The orchestration variant further suggests a path toward dynamic, modality-aware planning. These contributions would be of interest to researchers working on scientific AI, multi-agent systems, and foundation-model collaboration, provided the performance gains are shown to be robust across baselines and tasks.
Major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): The central claim that 'Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning' is load-bearing, yet the abstract supplies no information on experimental design, baselines, metrics, datasets, error bars, or statistical tests. If the full manuscript does not contain a complete experimental section with quantitative comparisons (e.g., against standard LLM agents, direct fine-tuning, or modality-specific pipelines) and ablation studies isolating the reasoning-interface contribution, the support for the performance and 'reduced language reliance' assertions cannot be evaluated. This must be addressed with concrete tables, figures, and reproducibility details.
- [§3] §3 (Framework Description): The weakest assumption—that a language-model-based reasoning interface can effectively guide inference over non-linguistic data modalities without compromising the specialized capabilities of the domain foundation models—is stated but not formally characterized. The manuscript should provide either a precise interface specification (e.g., input/output formats, prompt templates, or API contracts) or empirical evidence that the interface preserves the original model's accuracy on its native tasks. Without this, it is unclear whether the collaboration mechanism is general or task-specific.
Minor comments (2)
- [Throughout] The acronyms EywaAgent, EywaMAS, and EywaOrchestra are introduced without an explicit nomenclature table or consistent usage pattern across sections; a short table mapping names to roles would improve readability.
- [Abstract and §4] The abstract states evaluation 'across a diverse set of scientific domains' but does not list the specific tasks or datasets; the experimental section should include an explicit enumeration (e.g., Table 1) for traceability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications on the experimental rigor and framework formalization. Where appropriate, we have revised the manuscript to strengthen the presentation of results and interface details.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): The central claim that 'Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning' is load-bearing, yet the abstract supplies no information on experimental design, baselines, metrics, datasets, error bars, or statistical tests. If the full manuscript does not contain a complete experimental section with quantitative comparisons (e.g., against standard LLM agents, direct fine-tuning, or modality-specific pipelines) and ablation studies isolating the reasoning-interface contribution, the support for the performance and 'reduced language reliance' assertions cannot be evaluated. This must be addressed with concrete tables, figures, and reproducibility details.
Authors: The full manuscript contains a comprehensive §4 (Experiments) section with quantitative evaluations across physical, life, and social science domains. This includes direct comparisons to standard LLM agents, fine-tuned baselines, and modality-specific pipelines, along with ablation studies that isolate the contribution of the reasoning interface. Tables report performance metrics with error bars and statistical significance tests; datasets and reproducibility details (including code and hyperparameters) are provided in the appendix. We agree that the abstract is high-level and will expand it in the revision to briefly summarize the experimental design, key baselines, main metrics, and core findings while preserving its concise nature. revision: yes
-
Referee: [§3] §3 (Framework Description): The weakest assumption—that a language-model-based reasoning interface can effectively guide inference over non-linguistic data modalities without compromising the specialized capabilities of the domain foundation models—is stated but not formally characterized. The manuscript should provide either a precise interface specification (e.g., input/output formats, prompt templates, or API contracts) or empirical evidence that the interface preserves the original model's accuracy on its native tasks. Without this, it is unclear whether the collaboration mechanism is general or task-specific.
Authors: Section 3 describes the Eywa reasoning interface as a modular augmentation layer that translates between language-based agent instructions and the native input/output formats of domain-specific foundation models. Empirical evidence that this interface preserves (and in many cases improves) native task accuracy is presented in §4 through side-by-side comparisons showing that Eywa-augmented models retain or exceed the performance of standalone domain models on their original tasks while enabling higher-level agentic reasoning. To address the request for formal characterization, we will add a dedicated subsection in the revised §3 that specifies the interface contract, including standardized input/output schemas, prompt templates for the language-model wrapper, and API-level contracts that ensure generality across modalities. revision: yes
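One plausible rendering of such an interface contract is a typed request/response schema with a validation step. Every field name below is an assumption for illustration, not taken from the manuscript.

```python
# Hypothetical interface contract: a uniform request/response schema that
# any Eywa-style adapter could expose, regardless of modality. Field
# names and the validation rule are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class InterfaceRequest:
    task: str          # e.g. "property_prediction"
    modality: str      # e.g. "smiles", "timeseries", "tabular"
    payload: dict      # native-format inputs for the domain model

@dataclass
class InterfaceResponse:
    summary: str                              # language rendering for the calling LLM
    raw: dict = field(default_factory=dict)   # native outputs retained for auditing

def validate(req: InterfaceRequest, supported: set) -> None:
    # Contract check: an adapter rejects modalities it did not declare.
    if req.modality not in supported:
        raise ValueError(f"unsupported modality: {req.modality}")

req = InterfaceRequest("property_prediction", "smiles", {"smiles": "CCO"})
validate(req, supported={"smiles"})   # passes silently
resp = InterfaceResponse(summary="illustrative logP estimate: 0.2", raw={"logP": 0.2})
print(resp.summary)
```

Pinning the contract down at this level would let the generality claim be checked adapter by adapter rather than task by task.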
Circularity Check
No significant circularity; derivation chain absent
Full rationale
The manuscript introduces an architectural framework (Eywa) for interfacing language models with domain-specific scientific foundation models via a reasoning layer. No equations, derivations, fitted parameters, or mathematical claims appear in the abstract or are indicated in the full text. All performance assertions rest on experimental evaluations across physical, life, and social science tasks rather than any reduction to self-defined inputs or self-citations. The central design (augmenting models with a language-based interface) is presented as an engineering choice, not derived from prior results by the same authors. This satisfies the default expectation of a non-circular empirical/architectural paper.
Axiom & Free-Parameter Ledger
Invented entities (1)
-
Eywa framework and its variants (EywaAgent, EywaMAS, EywaOrchestra)
no independent evidence
Forward citations
Cited by 1 Pith paper
-
TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains
TRACE is a metrologically-grounded four-layer engineering framework for trustworthy agentic AI that enforces an ML-LLM split, stateful policies, human supervision, and a parsimony metric across critical domains.
Reference graph
Works this paper leans on
-
[1]
OpenAI. GPT-4 technical report.CoRR, abs/2303.08774, 2023. doi: 10.48550/ARXIV.2303.08774. URLhttps://doi.org/10.48550/arXiv.2303.08774
-
[2]
Gemma 3 technical report
Gemma Team. Gemma 3 technical report.CoRR, abs/2503.19786, 2025. doi: 10.48550/ARXIV.2503. 19786. URLhttps://doi.org/10.48550/arXiv.2503.19786
-
[3]
Llama Team. The llama 3 herd of models.CoRR, abs/2407.21783, 2024. doi: 10.48550/ARXIV.2407. 21783. URLhttps://doi.org/10.48550/arXiv.2407.21783
-
[4]
arXiv preprint arXiv:2601.12538 (2026)
Tianxin Wei, Ting-Wei Li, Zhining Liu, Xuying Ning, Ze Yang, Jiaru Zou, Zhichen Zeng, Ruizhong Qiu, Xiao Lin, Dongqi Fu, Zihao Li, Mengting Ai, Duo Zhou, Wenxuan Bao, Yunzhe Li, Gaotang Li, Cheng Qian, Yu Wang, Xiangru Tang, Yin Xiao, Liri Fang, Hui Liu, Xianfeng Tang, Yuji Zhang, Chi Wang, Jiaxuan You, Heng Ji, Hanghang Tong, and Jingrui He. Agentic reas...
-
[5]
Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Hen...
-
[6]
Latent collaboration in multi-agent systems
Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, and Ling Yang. Latent collaboration in multi-agent systems.CoRR, abs/2511.20639, 2025. doi: 10.48550/ARXIV.2511.20639. URL https://doi.org/10.48550/arXiv.2511.20639
-
[7]
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. A survey on evaluation of large language models.CoRR, abs/2307.03109, 2023. doi: 10.48550/ARXIV.2307.03109. URLhttps://doi.org/10.48550/arXiv.2307.03109
-
[8]
How far are we from AGI: Are LLMs all we need?
Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, and Jiaxuan You. How far are we from AGI: Are LLMs all we need? Trans. Mach. Learn. Res., 2024. URL https://openreview.net/forum?id=H2ZKqfNd0U
2024
-
[9]
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu...
-
[10]
Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji...
-
[11]
SMILES, a chemical language and information system
David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules.J. Chem. Inf. Comput. Sci., 28(1):31–36, 1988. doi: 10.1021/CI00057A005. URL https://doi.org/10.1021/ci00057a005
-
[12]
The era5 global reanalysis
Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. The era5 global reanalysis. Quarterly journal of the royal meteorological society, 146(730):1999–2049, 2020
-
[13]
LLM-SRBench: A new benchmark for scientific equation discovery with large language models
Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D. Doan, and Chandan K. Reddy. LLM-SRBench: A new benchmark for scientific equation discovery with large language models. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors, Forty-second Internat...
2025
-
[14]
UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res., 51(D1):523–531, 2023. doi: 10.1093/NAR/GKAC1052. URL https://doi.org/10.1093/nar/gkac1052
-
[15]
Jiaqi Wei, Yuejin Yang, Xiang Zhang, Yuhan Chen, Xiang Zhuang, Zhangyang Gao, Dongzhan Zhou, Guangshuai Wang, Zhiqiang Gao, Juntai Cao, Zijie Qiu, Xuming He, Qiang Zhang, Chenyu You, Shuangjia Zheng, Ning Ding, Wanli Ouyang, Nanqing Dong, Yu Cheng, Siqi Sun, Lei Bai, and Bowen Zhou. From AI for science to agentic science: A survey on autonomous scientific...
-
[16]
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, and Zhiyong Wu. Scienceboard: Evaluating multimodal autonomous agents in realistic scientific workflo...
-
[17]
AI scientists produce results without reasoning scientifically
Martiño Ríos-García, Nawaf Alampara, Chandan Gupta, Indrajeet Mandal, Sajid Mannan, Ali Asghar Aghajani, NM Krishnan, and Kevin Maik Jablonka. Ai scientists produce results without reasoning scientifically.arXiv preprint arXiv:2604.18805, 2026
-
[18]
Sidharth S. Menon, Trishit Mondal, Shuvayan Brahmachary, Aniruddha Panda, Subodh M. Joshi, Kaushic Kalyanaraman, and Ameya D. Jagtap. On scientific foundation models: Rigorous definitions, key applications, and a comprehensive survey.Neural Networks, 198:108567, 2026. doi: 10.1016/J. NEUNET.2026.108567. URLhttps://doi.org/10.1016/j.neunet.2026.108567
-
[19]
Shengchao Chen, Guodong Long, Jing Jiang, Dikai Liu, and Chengqi Zhang. Foundation models for weather and climate data understanding: A comprehensive survey.CoRR, abs/2312.03014, 2023. doi: 10.48550/ARXIV.2312.03014. URLhttps://doi.org/10.48550/arXiv.2312.03014
-
[20]
Foundational models defining a new era in vision: A survey and outlook
Muhammad Awais, Muzammal Naseer, Salman H. Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundational models defining a new era in vision: A survey and outlook.CoRR, abs/2307.13721, 2023. doi: 10.48550/ARXIV.2307.13721. URLhttps://doi.org/10.48550/arXiv.2307.13721
-
[21]
Foundation models for time series: A survey
Siva Rama Krishna Kottapalli, Karthik Hubli, Sandeep Chandrashekhara, Garima Jain, Sunayana Hubli, Gayathri Botla, and Ramesh Doddaiah. Foundation models for time series: A survey.CoRR, abs/2504.04011, 2025. doi: 10.48550/ARXIV.2504.04011. URLhttps://doi.org/10.48550/ arXiv.2504.04011
-
[22]
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie S. Chen, Kathleen Creel, Jared Quincy Davis, Dorottya Demszky, Chris Donahue, Moussa Doumbouya, Esin...
-
[23]
A foundation model for clinician-centered drug repurposing
Kexin Huang, Payal Chandak, Qianwen Wang, Shreyas Havaldar, Akhil Vaid, Jure Leskovec, Girish N Nadkarni, Benjamin S Glicksberg, Nils Gehlenborg, and Marinka Zitnik. A foundation model for clinician-centered drug repurposing.Nature Medicine, 30(12):3601–3613, 2024
2024
-
[24]
Rémi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Alexander Pritzel, Suman V. Ravuri, Timo Ewalds, Ferran Alet, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Jacklynn Stott, Oriol Vinyals, Shakir Mohamed, and Peter W. Battaglia. GraphCast: Learning skillful medium-range global weather forecasting. CoRR, abs/22...
-
[25]
On the foundations of earth and climate foundation models
Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, and Yilei Shi. On the foundations of earth and climate foundation models. CoRR, abs/2405.04285, 2024. doi: 10.48550/ARXIV.2405.04285. URL https://doi.org/10.48550/arXiv.2405.04285
-
[26]
Foundation models for the electric power grid
Hendrik F Hamann, Blazhe Gjorgiev, Thomas Brunschwiler, Leonardo SA Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Lok Choi, et al. Foundation models for the electric power grid.Joule, 8(12):3245–3258, 2024
2024
-
[27]
OlmoEarth : Stable latent image modeling for multimodal earth observation
Henry Herzog, Favyen Bastani, Yawen Zhang, Gabriel Tseng, Joseph Redmon, Hadrien Sablon, Ryan Park, Jacob Morrison, Alexandra Buraczynski, Karen Farley, et al. Olmoearth: Stable latent image modeling for multimodal earth observation.arXiv preprint arXiv:2511.13655, 2025
-
[28]
Large language model based multi-agents: A survey of progress and challenges
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 8048–8057. ij...
2024
-
[29]
MetaGPT: Meta programming for a multi-agent collaborative framework
Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, ICLR 2024, ...
2024
-
[30]
Generative agents: Interactive simulacra of human behavior
Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Sean Follmer, Jeff Han, Jürgen Steimle, and Nathalie Henry Riche, editors,Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST 2023, San Fra...
-
[31]
Bingyu Yan, Xiaoming Zhang, Litian Zhang, Lian Zhang, Ziyi Zhou, Dezhuang Miao, and Chaozhuo Li. Beyond self-talk: A communication-centric survey of llm-based multi-agent systems.CoRR, abs/2502.14321, 2025. doi: 10.48550/ARXIV.2502.14321. URLhttps://doi.org/10.48550/ arXiv.2502.14321
-
[32]
Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D. Nguyen. Multi-agent collaboration mechanisms: A survey of llms.CoRR, abs/2501.06322, 2025. doi: 10.48550/ARXIV.2501.06322. URLhttps://doi.org/10.48550/arXiv.2501.06322
-
[33]
A survey on large language model based autonomous agents
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Jirong Wen. A survey on large language model based autonomous agents.Frontiers Comput. Sci., 18(6):186345, 2024. doi: 10. 1007/S11704-024-40231-1. URLhttps://doi.org/10.1007/s11704-024-40231-1
-
[34]
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, and Ming Zhang. Large language model agent: A surve...
-
[35]
Model context protocol
Anthropic. Model context protocol. https://docs.anthropic.com/en/docs/agents-and-tools/mcp, 2024
2024
-
[36]
MoleculeNet: A benchmark for molecular machine learning
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay S. Pande. Moleculenet: A benchmark for molecular machine learning.CoRR, abs/1703.00564, 2017. URLhttp://arxiv.org/abs/1703.00564
-
[37]
SuperGPQA: Scaling LLM evaluation across 285 graduate disciplines
M-A-P Team. Supergpqa: Scaling LLM evaluation across 285 graduate disciplines.CoRR, abs/2502.14739, 2025. doi: 10.48550/ARXIV.2502.14739. URLhttps://doi.org/10.48550/ arXiv.2502.14739
-
[38]
Physicsarena: The first multimodal physics reasoning benchmark exploring variable, process, and solution dimensions
Song Dai, Yibo Yan, Jiamin Su, Dongfang Zihao, Yubo Gao, Yonghua Hei, Jungang Li, Junyan Zhang, Sicheng Tao, Zhuoran Gao, and Xuming Hu. Physicsarena: The first multimodal physics reasoning benchmark exploring variable, process, and solution dimensions. In Christos Christodoulopoulos, Tan- moy Chakraborty, Carolyn Rose, and Violet Peng, editors,Findings o...
2025
-
[39]
Phybench: Holistic evaluation of physical perception and reasoning in large language models
Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li,...
-
[40]
Ming Yin, Yuanhao Qu, Dyllan Liu, Ling Yang, Le Cong, and Mengdi Wang. Genome-bench: A scientific reasoning benchmark from real-world expert discussions.CoRR, abs/2505.19501, 2025. doi: 10.48550/ARXIV.2505.19501. URLhttps://doi.org/10.48550/arXiv.2505.19501
-
[41]
SciVid: Cross-domain evaluation of video models in scientific applications
Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, and Andrew Zisserman. SciVid: Cross-domain evaluation of video models in scientific applications. CoRR, abs/2507.03578, 2025. doi: 10.48550/ARXIV.2507.03578. URL https://doi...
-
[42]
OceanBench: The sea surface height edition
J. Emmanuel Johnson, Quentin Febvre, Anastasiia Gorbunova, Sammy Metref, Maxime Ballarotta, Julien Le Sommer, and Ronan Fablet. OceanBench: The sea surface height edition. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural...
2023
-
[43]
Fengxiang Wang, Hongzhen Wang, Zonghao Guo, Di Wang, Yulin Wang, Mingshuo Chen, Qiang Ma, Long Lan, Wenjing Yang, Jing Zhang, Zhiyuan Liu, and Maosong Sun. Xlrs-bench: Could your multimodal llms understand extremely large ultra-high-resolution remote sensing imagery? InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, T...
-
[44]
Evaluating Large Language Models in Scientific Discovery
Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M Pruyn, Yue Huang, Kehan Guo, Xiuzhe Luo, Yuanhao Qu, Yi Qu, et al. Evaluating large language models in scientific discovery.arXiv preprint arXiv:2512.15567, 2025
-
[45]
MMLU-Pro: A more robust and challenging multi-task language understanding benchmark
Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, et al. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. Advances in Neural Information Processing Systems, 37:95266–95290, 2024
2024
-
[46]
fev-bench: A realistic benchmark for time series forecasting
Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, and Yuyang Wang. fev-bench: A realistic benchmark for time series forecasting.CoRR, abs/2509.26468, 2025. doi: 10.48550/ARXIV.2509.26468. URLhttps://doi. org/10.48550/arXiv.2509.26468
-
[47]
Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, and Frank Hutter. Tabarena: A living benchmark for machine learning on tabular data.CoRR, abs/2506.16791, 2025. doi: 10.48550/ARXIV.2506.16791. URLhttps://doi.org/10.48550/ arXiv.2506.16791
-
[48]
Self-refine: Iterative refinement with self-feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback. In Alice Oh, Tristan Naumann, Amir Globerson, K...
2023
-
[49]
Improving factuality and reasoning in language models through multiagent debate
Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024... URL https://proceedings.mlr.press/v235/du24e.html
2024
-
[51]
Mixture-of-agents enhances large language model capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou. Mixture-of-agents enhances large language model capabilities. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025. URL https://openreview.net/forum?id=h0ZfDIrj7T
2025
-
[52]
X-MAS: Towards building multi-agent systems with heterogeneous LLMs
Rui Ye, Xiangrui Liu, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, and Siheng Chen. X-MAS: Towards building multi-agent systems with heterogeneous LLMs. CoRR, abs/2505.16997, 2025. doi: 10.48550/ARXIV.2505.16997. URL https://doi.org/10.48550/arXiv.2505.16997
-
[53]
Chronos: Learning the language of time series
Abdul Fatir Ansari, Lorenzo Stella, Ali Caner Türkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda-Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Bernie Wang. Chronos: Learning the langu...
2024
-
[54] Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, et al. Chronos-2: From univariate to universal forecasting, 2025. doi: 10.48550/arXiv.2510.15821.
[55] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum?id=cp5PvcI6w8_.
[56] OpenAI. OpenAI GPT-5 system card. CoRR, abs/2601.03267, 2026. doi: 10.48550/arXiv.2601.03267. URL https://doi.org/10.48550/arXiv.2601.03267.
[57] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
[58] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.
[59] Anthropic. Claude family models. https://platform.claude.com/docs/en/about-claude/models/overview, 2025.
[60] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35, NeurIPS 2022, 2022.
[61] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum?id=WE_vluYUL-X.
[62] Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, et al. A survey of scientific large language models: From data foundations to agent frontiers. arXiv preprint arXiv:2508.21148, 2025.
[63] Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, and Jiawei Han. A comprehensive survey of scientific large language models and their applications in scientific discovery. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8783–8817, 2024.
[64] Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A large language model for science. arXiv preprint arXiv:2211.09085, 2022.
[65] Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. Solving quantitative reasoning problems with language models. Advances in Neural Information Processing Systems, 35:3843–3857, 2022.
[66] Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6):bbac409, 2022.
[67] Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023.
[68] Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Wanli Ouyang, et al. ChemLLM: A chemical large language model. arXiv preprint arXiv:2402.06852, 2024.
[69] Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, and Huan Sun. LlaSMol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset. arXiv preprint arXiv:2402.09391, 2024.
[70] Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, and Jie Tang. SciGLM: Training scientific language models with self-reflective instruction annotation and tuning. arXiv preprint arXiv:2401.07950, 2024.
[71] Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, 2023.
[72] Alireza Ghafarollahi and Markus J. Buehler. SciAgents: Automating scientific discovery through bioinspired multi-agent intelligent graph reasoning. Advanced Materials, 37(22):2413523, 2025.
[73] Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time-LLM: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728, 2023.
[74] Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, and Dongmei Zhang. Table meets LLM: Can large language models understand structured table data? A benchmark and empirical study. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pages 645–654, 2024.
[75] Shaghayegh Sadeghi, Alan Bui, Ali Forooghi, Jianguo Lu, and Alioune Ngom. Can large language models understand molecules? BMC Bioinformatics, 25(1):225, 2024.
[76] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024.
[77] Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. In Forty-first International Conference on Machine Learning, ICML 2024, 2024.
[78] Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Bilos, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-Llama: Towards foundation models for time series forecasting. CoRR, abs/2310.08278, 2023. doi: 10.48550/arXiv.2310.08278.
[79] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8044):319–326, 2025. doi: 10.1038/S41586-024-08328-6. URL https://doi.org/10.1038/s41586-024-08328-6.
[80] Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, Mihir Manium, Rosen Yu, Felix Jablonski, Shi Bin Hoo, Anurag Garg, Jake Robertson, Magnus Bühler, Vladyslav Moroshan, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Bernhard Schölk...