Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

Alden Rose; Cesar de la Fuente-Nunez; Haydn Jones; Jacob R. Gardner; Jiaming Liang; Kaiwen Wu; Li S. Yifei; Maggie Ziyu Huan; Mark Yatskar; Osbert Bastani

arxiv: 2605.07022 · v2 · pith:QXYMIEQDnew · submitted 2026-05-07 · 💻 cs.LG

Haydn Jones, Yimeng Zeng, Alden Rose, Li S. Yifei, Yining Huang, Kaiwen Wu, Jiaming Liang, Maggie Ziyu Huan, Yoseph Barash, Cesar de la Fuente-Nunez, Osbert Bastani, Zachary Ives, Mark Yatskar, Jacob R. Gardner

Pith reviewed 2026-05-20 22:24 UTC · model grok-4.3

classification 💻 cs.LG

keywords: biomedical knowledge extraction · LLM agents · PubMed corpus · structured datasets · entity tagging · multi-agent systems · data curation

The pith

PubMed papers can be autonomously turned into larger, more nuanced and accurate structured biomedical datasets than manually curated ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to use large language models to tag billions of entities in a 22.5 million paper PubMed corpus and then deploy a multi-agent system to extract structured records for specific biomedical tasks. The resulting datasets reach millions of entries while including supporting passages that preserve experimental details lost in standard tables. Frontier models reject fewer of these extractions than they do for records from widely used curated databases, indicating a path to scalable, high-fidelity knowledge bases.

Core claim

The authors present an LLM-based entity-tagging pipeline grounded in nine biomedical ontologies and applied to 22.5M papers, hybrid sparse-dense retrieval over the tagged corpus, and Starling, a multi-agent system that designs retrieval filters and extracts nuanced records. For six tasks, including blood-brain barrier permeability and gene-disease associations, Starling yields about 6.3M records. Frontier-model rejection rates for these records are 0.6-7.7%, against measured error rates such as 16.5% (BBB_Martins) and 7.3% (Bioavailability_Ma) on curated counterparts. The records also carry nuance-rich fields.

What carries the argument

Starling, the multi-agent deep research system that, given a natural-language task, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with supporting passages.

If this is right

Produces up to millions of records per task, including some of the largest public datasets for properties like oral bioavailability.
Retains experimental context such as fed versus fasted state in bioavailability measurements that tabular databases typically discard.
Establishes a scalable foundation for AI-driven therapeutic design using autonomously generated knowledge.
Lowers the cost and lag of maintaining biomedical repositories compared to manual curation.
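
As a rough illustration of how a record can retain experimental context, the sketch below keeps a supporting passage and a free-form nuance mapping next to each value. The class name, the field names, the `prandial_state` key, and all values are hypothetical placeholders, not Starling's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """Hypothetical record shape; not the schema Starling induces."""
    compound: str
    property_name: str
    value: float
    unit: str
    supporting_passage: str
    # Experimental context that a flat table would typically drop.
    nuance: dict[str, str] = field(default_factory=dict)


def group_by_condition(records, key):
    """Group values by one nuance field so context-dependent
    measurements are not silently pooled."""
    groups: dict[str, list[float]] = {}
    for record in records:
        condition = record.nuance.get(key, "unreported")
        groups.setdefault(condition, []).append(record.value)
    return groups


# Made-up values, for illustration only.
records = [
    ExtractionRecord("compound_x", "oral_bioavailability", 40.0, "%",
                     "placeholder passage A", {"prandial_state": "fasted"}),
    ExtractionRecord("compound_x", "oral_bioavailability", 65.0, "%",
                     "placeholder passage B", {"prandial_state": "fed"}),
    ExtractionRecord("compound_x", "oral_bioavailability", 50.0, "%",
                     "placeholder passage C"),
]

by_state = group_by_condition(records, "prandial_state")
```

Keeping the context as a mapping rather than fixed columns lets each task carry its own covariates, at the cost of weaker validation.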

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar autonomous extraction could be applied to other large scientific corpora beyond PubMed to build structured knowledge in physics or chemistry.
Integrating feedback loops where extracted data informs new queries might further improve coverage and accuracy over time.
The approach opens the possibility of real-time updating of datasets as new papers are published.

Load-bearing premise

Frontier model rejection rates provide a reliable proxy for the actual accuracy of the extracted structured records.
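
To make the premise concrete, here is a minimal sketch of a rejection rate computed with one judge applied identically to two record sources. The `toy_judge` function is a stand-in assumption for a frontier-model call (it only checks substring containment), and the example records are invented.

```python
from typing import Callable, Iterable, Tuple

Record = Tuple[str, str]  # (claim, source_text)


def rejection_rate(records: Iterable[Record],
                   judge: Callable[[str, str], bool]) -> float:
    """Fraction of records the judge does not accept."""
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    rejected = sum(1 for claim, source in records if not judge(claim, source))
    return rejected / len(records)


def toy_judge(claim: str, source: str) -> bool:
    """Stand-in for a model judge: accept only if the claim text
    appears in the source. A real judge would be an LLM call."""
    return claim.lower() in source.lower()


# Invented examples, for illustration only.
extracted = [
    ("crosses the BBB", "Compound A crosses the BBB in mice."),
    ("is orally bioavailable", "Compound B showed poor absorption."),
]
curated = [
    ("crosses the BBB", "Compound C crosses the BBB."),
]

extracted_rate = rejection_rate(extracted, toy_judge)  # 0.5
curated_rate = rejection_rate(curated, toy_judge)      # 0.0
```

Whatever biases the judge has enter both numbers, so the comparison is only as trustworthy as the judge itself.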

What would settle it

Independent manual review or experimental replication of a sample of the generated records versus the source papers would confirm whether the reported rejection rates correspond to true improvements in data quality.
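
One way to size such a manual audit is to put a confidence interval around the error rate observed in the reviewed sample. The sketch below uses the standard Wilson score interval; the sample sizes in the usage lines are arbitrary examples, not figures from the paper.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96):
    """Wilson score interval for an error proportion.

    errors: records a human reviewer marked wrong
    n:      records reviewed
    z:      normal quantile (1.96 gives roughly 95% coverage)
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Arbitrary example: 10 errors found in 200 audited records.
low, high = wilson_interval(10, 200)
```

If the interval from a human audit overlaps the model-reported rejection rate, the proxy is at least not contradicted on that sample.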

Figures

Figures reproduced from arXiv: 2605.07022 by Alden Rose, Cesar de la Fuente-Nunez, Haydn Jones, Jacob R. Gardner, Jiaming Liang, Kaiwen Wu, Li S. Yifei, Maggie Ziyu Huan, Mark Yatskar, Osbert Bastani, Yimeng Zeng, Yining Huang, Yoseph Barash, Zachary Ives.

**Figure 2.** Within-mol η² ANOVA effect size statistics for each task / conditioning variable: the fraction of a molecule's label variance explained by that single covariate, averaged across molecules with ≥ 2 extractions covering ≥ 2 subcategories. For example, "Dosing route" splits each molecule's LD50 measurements into {oral, injection, inhalation, dermal, ...} and asks how much of that molecule's lethal-dose var…
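
The statistic in the Figure 2 caption can be sketched as follows: per molecule, η² is the between-subcategory sum of squares divided by the total sum of squares, and the result is averaged over eligible molecules. Skipping molecules whose labels have zero variance is an assumption of this sketch, and the numbers are invented.

```python
from collections import defaultdict


def eta_squared(values, groups):
    """Between-group sum of squares over total sum of squares.
    Returns None when the values have no variance."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return None
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


def mean_within_molecule_eta_squared(rows):
    """rows: iterable of (molecule, subcategory, label_value).
    Keeps molecules with >= 2 extractions covering >= 2 subcategories."""
    per_molecule = defaultdict(list)
    for molecule, subcategory, value in rows:
        per_molecule[molecule].append((subcategory, value))
    scores = []
    for observations in per_molecule.values():
        categories = [c for c, _ in observations]
        if len(observations) < 2 or len(set(categories)) < 2:
            continue
        score = eta_squared([v for _, v in observations], categories)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


# Invented LD50-like values for two molecules.
rows = [
    ("m1", "oral", 100.0), ("m1", "oral", 120.0),
    ("m1", "injection", 10.0), ("m1", "injection", 30.0),
    ("m2", "oral", 5.0), ("m2", "oral", 7.0),  # one subcategory: skipped
]
result = mean_within_molecule_eta_squared(rows)  # 8100 / 8500
```

A value near 1 means the covariate accounts for almost all of the spread in a molecule's labels.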

**Figure 3.** The five natural-language task prompts given to each model. Minor wording variations …

**Figure 4.** One of the FilterSpec probes the Proposer constructed for the BBB task. entity_groups is a CNF entity constraint (outer AND, inner OR); semantic_query is the rerank target applied to the surviving windows.

H Additional Results: Measuring Disagreement for Individual Molecules. In this section, we present two small case studies that measure the variability of BBB, Oral and LD50 labels that exists for single mo…
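
The CNF constraint named in the Figure 4 caption can be read as "every group must be satisfied by at least one of its entities". The sketch below evaluates that rule over a window's tagged entities. Only the field names `entity_groups` and `semantic_query` come from the caption; the dictionary layout, entity strings, and query text are hypothetical, and the rerank step is not implemented.

```python
def matches_entity_groups(window_entities: set[str],
                          entity_groups: list[list[str]]) -> bool:
    """CNF check: outer AND over groups, inner OR within a group."""
    return all(
        any(entity in window_entities for entity in group)
        for group in entity_groups
    )


# Hypothetical filter contents, for illustration only.
filter_spec = {
    "entity_groups": [
        ["blood-brain barrier", "BBB"],          # any one of these
        ["permeability", "brain penetration"],   # AND any one of these
    ],
    "semantic_query": "measured blood-brain barrier permeability of a compound",
}

windows = {
    "w1": {"BBB", "permeability", "mouse"},
    "w2": {"BBB", "toxicity"},
}

# Windows that pass the entity constraint would then be reranked
# against semantic_query (not shown here).
surviving = [
    window_id for window_id, entities in windows.items()
    if matches_entity_groups(entities, filter_spec["entity_groups"])
]
```

Note that an empty `entity_groups` list passes every window under this reading, since a conjunction over no groups is true.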

**Figure 5.** For each labeled molecule in three TDC datasets, what fraction of …

**Figure 6.** For each molecule with ≥ 5 extractions, what fraction of the extractions would have the + label in the corresponding TDC datasets?

Original abstract

Manually curated biomedical repositories -- spanning bioactivity, genomics, and chemistry -- are expensive to maintain, lag behind primary literature, and discard experimental context, obscuring nuances needed to assess data correctness and coverage. We show that PubMed itself can be autonomously and cost-effectively turned into structured datasets that are larger, more nuanced, and more accurate than the curated databases they replace. We present three coupled contributions: (1) an LLM-based entity-tagging pipeline, grounded in nine biomedical ontologies, that tags 4.5B entities across 19 categories in a 22.5M-paper, 2.5T-token PubMed corpus; (2) hybrid sparse-dense retrieval supporting entity-filtered semantic queries over the tagged corpus; and (3) Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages. Across six tasks -- blood-brain barrier permeability, oral bioavailability, acute toxicity (LD50), gene-disease associations, protein subcellular localization, and chemical reactions -- Starling produces ~6.3M records (91K-3M per task); several are, to our knowledge, the largest public datasets for their property. Frontier-model rejection of our extractions is 0.6-7.7% across tasks, far below error rates we measure on widely used curated counterparts (e.g., 16.5% on BBB_Martins, 7.3% on Bioavailability_Ma). Beyond scale and accuracy, the supporting passages carry nuance tabular databases discard -- e.g., oral bioavailability may depend on fed vs. fasted state. Together, the corpus, retrieval, and agent establish a foundation for AI-driven therapeutic design. Code and datasets: https://github.com/starling-labs/starling.
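
The abstract does not say how sparse and dense results are combined. As one common option, and purely as an assumption here, the sketch below merges two ranked lists with reciprocal rank fusion. The passage IDs and the constant `k = 60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of IDs: each list contributes 1 / (k + rank)
    per item, and items are sorted by their summed score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break ties by ID so the output order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a sparse (keyword) retriever and a
# dense (embedding) retriever, already restricted by an entity filter.
sparse_hits = ["p1", "p2", "p3"]
dense_hits = ["p3", "p1", "p4"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
# fused == ['p1', 'p3', 'p2', 'p4']
```

Rank-based fusion avoids having to calibrate keyword scores against embedding similarities, which live on different scales.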

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Starling scales entity tagging and multi-agent extraction over 22.5M PubMed papers to produce large, nuanced datasets, but its accuracy claims rest on frontier-model rejection rates that lack independent validation.

The main point is that this paper builds a full pipeline to turn the PubMed corpus into structured records at a scale that manual curation cannot match. They tag 4.5 billion entities across 22.5 million papers using nine ontologies, add hybrid retrieval, and then run Starling, a multi-agent system that takes a plain-language task and outputs filtered, schema-driven extractions with supporting passages. Across six properties they release roughly 6.3 million records, some of which are the largest public sets available for those tasks, and they ship the code and data on GitHub. That engineering package is the real deliverable here and it is useful for anyone who needs context-rich biomedical data rather than just tabulated numbers.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that PubMed can be autonomously processed via an LLM-based entity-tagging pipeline over 22.5M papers and a multi-agent system called Starling to produce ~6.3M structured records across six biomedical tasks (e.g., blood-brain barrier permeability, oral bioavailability, gene-disease associations). These datasets are asserted to be larger, more nuanced (via supporting passages), and more accurate than existing manually curated databases, with evidence consisting of frontier-model rejection rates of 0.6-7.7% versus 7.3-16.5% on counterparts such as BBB_Martins and Bioavailability_Ma. The work also contributes hybrid retrieval over a 4.5B-entity tagged corpus and releases code and datasets.

Significance. If the accuracy and nuance claims hold, the approach could enable scalable, cost-effective alternatives to manual curation, preserving experimental context that tabular databases often discard and supporting AI-driven therapeutic design. The public release of code and datasets on GitHub is a clear strength that facilitates reproducibility and community extension of the extraction pipeline.

major comments (2)

[Abstract] Abstract: The headline accuracy claim (rejection rates 0.6-7.7% for Starling extractions versus 7.3-16.5% on curated databases) is load-bearing for the central thesis that the new datasets are 'more accurate.' This comparison uses frontier-model rejection as a proxy for both the new records and the error rates on existing databases, yet the manuscript provides no description of independent human expert adjudication, inter-annotator agreement, or cross-validation against gold-standard sources for the ~6.3M records. Without such grounding, the proxy risks circularity and bias with respect to extraction style and domain correctness.
[Starling system] Starling system description: The multi-agent workflow for designing retrieval filters, inducing schemas, and emitting nuance-rich fields is presented at a high level. Specifics on how precision/recall targets are operationalized, how edge cases in the 4.5B-entity tagging across nine ontologies are handled, and the exact prompting or agent coordination mechanisms are not detailed enough to allow independent reproduction or assessment of whether the low rejection rates reflect genuine correctness or model self-consistency.
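
The inter-annotator agreement requested in the first major comment is often reported as Cohen's kappa. The sketch below computes it for a model judge against a human reviewer on the same records. The verdict lists are invented, and returning 1.0 when both raters use a single identical label is a convention chosen for this sketch.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Agreement expected if each rater labeled at random with
    # their own marginal frequencies.
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented verdicts on four records.
judge_verdicts = ["accept", "accept", "reject", "reject"]
human_verdicts = ["accept", "accept", "reject", "accept"]

kappa = cohens_kappa(judge_verdicts, human_verdicts)  # 0.5
```

Raw agreement here is 75%, but half of that is expected by chance, which is why kappa is the more conservative figure to report.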

minor comments (2)

[Abstract] The abstract states that 'several' datasets are the largest public ones for their property but does not identify which tasks achieve this or provide direct size comparisons to prior work; adding a table with record counts versus existing resources would improve clarity.
[Methods] Notation for the hybrid sparse-dense retrieval and the nine ontologies is introduced without an explicit list or reference; including these details in the methods section would aid readers unfamiliar with the specific biomedical resources.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

Referee: [Abstract] Abstract: The headline accuracy claim (rejection rates 0.6-7.7% for Starling extractions versus 7.3-16.5% on curated databases) is load-bearing for the central thesis that the new datasets are 'more accurate.' This comparison uses frontier-model rejection as a proxy for both the new records and the error rates on existing databases, yet the manuscript provides no description of independent human expert adjudication, inter-annotator agreement, or cross-validation against gold-standard sources for the ~6.3M records. Without such grounding, the proxy risks circularity and bias with respect to extraction style and domain correctness.

Authors: We agree that the accuracy comparison relies on a model-based proxy and that independent human validation would be ideal. However, the proxy is applied consistently: the frontier model evaluates whether each record (our extraction or a curated database entry) is supported by its associated source text or passage. This uniform application reduces bias from differing extraction styles. Circularity is avoided because the judge model is not involved in the original extraction process for our records. We acknowledge the limitation and will revise the manuscript to explicitly describe the proxy methodology, its assumptions, and limitations, including a discussion of why full human adjudication at this scale is not practical.
revision: partial
Referee: [Starling system] Starling system description: The multi-agent workflow for designing retrieval filters, inducing schemas, and emitting nuance-rich fields is presented at a high level. Specifics on how precision/recall targets are operationalized, how edge cases in the 4.5B-entity tagging across nine ontologies are handled, and the exact prompting or agent coordination mechanisms are not detailed enough to allow independent reproduction or assessment of whether the low rejection rates reflect genuine correctness or model self-consistency.

Authors: We appreciate this feedback on the level of detail provided for the Starling system. To improve reproducibility, we will expand the relevant section in the revised manuscript to include more specifics on operationalizing precision and recall targets, strategies for handling edge cases in the large-scale entity tagging, and the prompting templates and coordination protocols among the agents.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's derivation consists of an LLM entity-tagging pipeline over a PubMed corpus, hybrid retrieval, and the Starling multi-agent system that induces schemas and emits structured records. Accuracy claims rest on direct comparisons of frontier-model rejection rates (0.6-7.7%) against error rates the authors measure on external, published curated databases such as BBB_Martins and Bioavailability_Ma. These benchmarks are outside the paper's own fitted values or self-citations, and no load-bearing step reduces by construction to a self-definition, a renamed fit, or an imported uniqueness theorem from the authors' prior work. The methodology is self-contained against external references.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the untested premise that frontier-model rejection rates are a faithful accuracy proxy and on the assumption that the LLM tagging pipeline generalizes across all biomedical sub-domains without systematic bias.

axioms (2)

domain assumption: Frontier-model rejection of extractions is a reliable and unbiased measure of true extraction error.
Used to claim superiority over curated databases (abstract).
domain assumption: LLM entity tagging grounded in nine ontologies produces sufficiently accurate labels to support downstream structured extraction at 22.5M-paper scale.
Foundational step of the pipeline (abstract).

invented entities (1)

Starling multi-agent system (no independent evidence)
purpose: Autonomously designs retrieval filters, induces extraction schemas, and emits nuanced structured records with supporting passages.
New system introduced to perform the end-to-end extraction task.

pith-pipeline@v0.9.0 · 5944 in / 1630 out tokens · 47624 ms · 2026-05-20T22:24:01.624563+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "Starling, a multi-agent deep research system that, given only a natural-language task description, designs precision- and recall-targeted retrieval filters, induces an extraction schema, and emits structured records with nuance-rich fields and supporting passages."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · relation: unclear
Paper passage: "Frontier-model rejection of our kept extractions is 0.6–7.7% across tasks, surprisingly far below the error rates we measure on the widely used, manually curated counterparts"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 7 internal anchors

[1]

Arora, Yu Bai, Bowen Baker, Hai-Biao Bao, Boaz Barak, Ally Bennett, Tyler Bertao, N

OpenAI Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Hai-Biao Bao, Boaz Barak, Ally Bennett, Tyler Bertao, N. Archer Brett, Eugene Brevdo, Greg Brockman, Sébastien Bubeck, Cheng Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, Marat Dukhan, C. Dvorak, K Fives, Vlad Fom...

work page 2025
[2]

Large language models are few-shot clinical information extractors

Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, and David Sontag. Large language models are few-shot clinical information extractors. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors,Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1998–2022, Abu Dhabi, United Arab Emirates, December

work page 2022
[3]

doi: 10.18653/v1/2022.emnlp-main.130

Association for Computational Linguistics. doi: 10.18653/v1/2022.emnlp-main.130. URLhttps://aclanthology.org/2022.emnlp-main.130/

work page doi:10.18653/v1/2022.emnlp-main.130 2022
[4]

Qianxiang Ai, Fanwang Meng, Jiale Shi, Brenden Pelkie, and Connor W. Coley. Extracting struc- tured data from organic synthesis procedures using a fine-tuned large language model.Digital Discovery, 3(9):1822–1831, September 2024. ISSN 2635-098X. doi: 10.1039/D4DD00091A. URLhttps://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00091a

work page doi:10.1039/d4dd00091a 2024
[5]

Wu, Winona C

Rolf Apweiler, Amos Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria J. Martin, Darren A. Natale, Claire O’Donovan, Nicole Redaschi, and Lai-Su L. Yeh. UniProt: the Universal Protein knowledgebase.Nucleic Acids Research, 32(Database issue):D115–D119, Jan...

work page doi:10.1093/nar/gkh131 2004
[6]

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. Michael Cherry, Allan P. Davis, Kara Dolinski, Selina S. Dwight, Janan T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna Lewis, John C. Matese, 10 Joel E. Richardson, Martin Ringwald, Gerald M. Rubin, and Gavin Sherlock. Gene O...

work page doi:10.1038/75556 2000
[7]

Parit Bansal, Anne Morgat, Kristian B. Axelsen, Venkatesh Muthukrishnan, Elisabeth Coud- ert, Lucila Aimo, Nevila Hyka-Nouspikel, Elisabeth Gasteiger, Arnaud Kerhornou, Teresa Batista Neto, Monica Pozzato, Marie-Claude Blatter, Alex Ignatchenko, Nicole Redaschi, and Alan Bridge. Rhea, the reaction knowledgebase in 2022.Nucleic Acids Research, 50(D1):D693–...

work page doi:10.1093/nar/gkab1016 2022
[8]

An integrated chemical environment to support 21st-century toxicology.Environmental Health Perspectives, 125(5):054501, 2017

SM Bell, J Phillips, A Sedykh, A Tandon, C Sprankle, SQ Morefield, A Shapiro, D Allen, R Shah, EA Maull, WM Casey, and NC Kleinstreuer. An integrated chemical environment to support 21st-century toxicology.Environmental Health Perspectives, 125(5):054501, 2017. doi: 10.1289/EHP1759. URLhttps://doi.org/10.1289/EHP1759

work page doi:10.1289/ehp1759 2017
[9]

Binder, Sune Pletscher-Frankild, Kalliopi Tsafou, Christian Stolte, Sean I

Janos X. Binder, Sune Pletscher-Frankild, Kalliopi Tsafou, Christian Stolte, Sean I. O’Donoghue, Reinhard Schneider, and Lars Juhl Jensen. Compartments: unification and visualization of protein subcellular localization evidence.Database, 2014:bau012, 2014. doi: 10.1093/database/bau012. URLhttps://doi.org/10.1093/database/bau012

work page doi:10.1093/database/bau012 2014
[10]

Bodenreider

Olivier Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical terminology.Nucleic Acids Research, 32(Database issue):D267–D270, January 2004. ISSN 0305-1048. doi: 10.1093/nar/gkh061. URL https://pmc.ncbi.nlm.nih.gov/articles/ PMC308795/

work page doi:10.1093/nar/gkh061 2004
[11]

Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes

Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, December 2023. ISSN 1476-

work page 2023
[12]

Autonomous chemical research with large language models

doi: 10.1038/s41586-023-06792-0. URL https://www.nature.com/articles/ s41586-023-06792-0

work page doi:10.1038/s41586-023-06792-0
[13]

Augmenting large language models with chemistry tools

Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. Augmenting large-language models with chemistry tools.Nature Ma- chine Intelligence, 6(5):525–535, May 2024. doi: 10.1038/s42256-024-00832-8. URL https://www.nature.com/articles/s42256-024-00832-8

work page doi:10.1038/s42256-024-00832-8 2024
[14]

Brown, Vichet Hem, Kenneth S

Garth R. Brown, Vichet Hem, Kenneth S. Katz, Michael Ovetsky, Craig Wallin, Olga Ermolaeva, Igor Tolstoy, Tatiana Tatusova, Kim D. Pruitt, Donna R. Maglott, and Terence D. Murphy. Gene: a gene-centered information resource at NCBI.Nucleic Acids Research, 43(Database issue):D36–D42, January 2015. ISSN 0305-1048. doi: 10.1093/nar/gku1055. URL https: //pmc.n...

work page doi:10.1093/nar/gku1055 2015
[15]

Rosen, Ger- brand Ceder, Kristin A

John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S. Rosen, Ger- brand Ceder, Kristin A. Persson, and Anubhav Jain. Structured information extraction from scientific text with large language models.Nature Communications, 15(1):1418, February

work page
[16]

doi: 10.1038/s41467-024-45563-x

ISSN 2041-1723. doi: 10.1038/s41467-024-45563-x. URL https://www.nature. com/articles/s41467-024-45563-x

work page doi:10.1038/s41467-024-45563-x 2041
[17]

Wiegers, Robin J

Allan Peter Davis, Thomas C. Wiegers, Robin J. Johnson, Daniela Sciaky, Jolene Wiegers, and Carolyn J. Mattingly. Comparative toxicogenomics database (CTD): update 2023.Nucleic Acids Research, 51(D1):D1257–D1262, 2023. doi: 10.1093/nar/gkac833. URL https://doi. org/10.1093/nar/gkac833

work page doi:10.1093/nar/gkac833 2023
[18]

ChEBI: a database and ontology for chemical entities of biological interest.Nucleic Acids Research, 36 (Database issue):D344–D350, January 2008

Kirill Degtyarenko, Paula de Matos, Marcus Ennis, Janna Hastings, Martin Zbinden, Alan McNaught, Rafael Alcántara, Michael Darsow, Mickaël Guedj, and Michael Ashburner. ChEBI: a database and ontology for chemical entities of biological interest.Nucleic Acids Research, 36 (Database issue):D344–D350, January 2008. ISSN 0305-1048. doi: 10.1093/nar/gkm791. UR...

work page doi:10.1093/nar/gkm791 2008
[19]

ADME evaluation in drug discovery

Tingjun Hou, Junmei Wang, Wei Zhang, and Xiaojie Xu. ADME evaluation in drug discovery

work page
[20]

doi: 10.1021/ci6003515

Can oral bioavailability in humans be effectively predicted by simple molecular property- based rules?Journal of Chemical Information and Modeling, 47(2):460–463, 2007. doi: 10.1021/ci6003515. 11

work page doi:10.1021/ci6003515 2007
[21]

Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik

Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Con- nor W. Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. Artificial intelligence foun- dation for therapeutic science.Nature Chemical Biology, 18:1033–1036, 2022. doi: 10.1038/s41589-022-01131-2. URLhttps://doi.org/10.1038/s41589-022-01131-2

work page doi:10.1038/s41589-022-01131-2 2022
[22]

Carter, Xin Zhou, Matthew Wheeler, Jonathan A

Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Junze Zhang, Yin Di, et al. Biomni: A general-purpose biomedical ai agent.bioRxiv preprint, 2025. doi: 10.1101/2025.05.30.656746

work page doi:10.1101/2025.05.30.656746 2025
[23]

Huntley, Tony Sawford, Prudence Mutowo-Meullenet, Aleksandra Shypitsyna, Carlos Bonilla, Maria J

Rachael P. Huntley, Tony Sawford, Prudence Mutowo-Meullenet, Aleksandra Shypitsyna, Carlos Bonilla, Maria J. Martin, and Claire O’Donovan. The GOA database: Gene Ontology annotation updates for 2015.Nucleic Acids Research, 43(Database issue):D1057–D1063, 2015. doi: 10.1093/nar/gku1113

work page doi:10.1093/nar/gku1113 2015
[24]

Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Benjamin Moody, Brian Gow, Li wei H. Lehman, Leo Anthony Celi, and Roger G. Mark. Mimic-iv, a freely accessible electronic health record dataset.Scientific Data, 10, 2023. URLhttps://doi.org/10.1038/s41597-022-01899-x

work page doi:10.1038/s41597-022-01899-x 2023
[25]

John M. Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ron- neberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Andy Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, ...

work page 2021
[26]

Journal of the American Chemical Society , author =

Steven M. Kearnes, Michael R. Maser, Michael Wleklinski, Anton Kast, Abigail G. Doyle, Spencer D. Dreher, Joel M. Hawkins, Klavs F. Jensen, and Connor W. Coley. The Open Reaction Database.Journal of the American Chemical Society, 143(45):18820–18826, November 2021. ISSN 0002-7863. doi: 10.1021/jacs.1c09820. URL https://doi.org/10.1021/jacs. 1c09820

work page doi:10.1021/jacs.1c09820 2021
[27]

Kim, Alexander Sedykh, Suman K

Marlene T. Kim, Alexander Sedykh, Suman K. Chakravarti, Roustem D. Saiakhov, and Hao Zhu. Critical evaluation of human oral bioavailability for pharmaceutical drugs by using various cheminformatics approaches, 2014

work page 2014
[28]

Thiessen, Evan E

Sunghwan Kim, Paul A. Thiessen, Evan E. Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A. Shoemaker, Jiyao Wang, Bo Yu, Jian Zhang, and Stephen H. Bryant. PubChem Substance and Compound databases.Nucleic Acids Research, 44 (D1):D1202–1213, January 2016. ISSN 1362-4962. doi: 10.1093/nar/gkv951

work page doi:10.1093/nar/gkv951 2016
[29]

$\texttt{MiniMol}$: A Parameter- Efficient Foundation Model for Molecular Learning, April 2024

Kerstin Kläser, Bła˙zej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, and Andrew Fitzgibbon. $\texttt{MiniMol}$: A Parameter- Efficient Foundation Model for Molecular Learning, April 2024. URL http://arxiv.org/ abs/2404.14986. arXiv:2404.14986 [cs]

work page arXiv 2024
[30]

DrugBank 6.0: the DrugBank Knowledgebase for 2024.Nucleic Acids Research, 52(D1):D1265–D1275, January 2024

Craig Knox, Mike Wilson, Christen M Klinger, Mark Franklin, Eponine Oler, Alex Wilson, Allison Pon, Jordan Cox, Na Eun (Lucy) Chin, Seth A Strawbridge, Marysol Garcia-Patino, Ray Kruger, Aadhavya Sivakumaran, Selena Sanford, Rahil Doshi, Nitya Khetarpal, Omolola Fatokun, Daphnee Doucet, Ashley Zubkowski, Dorsa Yahya Rayat, Hayley Jackson, Karxena Harford,...

work page doi:10.1093/nar/gkad976 2024
[31]

Landrum, Jennifer M

Melissa J. Landrum, Jennifer M. Lee, Mark Benson, Garth R. Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Karen Karapetyan, 12 Kenneth Katz, Chunlei Liu, Zenith Maddipatla, Adriana Malheiro, Kurt McDaniel, Michael Ovetsky, George Riley, George Zhou, J. Bradley Holmes, Brandi L. Kattman, and Donna R. Maglo...

work page doi:10.1093/nar/gkx1153 2018
[32]

ReactionSeek: LLM-powered literature data mining and knowledge discovery in organic synthesis.Nature Communications, 17(1):3356,

Jiawei Li, Minzhou Li, Qi Yang, and Sanzhong Luo. ReactionSeek: LLM-powered literature data mining and knowledge discovery in organic synthesis.Nature Communications, 17(1):3356,

work page
[33]

URL https://www.nature.com/articles/ s41467-026-70180-1

doi: 10.1038/s41467-026-70180-1. URL https://www.nature.com/articles/ s41467-026-70180-1

work page doi:10.1038/s41467-026-70180-1
[34]

InICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5

Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, and Colin Zhang. dots.ocr: Multilingual document layout parsing in a single vision-language model, 2025. URL https://arxiv.org/ abs/2512.02498

work page arXiv 2025
[35]

reaction-ship

Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, and Tat-Seng Chua. ReactXT: Understanding molecular “reaction-ship” via reaction- contextualized molecule-text pretraining. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Findings of the Association for Computational Linguistics: ACL 2024, pages 5353–5377, B...

work page 2024
[36]

Lowe, Peter T

Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, and Robert C. Glen. Chemical Name to Structure: OPSIN, an Open Source Solution.Journal of Chemical Information and Modeling, 51(3):739–753, March 2011. ISSN 1549-9596. doi: 10.1021/ci100384d. URL https: //doi.org/10.1021/ci100384d

work page doi:10.1021/ci100384d 2011
[37]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, September 2024. URL http://arxiv.org/abs/2408.06292. arXiv:2408.06292 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Toxacol: an endpoint-aware and task-focused compound representation learning paradigm for acute toxicity assessment.Nature Communications, 16:5992, 2025

Jiang Lu, Lianlian Wu, Ruijiang Li, Mengxuan Wan, Jun Yang, Peng Zan, Hui Bai, Song He, and Xiaochen Bo. Toxacol: an endpoint-aware and task-focused compound representation learning paradigm for acute toxicity assessment.Nature Communications, 16:5992, 2025. doi: 10.1038/s41467-025-60989-7. URLhttps://doi.org/10.1038/s41467-025-60989-7

[39]

Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G. Rodriques, and Andrew D. White. PaperQA: Retrieval-Augmented Generative Agent for Scientific Research, December 2023. URL http://arxiv.org/abs/2312.07559. arXiv:2312.07559 [cs]

[40]

Chang-Ying Ma, Sheng-Yong Yang, Hui Zhang, Mingli Xiang, Qi Huang, and Yuquan Wei. Prediction models of human plasma protein binding rate and oral bioavailability derived by using ga–cg–svm method. Journal of Pharmaceutical and Biomedical Analysis, 47(4–5):677–682. doi: 10.1016/j.jpba.2008.03.023. URL https://doi.org/10.1016/j.jpba.2008.03.023

[42]

Iyad Majid, Vaibhav Mishra, Rohith Ravindranath, and Sophia Y. Wang. Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes. AMIA Annual Symposium Proceedings, 2024:778–787, May 2025. ISSN 1942-597X. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC12099357/

[43]

Kamel Mansouri, Agnes L. Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E. H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce V. Bastos, Stephen Boyd, J. B. Brown, Stephen J. Capuzzi, Yarosl...

[44]

Antonio Rueda Martin, Eleanor Williams, Rebecca E. Foulger, Sarah Leigh, Louise C. Daugherty, Olivia Niblock, Ivone U. S. Leong, Katherine R. Smith, Oleg Gerasimenko, Eik Haraldsdottir, Ellen Thomas, Richard H. Scott, Emma Baple, Arianna Tucci, Helen Brittain, Anna de Burca, Kristina Ibañez, Dalia Kasperaviciute, Damian Smedley, Mark Caulfield, August...

[45]

Ines Filipa Martins, Ana L. Teixeira, Luis Pinheiro, and Andre O. Falcao. A bayesian approach to in silico blood-brain barrier penetration modeling. Journal of Chemical Information and Modeling, 52(6):1686–1697, 2012. doi: 10.1021/ci300124c. URL https://doi.org/10.1021/ci300124c

[46]

Fanwang Meng, Yang Xi, Jinfeng Huang, and Paul W. Ayers. A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors. Scientific Data, 8(1):289, October 2021. ISSN 2052-4463. doi: 10.1038/s41597-021-01069-5. URL https://www.nature.com/articles/s41597-021-01069-5

[47]

Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Dániel L. Barabási, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha S. Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk Caldas Ramos, Albert Bou, Kaleigh F. Roberts, Sladja...

[48]

Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, and Andrew D. White. Aviary: training language agents on challenging scientific tasks, December 2024. URL http://arxiv.org/abs/2412.21154. arXiv:2412.21154 [cs]

[49]

National Center for Biotechnology Information. Generif: Gene reference into function. https://www.ncbi.nlm.nih.gov/gene/about-generif

[50]

Mark Neumann, Daniel King, Iz Beltagy, and Waleed Ammar. ScispaCy: Fast and robust models for biomedical natural language processing. In Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, and Junichi Tsujii, editors, Proceedings of the 18th BioNLP Workshop and Shared Task, pages 319–327, Florence, Italy, August 2019. Association for Computationa...

[51]

Zhangming Niu, Xianglu Xiao, Wenfan Wu, Qiwei Cai, Yinghui Jiang, Wangzhen Jin, Minhao Wang, Guojian Yang, Lingkang Kong, Xurui Jin, Guang Yang, and Hongming Chen. Pharmabench: Enhancing admet benchmarks with large language models. Scientific Data, 11(1):985, 2024. doi: 10.1038/s41597-024-03793-0. URL https://doi.org/10.1038/s41597-024-03793-0

[52]

Motasem S Obeidat, Md Sultan Al Nahian, and Ramakanth Kavuluru. Do LLMs Surpass Encoders for Biomedical NER? Proceedings. IEEE International Conference on Healthcare Informatics, 2025:352–358, June 2025. ISSN 2575-2626. doi: 10.1109/ICHI64645.2025.00048. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC12335919/

[53]

Guillaume Ollitrault, Marco Marzo, Alessandra Roncaglioni, Emilio Benfenati, Olivier Taboureau, and Enrico Mombelli. Qsar models for predicting oral bioavailability and volume of distribution and their application in mapping the tk space of endocrine disruptors. Journal of Xenobiotics, 15(5):166, 2025. doi: 10.3390/jox15050166. URL https://doi.org/10....

[54]

Lukas Minus Orre, Mattias Vesterlund, Yanbo Pan, Taner Arslan, Yafeng Zhu, Alejandro Fernandez Woodbridge, Oliver Frings, Erik Fredlund, and Janne Lehtiö. SubCellBarCode: Proteome-wide mapping of protein localization and relocalization. Molecular Cell, 73(1):166–182.e7, 2019. doi: 10.1016/j.molcel.2018.11.035

[55]

Janet Piñero, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I. Furlong. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 48(D1):D845–D855, 2020. doi: 10.1093/nar/gkz1021

[56]

Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X. Binder, and Lars Juhl Jensen. Diseases: Text mining and data integration of disease–gene associations. Methods, 74:83–89. doi: 10.1016/j.ymeth.2014.11.020. URL https://doi.org/10.1016/j.ymeth.2014.11.020

[58]

Victor Sabanza Gil, Andres M. Bran, Malte Franke, Rémi Schlama, Jeremy S. Luterbacher, and Philippe Schwaller. Holistic chemical evaluation reveals pitfalls in reaction prediction models. In NeurIPS 2023 AI for Science Workshop, 2023. doi: 10.48550/arXiv.2312.09004. URL https://arxiv.org/abs/2312.09004

[59]

Conrad L Schoch, Stacy Ciufo, Mikhail Domrachev, Carol L Hotton, Sivakumar Kannan, Rogneda Khovanskaya, Detlef Leipe, Richard Mcveigh, Kathleen O’Neill, Barbara Robbertse, Shobha Sharma, Vladimir Soussov, John P Sullivan, Lu Sun, Seán Turner, and Ilene Karsch-Mizrachi. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database: The J...

doi: 10.1093/database/baaa062. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC7408187/

[61]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, February 2024. URL https://arxiv.org/abs/2402.03300v3

[62]

Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, A. J. Ostrow, Akhila Ananthram, Akshay Nathan, Alan Luo, Alec Helyar, Aleksander Madry, Aleksandr Efremov, Aleksandra Spyra, Alex Baker-Whitcomb, Alex Beutel, Alex Karpenko, Alex Makelov, Alex Neitz, Alex Wei, Alexandra Barr, Alexandre Kirchmeyer,...

[63]

Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, and Andrew D. White. Language agents achieve superhuman synthesis of scientific knowledge, September 2024. URL http://arxiv.org/abs/2409.13740. arXiv:2409.13740 [cs]

[64]

Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, and Jaewoo Kang. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics, 38(20):4837–4839, October 2022. ISSN 1367-4811. doi: 10.1093/bioinformatics/btac598. URL https://doi.org/10.1093/bioinformatics/btac598

[65]

Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, Zile Qiao, Chenxi Wang, Donglei Yu, Ga...

[66]

Peter J. Thul, Lovisa Åkesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Björk, Lisa M. Breckels, Anna Bäckström, Frida Danielsson, Linn Fagerberg, Jenny Fall, Laurent Gatto, Christian Gnann, Sophia Hober, Martin Hjelmare, Fredric Johansson, Sunjae Lee, Cecilia Lindskog, Jan Mulder, Claire M. Mulv...

[67]

Manthena V. S. Varma, R. Scott Obach, Charles Rotter, Howard R. Miller, George Chang, Stefanus J. Steyn, Ayman El-Kattan, and Matthew D. Troutman. Physicochemical space for optimum oral bioavailability: Contribution of human intestinal absorption and first-pass elimination. Journal of Medicinal Chemistry, 53(3):1098–1108, 2010. doi: 10.1021/jm901371v

[68]

Jonathan T. Wall, Risa R. Sayre, Doris Smith, Samuel Winter, Maxwell Groover, Jasmine Hope, Adriana Webb, Katie Paul Friedman, Madison Feshuk, Antony J. Williams, Charles Lowe, Nisha S. Sipes, Jason Lambert, Jennifer H. Olker, Russell S. Thomas, Colleen Elonen, Richard S. Judson, and Chelsea A. Weitekamp. Development of the toxicity values database, toxva...

[69]

Fangping Wan, Felix Wong, James J. Collins, and César de la Fuente-Nunez. Machine learning for antimicrobial peptide identification and design. Nature Reviews Bioengineering, 2:392–407, 2024. URL https://doi.org/10.1038/s44222-024-00152-x

[70]

Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, and Zhiyong Lu. Pubtator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Research, 52(W1):W540–W546. doi: 10.1093/nar/gkae235. URL https://doi.org/10.1093/nar/gkae235

[72]

Min Wei, Xudong Zhang, Xiaolin Pan, Bo Wang, Changge Ji, Yifei Qi, and John Z. H. Zhang. Hobpre: accurate prediction of human oral bioavailability for small molecules. Journal of Cheminformatics, 14(1), 2022. doi: 10.1186/s13321-021-00580-6. URL https://doi.org/10.1186/s13321-021-00580-6

[73]

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search, April 2025. URL http://arxiv.org/abs/2504.08066. arXiv:2504.08066 [cs]

[74]

Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J. Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F. Mosquera, Maria Paula Magarinos, Nicolas Bosc, Ricardo Arcila, Tevfik Kizilören, Anna Gaulton, A. Patrícia Bento, Melissa F. Adasme, Peter Monecke, Gregory A. Landrum, and Andrew R. Leach. The ChEMBL ...

[75]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models, June 2025. URL http://arxiv.org/abs/2506.05176. arXiv:2506.05176 [cs]

[76]

Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, and Hoifung Poon. UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. International Conference on Learning Representations, 2024:12276–12294, May 2024. URL https://proceedings.iclr.cc/paper_files/paper/2024/hash/34678d08b36076de986df95c5bbba92f-Abstract-Conference.html

[77]

Hao Zhu, Todd M. Martin, Lin Ye, Alexander Sedykh, Douglas M. Young, and Alexander Tropsha. Quantitative structure–activity relationship modeling of rat acute toxicity by oral exposure. Chemical Research in Toxicology, 22(12):1913–1921, 2009. doi: 10.1021/tx900189p. URL https://doi.org/10.1021/tx900189p

A Release Plan, Limitations, and Broader Impact

A cor...


BioMNI-Phylo: A sandboxed Python coding agent with iterative cell execution, access to tools such as NCBI E-utilities, ChEMBL, PubChem APIs, and NLP libraries (scispaCy). Produces execution trace notebooks alongside output CSVs


Claude Research Mode (Opus 4.6 Extended): A web-based deep research agent on claude.ai that produces comprehensive narrative reports. As it cannot produce data files, we pair it with a continuation agent (Claude Code) that executes the recommended pipelines to produce CSVs. This constitutes a two-stage pipeline


GPT-5.4 Deep Research: A web-based deep research agent on chatgpt.com that produces comprehensive narrative reports. Like Claude Research Mode, it produces no data files; a continuation agent executes the designed pipelines. This is the only configuration that attempted genuine PubMed text mining (E-utilities + PubTator + regex extraction) for 4 of 5 tasks



URL https://www.nature.com/articles/ s41467-026-70180-1

doi: 10.1038/s41467-026-70180-1. URL https://www.nature.com/articles/ s41467-026-70180-1

work page doi:10.1038/s41467-026-70180-1

[34] [34]

InICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5

Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, and Colin Zhang. dots.ocr: Multilingual document layout parsing in a single vision-language model, 2025. URL https://arxiv.org/ abs/2512.02498

work page arXiv 2025

[35] [35]

reaction-ship

Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, and Tat-Seng Chua. ReactXT: Understanding molecular “reaction-ship” via reaction- contextualized molecule-text pretraining. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Findings of the Association for Computational Linguistics: ACL 2024, pages 5353–5377, B...

work page 2024

[36] [36]

Lowe, Peter T

Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, and Robert C. Glen. Chemical Name to Structure: OPSIN, an Open Source Solution.Journal of Chemical Information and Modeling, 51(3):739–753, March 2011. ISSN 1549-9596. doi: 10.1021/ci100384d. URL https: //doi.org/10.1021/ci100384d

work page doi:10.1021/ci100384d 2011

[37] [37]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, September 2024. URL http://arxiv.org/abs/2408.06292. arXiv:2408.06292 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024

[38] [38]

Toxacol: an endpoint-aware and task-focused compound representation learning paradigm for acute toxicity assessment.Nature Communications, 16:5992, 2025

Jiang Lu, Lianlian Wu, Ruijiang Li, Mengxuan Wan, Jun Yang, Peng Zan, Hui Bai, Song He, and Xiaochen Bo. Toxacol: an endpoint-aware and task-focused compound representation learning paradigm for acute toxicity assessment.Nature Communications, 16:5992, 2025. doi: 10.1038/s41467-025-60989-7. URLhttps://doi.org/10.1038/s41467-025-60989-7

work page doi:10.1038/s41467-025-60989-7 2025

[39] [39]

Paperqa: Retrieval-augmented generative agent for scientific research,

Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G. Rodriques, and Andrew D. White. PaperQA: Retrieval-Augmented Generative Agent for Scientific Research, December 2023. URLhttp://arxiv.org/abs/2312.07559. arXiv:2312.07559 [cs]

work page arXiv 2023

[40] [40]

Chang-Ying Ma, Sheng-Yong Yang, Hui Zhang, Mingli Xiang, Qi Huang, and Yuquan Wei. Prediction models of human plasma protein binding rate and oral bioavailability derived by using ga–cg–svm method.Journal of Pharmaceutical and Biomedical Analysis, 47(4–5):677–682,

work page

[41] [41]

URL https://doi.org/10.1016/j.jpba.2008

doi: 10.1016/j.jpba.2008.03.023. URL https://doi.org/10.1016/j.jpba.2008. 03.023

work page doi:10.1016/j.jpba.2008.03.023 2008

[42] Iyad Majid, Vaibhav Mishra, Rohith Ravindranath, and Sophia Y. Wang. Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes. AMIA Annual Symposium Proceedings, 2024:778–787, May 2025. ISSN 1942-597X. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC12099357/

[43] Kamel Mansouri, Agnes L. Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E. H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce V. Bastos, Stephen Boyd, J. B. Brown, Stephen J. Capuzzi, Yarosl... doi: 10.1289/ehp8495

[44] Antonio Rueda Martin, Eleanor Williams, Rebecca E. Foulger, Sarah Leigh, Louise C. Daugherty, Olivia Niblock, Ivone U. S. Leong, Katherine R. Smith, Oleg Gerasimenko, Eik Haraldsdottir, Ellen Thomas, Richard H. Scott, Emma Baple, Arianna Tucci, Helen Brittain, Anna de Burca, Kristina Ibañez, Dalia Kasperaviciute, Damian Smedley, Mark Caulfield, August... doi: 10.1038/s41588-019-0528-2

[45] Ines Filipa Martins, Ana L. Teixeira, Luis Pinheiro, and Andre O. Falcao. A bayesian approach to in silico blood-brain barrier penetration modeling. Journal of Chemical Information and Modeling, 52(6):1686–1697, 2012. doi: 10.1021/ci300124c. URL https://doi.org/10.1021/ci300124c

[46] Fanwang Meng, Yang Xi, Jinfeng Huang, and Paul W. Ayers. A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors. Scientific Data, 8(1):289, October 2021. ISSN 2052-4463. doi: 10.1038/s41597-021-01069-5. URL https://www.nature.com/articles/s41597-021-01069-5

[47] Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Dániel L. Barabási, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha S. Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk Caldas Ramos, Albert Bou, Kaleigh F. Roberts, Sladja... Kosmos: An AI Scientist for Autonomous Discovery.

[48] Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, and Andrew D. White. Aviary: training language agents on challenging scientific tasks, December 2024. URL http://arxiv.org/abs/2412.21154. arXiv:2412.21154 [cs]

[49] National Center for Biotechnology Information. Generif: Gene reference into function. https://www.ncbi.nlm.nih.gov/gene/about-generif

[50] Mark Neumann, Daniel King, Iz Beltagy, and Waleed Ammar. ScispaCy: Fast and robust models for biomedical natural language processing. In Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, and Junichi Tsujii, editors, Proceedings of the 18th BioNLP Workshop and Shared Task, pages 319–327, Florence, Italy, August 2019. Association for Computationa... doi: 10.18653/v1/w19-5034

[51] Zhangming Niu, Xianglu Xiao, Wenfan Wu, Qiwei Cai, Yinghui Jiang, Wangzhen Jin, Minhao Wang, Guojian Yang, Lingkang Kong, Xurui Jin, Guang Yang, and Hongming Chen. Pharmabench: Enhancing admet benchmarks with large language models. Scientific Data, 11(1):985, 2024. doi: 10.1038/s41597-024-03793-0. URL https://doi.org/10.1038/s41597-024-03793-0

[52] Motasem S Obeidat, Md Sultan Al Nahian, and Ramakanth Kavuluru. Do LLMs Surpass Encoders for Biomedical NER? Proceedings. IEEE International Conference on Healthcare Informatics, 2025:352–358, June 2025. ISSN 2575-2626. doi: 10.1109/ICHI64645.2025.00048. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC12335919/

[53] Guillaume Ollitrault, Marco Marzo, Alessandra Roncaglioni, Emilio Benfenati, Olivier Taboureau, and Enrico Mombelli. Qsar models for predicting oral bioavailability and volume of distribution and their application in mapping the tk space of endocrine disruptors. Journal of Xenobiotics, 15(5):166, 2025. doi: 10.3390/jox15050166. URL https://doi.org/10....

[54] Lukas Minus Orre, Mattias Vesterlund, Yanbo Pan, Taner Arslan, Yafeng Zhu, Alejandro Fernandez Woodbridge, Oliver Frings, Erik Fredlund, and Janne Lehtiö. SubCellBarCode: Proteome-wide mapping of protein localization and relocalization. Molecular Cell, 73(1):166–182.e7, 2019. doi: 10.1016/j.molcel.2018.11.035

[55] Janet Piñero, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I. Furlong. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 48(D1):D845–D855, 2020. doi: 10.1093/nar/gkz1021

[56] Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X. Binder, and Lars Juhl Jensen. Diseases: Text mining and data integration of disease–gene associations. Methods, 74:83–89. doi: 10.1016/j.ymeth.2014.11.020. URL https://doi.org/10.1016/j.ymeth.2014.11.020

[58] Victor Sabanza Gil, Andres M. Bran, Malte Franke, Rémi Schlama, Jeremy S. Luterbacher, and Philippe Schwaller. Holistic chemical evaluation reveals pitfalls in reaction prediction models. In NeurIPS 2023 AI for Science Workshop, 2023. doi: 10.48550/arXiv.2312.09004. URL https://arxiv.org/abs/2312.09004

[59] Conrad L Schoch, Stacy Ciufo, Mikhail Domrachev, Carol L Hotton, Sivakumar Kannan, Rogneda Khovanskaya, Detlef Leipe, Richard Mcveigh, Kathleen O’Neill, Barbara Robbertse, Shobha Sharma, Vladimir Soussov, John P Sullivan, Lu Sun, Seán Turner, and Ilene Karsch-Mizrachi. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database: The Journal of Biological Databases and Curation, 2020:baaa062, August 2020. doi: 10.1093/database/baaa062. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC7408187/

[61] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, February 2024. URL https://arxiv.org/abs/2402.03300v3

[62] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, A. J. Ostrow, Akhila Ananthram, Akshay Nathan, Alan Luo, Alec Helyar, Aleksander Madry, Aleksandr Efremov, Aleksandra Spyra, Alex Baker-Whitcomb, Alex Beutel, Alex Karpenko, Alex Makelov, Alex Neitz, Alex Wei, Alexandra Barr, Alexandre Kirchmeyer,...

[63] Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, and Andrew D. White. Language agents achieve superhuman synthesis of scientific knowledge, September 2024. URL http://arxiv.org/abs/2409.13740. arXiv:2409.13740 [cs]

[64] Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, and Jaewoo Kang. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics, 38(20):4837–4839, October 2022. ISSN 1367-4811. doi: 10.1093/bioinformatics/btac598. URL https://doi.org/10.1093/bioinformatics/btac598

[65] Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, Zile Qiao, Chenxi Wang, Donglei Yu, Ga... Tongyi DeepResearch Technical Report.

[66] Peter J. Thul, Lovisa Åkesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Björk, Lisa M. Breckels, Anna Bäckström, Frida Danielsson, Linn Fagerberg, Jenny Fall, Laurent Gatto, Christian Gnann, Sophia Hober, Martin Hjelmare, Fredric Johansson, Sunjae Lee, Cecilia Lindskog, Jan Mulder, Claire M. Mulv... doi: 10.1126/science.aal3321

[67] Manthena V. S. Varma, R. Scott Obach, Charles Rotter, Howard R. Miller, George Chang, Stefanus J. Steyn, Ayman El-Kattan, and Matthew D. Troutman. Physicochemical space for optimum oral bioavailability: Contribution of human intestinal absorption and first-pass elimination. Journal of Medicinal Chemistry, 53(3):1098–1108, 2010. doi: 10.1021/jm901371v

[68] Jonathan T. Wall, Risa R. Sayre, Doris Smith, Samuel Winter, Maxwell Groover, Jasmine Hope, Adriana Webb, Katie Paul Friedman, Madison Feshuk, Antony J. Williams, Charles Lowe, Nisha S. Sipes, Jason Lambert, Jennifer H. Olker, Russell S. Thomas, Colleen Elonen, Richard S. Judson, and Chelsea A. Weitekamp. Development of the toxicity values database, toxva... doi: 10.1016/j.comtox.2025.100365

[69] Fangping Wan, Felix Wong, James J. Collins, and César de la Fuente-Nunez. Machine learning for antimicrobial peptide identification and design. Nature Reviews Bioengineering, 2:392–407, 2024. URL https://doi.org/10.1038/s44222-024-00152-x

[70] Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, and Zhiyong Lu. Pubtator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Research, 52(W1):W540–W546. doi: 10.1093/nar/gkae235. URL https://doi.org/10.1093/nar/gkae235

[72] Min Wei, Xudong Zhang, Xiaolin Pan, Bo Wang, Changge Ji, Yifei Qi, and John Z. H. Zhang. Hobpre: accurate prediction of human oral bioavailability for small molecules. Journal of Cheminformatics, 14(1), 2022. doi: 10.1186/s13321-021-00580-6. URL https://doi.org/10.1186/s13321-021-00580-6

[73] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search, April 2025. URL http://arxiv.org/abs/2504.08066. arXiv:2504.08066 [cs]

[74] Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J. Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F. Mosquera, Maria Paula Magarinos, Nicolas Bosc, Ricardo Arcila, Tevfik Kizilören, Anna Gaulton, A. Patrícia Bento, Melissa F. Adasme, Peter Monecke, Gregory A. Landrum, and Andrew R. Leach. The ChEMBL ... doi: 10.1093/nar/gkad1004

[75] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models, June 2025. URL http://arxiv.org/abs/2506.05176. arXiv:2506.05176 [cs]

[76] Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, and Hoifung Poon. UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. International Conference on Learning Representations, 2024:12276–12294, May 2024. URL https://proceedings.iclr.cc/paper_files/paper/2024/hash/34678d08b36076de986df95c5bbba92f-Abstract-Conference.html

[77] Hao Zhu, Todd M. Martin, Lin Ye, Alexander Sedykh, Douglas M. Young, and Alexander Tropsha. Quantitative structure–activity relationship modeling of rat acute toxicity by oral exposure. Chemical Research in Toxicology, 22(12):1913–1921, 2009. doi: 10.1021/tx900189p. URL https://doi.org/10.1021/tx900189p.

A Release Plan, Limitations, and Broader Impact

A cor...

BioMNI-Phylo: A sandboxed Python coding agent with iterative cell execution and access to tools such as NCBI E-utilities, the ChEMBL and PubChem APIs, and NLP libraries (scispaCy). Produces execution trace notebooks alongside output CSVs.

Claude Research Mode (Opus 4.6 Extended): A web-based deep research agent on claude.ai that produces comprehensive narrative reports. As it cannot produce data files, we pair it with a continuation agent (Claude Code) that executes the recommended pipelines to produce CSVs. This constitutes a two-stage pipeline.

GPT-5.4 Deep Research: A web-based deep research agent on chatgpt.com that produces comprehensive narrative reports. Like Claude Research Mode, it produces no data files; a continuation agent executes the designed pipelines. This is the only configuration that attempted genuine PubMed text mining (E-utilities + PubTator + regex extraction) for 4 of 5 tasks.
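
The regex extraction stage mentioned for this configuration can be illustrated with a minimal sketch. The source does not give the patterns that were used, so everything below is an assumption for illustration: the `LD50_PATTERN` regex, the `LD50Mention` record, the `extract_ld50` helper, and the sample text are all hypothetical, and LD50 is picked only as a representative numeric endpoint. The sketch runs on a plain string and does not call E-utilities or PubTator.

```python
import re
from typing import List, NamedTuple


class LD50Mention(NamedTuple):
    """Hypothetical record for one regex hit (not a schema from the paper)."""
    value: float
    unit: str
    sentence: str


# Assumed pattern: "LD50", then up to 60 characters that contain no period or
# semicolon, then a number followed by a mg/kg or g/kg unit.
LD50_PATTERN = re.compile(
    r"LD\s*50\b[^.;]{0,60}?(\d+(?:\.\d+)?)\s*(mg/kg|g/kg)",
    flags=re.IGNORECASE,
)


def extract_ld50(text: str) -> List[LD50Mention]:
    """Return every LD50 value found in `text`, with its source sentence."""
    mentions = []
    # Naive sentence split: break after ., ! or ? when whitespace follows.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in LD50_PATTERN.finditer(sentence):
            mentions.append(
                LD50Mention(
                    value=float(match.group(1)),
                    unit=match.group(2).lower(),
                    sentence=sentence.strip(),
                )
            )
    return mentions


# Made-up sample text, not taken from any PubMed abstract.
sample = (
    "The oral LD50 in rats was 250 mg/kg. "
    "No mortality was observed at 50 mg/kg. "
    "In mice, the LD50 value was estimated at 1.2 g/kg."
)
mentions = extract_ld50(sample)
for m in mentions:
    print(m.value, m.unit)
```

A pattern of this kind only captures mentions that follow the expected surface form, and it does not parse experimental context such as species or route of administration into separate fields; the matched sentence is kept so that this context can be reviewed afterwards.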