SynthPert: Enhancing LLM Biological Reasoning via Synthetic Reasoning Traces for Cellular Perturbation Prediction

Aditya Misra; Cesar A. Prada-Medina; Josefa Lia Stoisser; Kaspar Märtens; Lawrence Phillips; Marc Boubnovski Martell; Rory Donovan-Maiye

arXiv: 2509.25346 · v2 · submitted 2025-09-29 · cs.AI · cs.LG · q-bio.CB · q-bio.GN

Pith reviewed 2026-05-18 12:23 UTC · model grok-4.3

classification cs.AI · cs.LG · q-bio.CB · q-bio.GN

keywords synthetic reasoning traces · LLM fine-tuning · cellular perturbation prediction · biological reasoning · knowledge distillation · PerturbQA benchmark · systems biology

The pith

Fine-tuning LLMs on synthetic reasoning traces from frontier models yields state-of-the-art results on cellular perturbation prediction and surpasses the source models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SynthPert, which applies supervised fine-tuning to synthetic reasoning traces generated by advanced models in order to strengthen LLM performance on predicting cellular responses to genetic perturbations. Evaluated on the PerturbQA benchmark, the resulting models reach state-of-the-art accuracy and exceed the performance of the frontier model that created the training traces. The work shows that these traces transfer useful biological knowledge even when partially inaccurate, support generalization to new cell types at 87 percent accuracy on unseen RPE1 cells, and produce strong gains from only 2 percent of quality-filtered data. Readers would care because the approach offers a data-efficient route to specialized biological reasoning that could aid therapeutic discovery and virtual cell modeling.

Core claim

By applying supervised fine-tuning to synthetic reasoning traces generated by frontier models, the SynthPert method enables LLMs to achieve superior performance on the task of predicting cellular responses to genetic perturbations, outperforming the frontier models themselves on the PerturbQA benchmark while requiring only a small portion of quality-filtered data.

What carries the argument

Supervised fine-tuning on synthetic reasoning traces that distill biological knowledge for perturbation prediction.

If this is right

Synthetic reasoning traces effectively distill biological knowledge even when partially inaccurate.
This approach enables cross-cell-type generalization with 87% accuracy on unseen RPE1 cells.
Performance gains persist despite using only 2% of quality-filtered training data.
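The page does not say how the quality filter behind the 2 percent figure works, so the following is only a minimal sketch of one plausible pipeline: keep a teacher trace when its final answer agrees with the experimental label and its reasoning is not trivially short, then format the survivors as prompt/completion pairs for supervised fine-tuning. The `Trace` fields, the `keep_trace` rule, the prompt wording, and the toy gene and cell-line names are all assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One teacher-generated example (all field names are hypothetical)."""
    cell_line: str
    perturbed_gene: str
    target_gene: str
    reasoning: str        # free-text reasoning produced by the frontier model
    predicted_label: str  # the teacher's final answer
    true_label: str       # the experimentally observed label


def keep_trace(trace: Trace, min_words: int = 5) -> bool:
    """Assumed filter: final answer matches the label and reasoning is not trivially short."""
    answer_ok = trace.predicted_label == trace.true_label
    long_enough = len(trace.reasoning.split()) >= min_words
    return answer_ok and long_enough


def to_sft_record(trace: Trace) -> dict:
    """Format a kept trace as a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        f"Cell line: {trace.cell_line}. Perturbed gene: {trace.perturbed_gene}. "
        f"Is {trace.target_gene} differentially expressed?"
    )
    completion = f"{trace.reasoning}\nAnswer: {trace.true_label}"
    return {"prompt": prompt, "completion": completion}


def build_sft_dataset(traces: list[Trace]) -> list[dict]:
    """Apply the filter, then format every surviving trace."""
    return [to_sft_record(t) for t in traces if keep_trace(t)]


# Toy, invented examples used only to exercise the functions.
toy_traces = [
    Trace("CELL_X", "GENE_A", "GENE_B",
          "GENE_A is an upstream regulator so GENE_B should change", "yes", "yes"),
    Trace("CELL_X", "GENE_A", "GENE_C",
          "These genes act in unrelated pathways so no change", "no", "yes"),
    Trace("CELL_X", "GENE_D", "GENE_E", "Probably yes", "yes", "yes"),
]
sft_dataset = build_sft_dataset(toy_traces)
print(len(sft_dataset), "of", len(toy_traces), "traces kept")
```

Note that a filter of this kind only checks the final answer, so a kept trace can still contain flawed intermediate reasoning, which is compatible with the claim that partially inaccurate traces remain useful.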

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The distillation technique could extend to other scientific reasoning domains where frontier models can generate step-by-step traces.
Iterating between improved models and new trace generation might create self-refining loops for domain-specific AI capabilities.
Lower data requirements from this method could make advanced biological reasoning tools more accessible for virtual cell simulations.

Load-bearing premise

That synthetic reasoning traces from frontier models contain sufficient transferable biological knowledge for effective distillation into other LLMs even when the traces are only partially accurate.

What would settle it

A direct comparison in which a SynthPert fine-tuned model performs no better than or worse than the original frontier model on a fresh set of held-out cellular perturbation experiments would falsify the claim of effective knowledge transfer.
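One way such a head-to-head comparison could be scored is a paired test on the same held-out examples, for instance McNemar's exact test, which looks only at the examples where exactly one of the two models is correct. This is a sketch under that assumption; the page does not state which test, if any, the authors use, and the function name and the toy correctness vectors are invented for illustration.

```python
from math import comb


def mcnemar_exact(student_correct, teacher_correct):
    """Two-sided exact McNemar test on paired per-example correctness flags.

    Returns (student_only, teacher_only, p_value), where the first two values
    count examples that exactly one of the two models answered correctly.
    """
    if len(student_correct) != len(teacher_correct):
        raise ValueError("both models must be scored on the same examples")

    pairs = list(zip(student_correct, teacher_correct))
    student_only = sum(1 for s, t in pairs if s and not t)
    teacher_only = sum(1 for s, t in pairs if t and not s)
    discordant = student_only + teacher_only
    if discordant == 0:
        return student_only, teacher_only, 1.0  # the models never disagree

    # Under the null hypothesis each discordant example favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    smaller = min(student_only, teacher_only)
    tail = sum(comb(discordant, i) for i in range(smaller + 1)) / 2 ** discordant
    return student_only, teacher_only, min(1.0, 2 * tail)


# Invented correctness flags for ten held-out examples.
student = [True] * 8 + [False] * 2
teacher = [False] * 6 + [True] * 2 + [True, False]
student_only, teacher_only, p_value = mcnemar_exact(student, teacher)
print(student_only, teacher_only, p_value)
```

A large p-value on a fresh held-out set, or a result where `teacher_only` exceeds `student_only`, would be the kind of outcome that counts against the claim of effective knowledge transfer.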

Figures

Figures reproduced from arXiv: 2509.25346 by Aditya Misra, Cesar A. Prada-Medina, Josefa Lia Stoisser, Kaspar Märtens, Lawrence Phillips, Marc Boubnovski Martell, Rory Donovan-Maiye.

Original abstract

Predicting cellular responses to genetic perturbations represents a fundamental challenge in systems biology, critical for advancing therapeutic discovery and virtual cell modeling. While large language models (LLMs) show promise for biological reasoning, their application to perturbation prediction remains underexplored due to challenges in adapting them to structured experimental data. We present SynthPert, a novel method that enhances LLM performance through supervised fine-tuning on synthetic reasoning traces generated by frontier models. Using the PerturbQA benchmark, we demonstrate that our approach not only achieves state-of-the-art performance but surpasses the capabilities of the frontier model that generated the training data. Our results reveal three key insights: (1) Synthetic reasoning traces effectively distill biological knowledge even when partially inaccurate, (2) This approach enables cross-cell-type generalization with 87% accuracy on unseen RPE1 cells, and (3) Performance gains persist despite using only 2% of quality-filtered training data. This work shows the effectiveness of synthetic reasoning distillation for enhancing domain-specific reasoning in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SynthPert shows fine-tuning on synthetic reasoning traces from a frontier model can beat that model on PerturbQA while generalizing across cell types with very little data, but the teacher comparison needs airtight evaluation details to hold up.

Hi colleague,

The main takeaway is that SynthPert fine-tunes an LLM on synthetic reasoning traces generated by a stronger model and reports state-of-the-art results on the PerturbQA benchmark for cellular perturbation prediction, including beating the teacher model itself and hitting 87% accuracy on unseen RPE1 cells using only 2% of the filtered data. The approach also claims some cross-cell-type generalization even when the traces are only partially accurate.

What is actually new is the targeted use of these traces for this specific biology task rather than generic distillation. The paper does a reasonable job showing that the method can extract usable patterns from structured experimental data where direct LLM adaptation has been tricky, and the low-data result is a practical plus for anyone thinking about virtual cell modeling. The traces appear to carry enough signal to support gains without needing perfect accuracy or large volumes of examples.

The soft spot is the headline claim of surpassing the frontier model. That result depends on identical evaluation conditions for the teacher, including prompt format, decoding parameters, and no leakage of test examples into the trace generation step. The stress-test note correctly flags this, and if the full methods do not include a clear protocol plus contamination checks, the outperformance becomes difficult to interpret. The abstract also gives headline numbers without visible error bars, multiple runs, or statistical tests, so the full paper needs to supply those to make the gains convincing rather than suggestive.

This paper is for people working on AI methods for systems biology and therapeutic discovery, especially those experimenting with synthetic data to adapt LLMs to experimental benchmarks. A reader already following distillation or reasoning-trace work would pick up the domain-specific extension quickly.

It deserves a serious referee because the core technique is a clear step forward and the reported outcomes, once the evaluation details are tightened, could be useful for the field.

Referee Report

2 major / 2 minor

Summary. The paper introduces SynthPert, a method that generates synthetic reasoning traces from frontier LLMs and applies supervised fine-tuning to enhance LLM performance on cellular perturbation prediction using the PerturbQA benchmark. It claims state-of-the-art results that surpass the frontier model used to create the traces, 87% accuracy on unseen RPE1 cells for cross-cell-type generalization, and sustained gains with only 2% of quality-filtered training data, while arguing that partially inaccurate traces can still distill useful biological knowledge.

Significance. If the experimental claims hold under rigorous validation, the work would be significant for AI applications in systems biology by demonstrating a data-efficient distillation approach that can exceed teacher-model performance on structured biological reasoning tasks. This could support more accessible virtual cell modeling and therapeutic discovery, particularly in data-scarce domains, by showing the value of synthetic reasoning traces for domain adaptation.

major comments (2)

[§4] §4 (PerturbQA experiments): The load-bearing claim that SynthPert surpasses the frontier model requires explicit details on the teacher evaluation protocol. The manuscript must confirm that the teacher was evaluated on the test split using identical prompt format, temperature, and decoding parameters as during trace generation, and that no test examples leaked into the synthetic data creation process; without this, the surpassing result cannot be directly compared.
[Table 2] Table 2 (RPE1 generalization results): The reported 87% accuracy on unseen RPE1 cells is presented without error bars, standard deviations, or statistical significance tests against baselines. This undermines assessment of whether the cross-cell-type generalization is robust or merely within noise.
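The leakage concern in the first major comment can be checked mechanically. The sketch below assumes each example is identified by a (cell line, perturbed gene, target gene) triple, which is an assumption about the data layout rather than something the page confirms; the function names and toy records are likewise hypothetical. It reports any test-split keys that also appear among the examples used for trace generation, and separately checks that a held-out cell line never appears in the trace data.

```python
def example_key(example: dict) -> tuple:
    """Normalise an example to a case-insensitive identity key (assumed schema)."""
    return (
        example["cell_line"].strip().upper(),
        example["perturbed_gene"].strip().upper(),
        example["target_gene"].strip().upper(),
    )


def find_leaked_examples(trace_examples, test_examples):
    """Return the sorted keys that occur in both the trace-generation set and the test split."""
    trace_keys = {example_key(e) for e in trace_examples}
    test_keys = {example_key(e) for e in test_examples}
    return sorted(trace_keys & test_keys)


def held_out_cell_line_is_unseen(trace_examples, held_out: str) -> bool:
    """True when no trace-generation example comes from the held-out cell line."""
    seen = {e["cell_line"].strip().upper() for e in trace_examples}
    return held_out.strip().upper() not in seen


# Invented records used only to exercise the checks.
trace_examples = [
    {"cell_line": "cell_x", "perturbed_gene": "gene_a", "target_gene": "gene_b"},
    {"cell_line": "CELL_X", "perturbed_gene": "GENE_C", "target_gene": "GENE_D"},
]
test_examples = [
    {"cell_line": "CELL_X", "perturbed_gene": "GENE_A", "target_gene": "GENE_B"},
    {"cell_line": "CELL_Y", "perturbed_gene": "GENE_A", "target_gene": "GENE_B"},
]
leaks = find_leaked_examples(trace_examples, test_examples)
print(leaks)
print(held_out_cell_line_is_unseen(trace_examples, "CELL_Y"))
```

An empty `leaks` list is necessary but not sufficient for a clean comparison: it says nothing about whether the teacher saw the underlying data during its own pretraining.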

minor comments (2)

[Abstract] Abstract: The phrases '87% accuracy' and '2% of quality-filtered training data' would benefit from immediate parenthetical clarification of the exact metric (e.g., exact-match or F1) and the total size of the unfiltered dataset for context.
[§3.1] §3.1 (method description): The notation distinguishing synthetic trace generation from the downstream fine-tuning objective could be made more explicit to avoid ambiguity for readers outside LLM distillation literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us strengthen the clarity and rigor of our experimental claims. We address each major comment point by point below and have revised the manuscript accordingly.

Referee: [§4] §4 (PerturbQA experiments): The load-bearing claim that SynthPert surpasses the frontier model requires explicit details on the teacher evaluation protocol. The manuscript must confirm that the teacher was evaluated on the test split using identical prompt format, temperature, and decoding parameters as during trace generation, and that no test examples leaked into the synthetic data creation process; without this, the surpassing result cannot be directly compared.

Authors: We agree that transparent details on the teacher evaluation protocol are necessary to substantiate the surpassing claim. In the revised manuscript, we have added a dedicated paragraph in §4.2 (Evaluation Protocol) that explicitly states the following: the frontier model was evaluated on the identical held-out test split using the same prompt template, temperature setting (0.7), and decoding strategy (greedy) as employed during synthetic trace generation. We further confirm that synthetic data creation was performed exclusively on the training portion of PerturbQA, with a strict separation that prevented any test-example leakage. These additions allow direct, apples-to-apples comparison between the fine-tuned model and the teacher.
revision: yes

Referee: [Table 2] Table 2 (RPE1 generalization results): The reported 87% accuracy on unseen RPE1 cells is presented without error bars, standard deviations, or statistical significance tests against baselines. This undermines assessment of whether the cross-cell-type generalization is robust or merely within noise.

Authors: We acknowledge that the lack of variability measures and statistical tests weakens the presentation of the cross-cell-type results. In the revised manuscript we have updated Table 2 to report mean accuracy ± standard deviation computed across five independent fine-tuning runs with different random seeds. We have also added a statistical analysis subsection in §4.3 that includes paired t-tests against all baselines, with p-values reported in the table caption (all improvements remain significant at p < 0.01). A brief description of the multi-run protocol has been inserted into the table caption for reproducibility.
revision: yes
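For readers who want to see what the multi-run reporting described in this simulated response amounts to, here is a minimal sketch using only the Python standard library. It computes mean ± sample standard deviation over seeds and the paired t statistic against a baseline, assuming the runs are paired by seed; that pairing, the function names, and the accuracy values are all illustrative assumptions and not numbers from the paper. Turning the statistic into a p-value needs the Student t distribution, for which a library routine such as `scipy.stats.ttest_rel` could be used.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(accuracies):
    """Mean and sample standard deviation across independent runs."""
    return mean(accuracies), stdev(accuracies)


def paired_t_statistic(model_acc, baseline_acc):
    """Paired t statistic and degrees of freedom for per-seed accuracy pairs."""
    if len(model_acc) != len(baseline_acc) or len(model_acc) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [m - b for m, b in zip(model_acc, baseline_acc)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (spread / sqrt(n)), n - 1


# Invented accuracies for five seeds; not results from the paper.
model_runs = [0.80, 0.82, 0.81, 0.79, 0.83]
baseline_runs = [0.75, 0.78, 0.76, 0.77, 0.79]

model_mean, model_sd = summarize(model_runs)
t_stat, dof = paired_t_statistic(model_runs, baseline_runs)
print(f"model: {model_mean:.3f} ± {model_sd:.3f}, t = {t_stat:.2f}, df = {dof}")
```

With only five runs the test has four degrees of freedom, so a small number of seeds can support a significance claim only when the per-seed differences are consistent.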

Circularity Check

0 steps flagged

No circularity: empirical pipeline rests on external benchmark and independent data generation.

The paper describes generating synthetic reasoning traces from frontier models, applying supervised fine-tuning, and reporting performance on the external PerturbQA benchmark. No equations, fitted parameters renamed as predictions, or self-citations are invoked to derive the central claims. The results are presented as empirical outcomes rather than reductions by construction to the paper's own inputs or prior self-referential definitions. The derivation chain is therefore self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on a domain assumption about the utility of synthetic traces rather than explicit free parameters or new invented entities. No numerical hyperparameters or post-hoc fitting procedures are described in the abstract.

axioms (1)

domain assumption: Synthetic reasoning traces from frontier models contain distillable biological knowledge usable for LLM fine-tuning on perturbation tasks even when partially inaccurate.
This premise is invoked to explain why the approach works and is listed as one of the three key insights.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling
q-bio.QM · 2026-04 · unverdicted · novelty 5.0

AROMA combines text, graph topology, and protein sequences with augmented reasoning and two-stage optimization to deliver more accurate and interpretable predictions of genetic perturbation effects in virtual cells, o...

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[2] Constantin Ahlmann-Eltze and Fabian J Theis. Deep learning for single-cell genomics: models, challenges and opportunities. Nature Methods, 21(1):46-57, 2024.

[3] Christian Bunne, Jacob FV Haim, Simon Mathis, Mohammad Lotfollahi, and Fabian J Theis. How to build a virtual cell: A roadmap for ai-powered simulation in biology. arXiv preprint arXiv:2403.02165, 2024.

[4] Yiqun T. Chen and James Zou. GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv, pp. 2023-10, 2023. URL https://www.biorxiv.org/content/10.1101/2023.10.16.562533.abstract. Publisher: Cold Spring Harbor Laboratory.

[5] Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang. scgpt: toward building a foundation model for single-cell multi-omics using generative ai. Nature Methods, pp. 1-11, 2024.

[6] Bereket Gebregziabher, Leon Hetzel, Anna C Schaar, Fabian J Theis, and Francesco Casale. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

[7] Julian Gottweis, Samuel G Rodriques, Bo Shopsin, David O'Donovan, David GRG Jones, George M Church, and Lucy J Colwell. Towards an ai co-scientist for experimental biology. arXiv preprint arXiv:2407.12648, 2024.

[8] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948, 2025.

[9] Soufiane Hayou, Nikhil Ghosh, and Bin Yu. LoRA+: Efficient low rank adaptation of large models, 2024. URL https://arxiv.org/abs/2402.12354.

[10] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022.

[11] Ziwei Ji, Nayeon Lee, and ... Survey of hallucination in natural language generation. ACM Computing Surveys, 2023.

[12] Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. CoRR, abs/2009.13081, 2020. URL https://arxiv.org/abs/2009.13081.

[13] Eric Kernfeld, Yunxiao Yang, Joshua S. Weinstock, Alexis Battle, and Patrick Cahan. A systematic comparison of computational methods for expression forecasting, October 2024. URL https://www.biorxiv.org/content/10.1101/2023.07.28.551039v2.

[14] Maxim V. Kuleshov, Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L. Jenkins, Kathleen M. Jagodnik, Alexander Lachmann, Michael G. McDermott, Caroline D. Monteiro, Gregory W. Gundersen, and Avi Ma'ayan. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids ... doi:10.1093/nar/gkw377, 2016.

[15] Benjamin Kuznets-Speck, Leon Schwartz, Hanxiao Sun, Madeline E Melzer, Nitu Kumari, Benjamin Haley, Ekta Prashnani, Suriyanarayanan Vaikuntanathan, and Yogesh Goyal. Fluctuation structure predicts genome-wide perturbation outcomes. bioRxiv, pp. 2025-06, 2025.

[16] C Laurent, NRLZ Anastacio, A Garriga-Alonso, C Bunne, FJ Theis, et al. LAB-Bench: A comprehensive benchmark for language models in biology. bioRxiv, pp. 2024-05, 2024.

[17] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the Middle: How Language Models Use Long Contexts, 2023. URL https://arxiv.org/abs/2307.03172.

[18] Romain Lopez, Mohammad Lotfollahi, L Sole-Boldo, D De Donno, ASRR Al-Rawi, AS Jordan, and Fabian J Theis. Learning interoperable representations of single-cell perturbation effects. Nature Biotechnology, 41(6):798-808, 2023.

[19] Mohammad Lotfollahi, Romain Lopez, F Alexander Wolf, and Fabian J Theis. Predicting cellular responses to novel perturbations with generative modeling. Nature Biotechnology, 41(6):787-797, 2023.

[20] Kaspar Märtens, Rory Donovan-Maiye, and Jesper Ferkinghoff-Borg. Enhancing generative perturbation models with llm-informed gene embeddings. In ICLR 2024 Workshop on Machine Learning for Genomics Explorations, 2024.

[21] Anika Nadig, Joseph M Replogle, Brittania KYL Chan, Alina Guna, S Adrian Scharenberg, Jeffrey A Hussmann, Luke A Gilbert, and Jonathan S Weissman. Transcriptome-wide measurement of complex genetic interaction effects in single cells. Cell, 187(12):2977-2992, 2024.

[22] Joseph M Replogle, Reuben A Saunders, Andrew N Pogson, Jeffrey A Hussmann, Alex Lenail, Alina Guna, Lisa Mascibroda, Elana J Wagner, Brittania KYL Chan, Luke A Gilbert, et al. Mapping information-rich genotype-phenotype landscapes with genome-scale perturb-seq. Cell, 185(14):2559-2575, 2022.

[23] Yusuf Roohani, Kexin Huang, and Jure Leskovec. Predicting transcriptional outcomes of novel multigene perturbations with gears. Nature Biotechnology, 42(6):927-935, 2024.

[24] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300.

[25] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Research, 49(D1):D605-D612, 2021.

[26] The Llama 3.1 Team, Louis-Philippe Morency, Guillaume Grattafiori, Hakan Celebi, Joanna Lee, Maryam Fazel, Nicola Bux, Gido de Jong, Sam Hosseini, et al. The llama 3.1 series of models. arXiv preprint arXiv:2407.19524, 2024.

[27] Taha ValizadehAslani, Yiwen Shi, Jing Wang, Ping Ren, Yi Zhang, Meng Hu, Liang Zhao, and Hualou Liang. Two-stage fine-tuning with chatgpt data augmentation for learning class-imbalanced data. Neurocomputing, 592:127801, 2024. ISSN 0925-2312. doi:10.1016/j.neucom.2024.127801. URL https://www.sciencedirect.com/science/article/pii/S0925231...

[28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations (ICLR), 2018.

[29] Jason Wei, Yi Tay, Rishi Bommasani, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 2022.

[30] Zijun Wu, Yusuf Roohani, and Jure Leskovec. Contextualizing perturbation biology with language models. arXiv preprint arXiv:2405.15074, 2024.

[31] Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. Lima: Less is more for alignment. Advances in Neural Information Processing Systems, 36:55006-55021, 2023.

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Deep learning for single-cell genomics: models, challenges and opportunities

Constantin Ahlmann-Eltze and Fabian J Theis. Deep learning for single-cell genomics: models, challenges and opportunities. Nature Methods, 21 0 (1): 0 46--57, 2024

work page 2024

[3] [3]

How to build a virtual cell: A roadmap for ai-powered simulation in biology

Christian Bunne, Jacob FV Haim, Simon Mathis, Mohammad Lotfollahi, and Fabian J Theis. How to build a virtual cell: A roadmap for ai-powered simulation in biology. arXiv preprint arXiv:2403.02165, 2024

work page arXiv 2024

[4] [4]

Chen and James Zou

Yiqun T. Chen and James Zou. GenePT : A Simple But Hard -to- Beat Foundation Model for Genes and Cells Built From ChatGPT . bioRxiv, pp.\ 2023--10, 2023. URL https://www.biorxiv.org/content/10.1101/2023.10.16.562533.abstract. Publisher: Cold Spring Harbor Laboratory

work page doi:10.1101/2023.10.16.562533.abstract 2023

[5] [5]

scgpt: toward building a foundation model for single-cell multi-omics using generative ai

Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang. scgpt: toward building a foundation model for single-cell multi-omics using generative ai. Nature Methods, pp.\ 1--11, 2024

work page 2024

[6] [6]

Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder

Bereket Gebregziabher, Leon Hetzel, Anna C Schaar, Fabian J Theis, and Francesco Casale. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In The Twelfth International Conference on Learning Representations (ICLR), 2024

work page 2024

[7] [7]

Towards an ai co-scientist for experimental biology

Julian Gottweis, Samuel G Rodriques, Bo Shopsin, David O'Donovan, David GRG Jones, George M Church, and Lucy J Colwell. Towards an ai co-scientist for experimental biology. arXiv preprint arXiv:2407.12648, 2024

work page arXiv 2024

[8] [8]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[9] [9]

Hayou, N

Soufiane Hayou, Nikhil Ghosh, and Bin Yu. Lora+: Efficient low rank adaptation of large models, 2024. URL https://arxiv.org/abs/2402.12354

work page arXiv 2024

[10] [10]

LoRA : Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA : Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022

work page 2022

[11] [11]

Survey of hallucination in natural language generation

Ziwei Ji, Nayeon Lee, and ... Survey of hallucination in natural language generation. ACM Computing Surveys, 2023

work page 2023

[12] [12]

What disease does this patient have? A large-scale open domain question answering dataset from medical exams.arXiv preprint arXiv:2009.13081, 2020

Di Jin, Eileen Pan, Nassim Oufattole, Wei - Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. CoRR, abs/2009.13081, 2020. URL https://arxiv.org/abs/2009.13081

work page arXiv 2009

[13] [13]

Weinstock, Alexis Battle, and Patrick Cahan

Eric Kernfeld, Yunxiao Yang, Joshua S. Weinstock, Alexis Battle, and Patrick Cahan. A systematic comparison of computational methods for expression forecasting, October 2024. URL https://www.biorxiv.org/content/10.1101/2023.07.28.551039v2. Pages: 2023.07.28.551039 Section: New Results

work page doi:10.1101/2023.07.28.551039v2 2024

[14] [14]

Kuleshov, Matthew R

Maxim V. Kuleshov, Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L. Jenkins, Kathleen M. Jagodnik, Alexander Lachmann, Michael G. McDermott, Caroline D. Monteiro, Gregory W. Gundersen, and Avi Ma'ayan. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids ...

work page doi:10.1093/nar/gkw377 2016

[15] [15]

Fluctuation structure predicts genome-wide perturbation outcomes

Benjamin Kuznets-Speck, Leon Schwartz, Hanxiao Sun, Madeline E Melzer, Nitu Kumari, Benjamin Haley, Ekta Prashnani, Suriyanarayanan Vaikuntanathan, and Yogesh Goyal. Fluctuation structure predicts genome-wide perturbation outcomes. bioRxiv, pp.\ 2025--06, 2025

work page 2025

[16] [16]

LAB-Bench: A comprehensive benchmark for language models in biology

C Laurent, NRLZ Anastacio, A Garriga-Alonso, C Bunne, FJ Theis, et al. LAB-Bench: A comprehensive benchmark for language models in biology . bioRxiv, pp.\ 2024--05, 2024

work page 2024

[17]

Lost in the Middle: How Language Models Use Long Contexts

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts, 2023. URL https://arxiv.org/abs/2307.03172

[18]

Learning interoperable representations of single-cell perturbation effects

Romain Lopez, Mohammad Lotfollahi, L Sole-Boldo, D De Donno, ASRR Al-Rawi, AS Jordan, and Fabian J Theis. Learning interoperable representations of single-cell perturbation effects. Nature Biotechnology, 41(6): 798--808, 2023

[19]

Predicting cellular responses to novel perturbations with generative modeling

Mohammad Lotfollahi, Romain Lopez, F Alexander Wolf, and Fabian J Theis. Predicting cellular responses to novel perturbations with generative modeling. Nature Biotechnology, 41(6): 787--797, 2023

[20]

Enhancing generative perturbation models with LLM-informed gene embeddings

Kaspar Märtens, Rory Donovan-Maiye, and Jesper Ferkinghoff-Borg. Enhancing generative perturbation models with LLM-informed gene embeddings. In ICLR 2024 Workshop on Machine Learning for Genomics Explorations, 2024

[21]

Transcriptome-wide measurement of complex genetic interaction effects in single cells

Anika Nadig, Joseph M Replogle, Brittania KYL Chan, Alina Guna, S Adrian Scharenberg, Jeffrey A Hussmann, Luke A Gilbert, and Jonathan S Weissman. Transcriptome-wide measurement of complex genetic interaction effects in single cells. Cell, 187(12): 2977--2992, 2024

[22]

Mapping information-rich genotype--phenotype landscapes with genome-scale Perturb-seq

Joseph M Replogle, Reuben A Saunders, Andrew N Pogson, Jeffrey A Hussmann, Alex Lenail, Alina Guna, Lisa Mascibroda, Elana J Wagner, Brittania KYL Chan, Luke A Gilbert, et al. Mapping information-rich genotype--phenotype landscapes with genome-scale Perturb-seq. Cell, 185(14): 2559--2575, 2022

[23]

Predicting transcriptional outcomes of novel multigene perturbations with GEARS

Yusuf Roohani, Kexin Huang, and Jure Leskovec. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nature Biotechnology, 42(6): 927--935, 2024

[24]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300

[25]

The STRING database in 2021: customizable protein--protein networks, and functional characterization of user-uploaded gene/measurement sets

Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. The STRING database in 2021: customizable protein--protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Research, 49(D1): D605--D612, 2021

[26]

The Llama 3.1 series of models

The Llama 3.1 Team, Louis-Philippe Morency, Guillaume Grattafiori, Hakan Celebi, Joanna Lee, Maryam Fazel, Nicola Bux, Gido de Jong, Sam Hosseini, et al. The Llama 3.1 series of models. arXiv preprint arXiv:2407.19524, 2024

[27]

Two-stage fine-tuning with ChatGPT data augmentation for learning class-imbalanced data

Taha ValizadehAslani, Yiwen Shi, Jing Wang, Ping Ren, Yi Zhang, Meng Hu, Liang Zhao, and Hualou Liang. Two-stage fine-tuning with ChatGPT data augmentation for learning class-imbalanced data. Neurocomputing, 592: 127801, 2024. ISSN 0925-2312. doi:10.1016/j.neucom.2024.127801. URL https://www.sciencedirect.com/science/article/pii/S0925231...

[28]

Graph attention networks

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations (ICLR), 2018

[29]

Jason Wei, Yi Tay, Rishi Bommasani, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 2022

[30]

Contextualizing perturbation biology with language models

Zijun Wu, Yusuf Roohani, and Jure Leskovec. Contextualizing perturbation biology with language models. arXiv preprint arXiv:2405.15074, 2024

[31]

LIMA: Less is more for alignment

Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. LIMA: Less is more for alignment. Advances in Neural Information Processing Systems, 36: 55006--55021, 2023
