scBench-Long: Verifiable Benchmarking of Long-Horizon Single-Cell Biology

Arjun Banerjee; Ian Diks; Kenny Workman; Tim Proctor; Zhen Yang

arxiv: 2606.26563 · v1 · pith:JZJ3UO3Lnew · submitted 2026-06-25 · 🧬 q-bio.GN · cs.AI

Pith reviewed 2026-06-26 02:10 UTC · model grok-4.3

keywords single-cell biology · AI agents · long-horizon tasks · verifiable benchmarks · scientific claims · multi-step workflows · raw data analysis · deterministic grading

The pith

The strongest model-harness pair recovers scientific claims from raw single-cell data in only 16 of 63 long-horizon runs (25.4 percent).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents scBench-Long, a benchmark that requires agents to produce specific biological conclusions from raw or near-raw single-cell measurements without prescribed methods. It spans 21 evaluations covering melanoma CD8 T-cell reactivity, CD8 RNA+ATAC regulatory inference, human-monkey chimera development, KRAS-driven lung tumor aging, and lethal COVID-19 lung pathology, drawing on paired scRNA/TCR sequencing, chromatin profiling, immune repertoires, and validation resources. Across 1,068 completed trajectories, the strongest model-harness pair passes 16 of 63 runs (25.4 percent). The setup tests whether agents can integrate data, metadata, and auxiliary evidence into supported claims over extended workflows rather than isolated steps. The low pass rate indicates that current systems remain limited in turning measurements into complex, verifiable biology statements.

Core claim

scBench-Long contains 21 evaluations in which agents must recover scientific conclusions from raw single-cell data across melanoma CD8 T-cell reactivity, CD8 RNA+ATAC regulatory inference, human-monkey chimera development, KRAS-driven lung tumor aging, and lethal COVID-19 lung pathology. Tasks draw on paired scRNA/TCR sequencing, RNA and chromatin profiling, cross-species transcriptomics, combinatorial scRNA-seq, single-nucleus RNA-seq, immune repertoires, ortholog maps, ligand-receptor resources, and validation evidence. Candidate claims are reproduced, reviewed, and turned into controlled answer vocabularies with deterministic grading and trajectory rubrics. Across 1,068 completed trajecto

What carries the argument

scBench-Long benchmark of 21 evaluations that convert reproduced claims into controlled vocabularies with deterministic grading and trajectory rubrics for long-horizon single-cell workflows.
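
As a rough illustration of what deterministic grading against a controlled answer vocabulary can look like, the Python sketch below normalizes a free-text answer and maps it to a canonical label before comparing it with the expected one. The vocabulary entries, the function names (normalize, canonicalize, grade), and the matching rule are all assumptions made for illustration; the abstract does not describe the paper's actual grader or rubric format.

```python
import re
import unicodedata

# Hypothetical controlled vocabulary for a single evaluation. Each canonical
# label maps to surface forms treated as equivalent. These labels are
# placeholders and are NOT taken from scBench-Long.
VOCABULARY = {
    "group_a_enriched": {"group a enriched", "enriched in group a"},
    "group_b_enriched": {"group b enriched", "enriched in group b"},
    "no_difference": {"no difference", "not different"},
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and replace runs of punctuation with one space."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def canonicalize(answer: str):
    """Return the canonical label for an answer, or None if it is out of vocabulary."""
    norm = normalize(answer)
    for canonical, forms in VOCABULARY.items():
        if norm == normalize(canonical) or norm in forms:
            return canonical
    return None


def grade(answer: str, expected: str) -> bool:
    """Deterministic pass/fail: the same answer always yields the same verdict."""
    return canonicalize(answer) == expected


if __name__ == "__main__":
    print(grade("Enriched in Group A.", "group_a_enriched"))
    print(grade("probably something else", "group_a_enriched"))
```

A design like this makes the verdict reproducible, but it also shows why the vocabulary itself matters: any biologically valid phrasing that is missing from the accepted forms is graded as a failure.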

If this is right

Agents must improve at chaining raw data processing with metadata integration and auxiliary evidence to reach supported claims.
Existing benchmarks that test only broad knowledge or local steps miss the full workflow demands of single-cell studies.
The controlled vocabularies and rubrics provide a repeatable way to track whether future agents improve on the 25.4 percent pass rate.
Tasks that combine sequencing modalities, cross-species mapping, and validation evidence set concrete targets for multi-step scientific reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The benchmark could be extended by adding new evaluations that require agents to handle noisy or incomplete metadata.
Low success rates suggest current systems may still need human oversight for turning single-cell measurements into publishable claims.
Similar long-horizon setups could be applied to other data-rich fields such as spatial transcriptomics or proteomics to test transfer of the approach.

Load-bearing premise

The 21 evaluations accurately represent the multi-step integration of raw data, metadata, and auxiliary evidence that real single-cell biology research requires without prescribed methods.

What would settle it

A model-harness pair that passes more than half of the 63 runs on the 21 evaluations would show the reported performance gap is not general.

Original abstract

Single-cell studies require analysts to convert raw measurements into specific biological claims through multi-step workflows and integration of metadata, assay context, and auxiliary evidence. Existing AI-biology benchmarks largely measure broad knowledge, executable workflows, or local analysis steps. We introduce scBench-Long, a benchmark for long-horizon single-cell biology in which agents must recover scientific conclusions from raw or near-raw data without prescribed methods. The benchmark contains 21 evaluations spanning melanoma CD8 T-cell reactivity, CD8 RNA+ATAC regulatory inference, human-monkey chimera development, KRAS-driven lung tumor aging, and lethal COVID-19 lung pathology. Tasks cover paired scRNA/TCR sequencing, RNA and chromatin profiling, cross-species transcriptomics, combinatorial scRNA-seq, single-nucleus RNA-seq, immune repertoires, ortholog maps, ligand-receptor resources, and validation evidence. Candidate claims are reproduced, reviewed, and converted into controlled answer vocabularies with deterministic grading and trajectory rubrics. Across 1,068 completed trajectories, the strongest model-harness pair passes 16/63 runs (25.4%). scBench-Long evaluates whether agents can move beyond local analysis steps and make complex scientific claims that are supported by single-cell data.
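
To put the headline number in context, the short Python sketch below recomputes the 16/63 pass rate and attaches a Wilson score interval to it. The interval is not reported in the abstract; it is added here as an assumption-laden illustration that treats the 63 runs as independent trials, which repeated runs on the same 21 evaluations probably are not. The function name wilson_interval is ours.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if runs <= 0 or not 0 <= passes <= runs:
        raise ValueError("need runs > 0 and 0 <= passes <= runs")
    p = passes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half


if __name__ == "__main__":
    passes, runs = 16, 63  # figures taken from the abstract
    low, high = wilson_interval(passes, runs)
    print(f"pass rate: {passes / runs:.1%}")
    # Assumes independent runs; treat the interval as a rough guide only.
    print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With only 63 runs the interval is wide, so small differences between model-harness pairs on this benchmark should be read cautiously.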

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

scBench-Long sets up 21 tasks to test long-horizon claim recovery from raw single-cell data but supplies almost no detail on how those tasks were built or validated.

Two things to know: this paper introduces scBench-Long, a benchmark of 21 evaluations drawn from real single-cell studies (melanoma CD8 reactivity, RNA+ATAC inference, chimera development, KRAS lung aging, COVID pathology), and it reports that the strongest model-harness pair passes only 16 of 63 runs (25.4%) across 1,068 trajectories.

What is new is the explicit focus on recovering complex scientific claims from raw or near-raw data without prescribed methods, using multiple data modalities and auxiliary resources. The paper does a reasonable job naming concrete domains and data types that go beyond local analysis steps or broad-knowledge tests.

The soft spots sit in the task construction and grading. The abstract states that claims are reproduced, reviewed, and converted into controlled vocabularies with deterministic grading, yet it gives no protocol for that review, no expert validation steps, and no checks that the tasks cannot be solved by local shortcuts or that the rubrics accept only one narrow path. The stress-test concern lands because nothing shown demonstrates that these 21 evaluations faithfully reproduce unprescribed multi-step research workflows. Without those details the 25.4% figure is hard to interpret as evidence of a real agent limitation.

This paper is for groups working on AI research assistants in biology who need harder, more integrated benchmarks. It deserves a serious referee because the core idea addresses a documented gap and the performance number is specific, even though the methods section will require substantial clarification on construction and validation before the results can be trusted.

Referee Report

3 major / 2 minor

Summary. The paper introduces scBench-Long, a benchmark of 21 evaluations spanning melanoma CD8 reactivity, RNA+ATAC inference, chimera development, KRAS lung aging, and COVID pathology. Agents must recover scientific claims from raw or near-raw single-cell data (scRNA/TCR, RNA+ATAC, cross-species, etc.) without prescribed methods; candidate claims are reproduced, reviewed, and converted to controlled vocabularies with deterministic grading. Across 1,068 trajectories the strongest model-harness pair passes 16/63 runs (25.4%).

Significance. If the 21 tasks are shown to require genuine multi-step integration of raw data, metadata, and auxiliary evidence without embedded method guidance or narrow grading, the benchmark would supply a falsifiable, verifiable test of long-horizon biological reasoning that existing knowledge or local-analysis benchmarks do not provide.

major comments (3)

[Task construction / abstract] Task-construction section (and abstract): the claim that tasks require 'unprescribed multi-step integration of raw data, metadata, and auxiliary evidence' rests on the 21 evaluations faithfully reproducing real workflows, yet no expert-validation protocol, inter-reviewer agreement statistics, or explicit comparison demonstrating that the tasks cannot be solved by local steps or that the controlled vocabularies exclude valid alternative conclusions is supplied.
[Results / evaluation protocol] Results and evaluation sections: the headline 16/63 (25.4%) pass rate is reported without baselines for random guessing, purely local analysis strategies, or ablations that remove metadata/auxiliary resources, so it is impossible to attribute the failure rate specifically to the long-horizon requirement rather than other factors.
[Grading / trajectory rubrics] Grading and trajectory-rubric description: deterministic grading is asserted via 'controlled answer vocabularies,' but no concrete examples of rubric items, edge-case handling, or evidence that multiple biologically plausible paths are accepted are given, which directly affects whether the 25.4% figure measures the intended capability.

minor comments (2)

[Abstract / introduction] The abstract and introduction use 'reproduced, reviewed' without citing the source publications or review process for each of the 21 tasks; adding a supplementary table with original references would improve traceability.
[Task overview] Figure or table summarizing the 21 tasks should include columns for data modality, required auxiliary resources, and number of steps, to make the 'long-horizon' claim immediately verifiable.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating where we agree revisions are needed and where we provide additional clarification or defense based on the manuscript content.

Referee: [Task construction / abstract] Task-construction section (and abstract): the claim that tasks require 'unprescribed multi-step integration of raw data, metadata, and auxiliary evidence' rests on the 21 evaluations faithfully reproducing real workflows, yet no expert-validation protocol, inter-reviewer agreement statistics, or explicit comparison demonstrating that the tasks cannot be solved by local steps or that the controlled vocabularies exclude valid alternative conclusions is supplied.

Authors: The 21 evaluations are derived directly from published single-cell studies (e.g., melanoma CD8 reactivity, KRAS lung aging), with candidate claims extracted from the original papers' reported conclusions rather than invented workflows. This ensures fidelity to real multi-step processes involving raw data, metadata, and auxiliary resources such as ortholog maps and ligand-receptor databases. We acknowledge that the submitted manuscript does not include a formal expert-validation protocol or inter-reviewer agreement statistics. In revision we will expand the Task construction section with a step-by-step account of claim extraction and review by the authoring team, plus explicit examples illustrating why local single-step analysis is insufficient for each task category. The controlled vocabularies are constructed to accept equivalent biological statements while excluding unsupported alternatives; we will add a paragraph clarifying this design and noting that alternative conclusions are only accepted if they align with the original paper's evidence. revision: partial
Referee: [Results / evaluation protocol] Results and evaluation sections: the headline 16/63 (25.4%) pass rate is reported without baselines for random guessing, purely local analysis strategies, or ablations that remove metadata/auxiliary resources, so it is impossible to attribute the failure rate specifically to the long-horizon requirement rather than other factors.

Authors: We agree that baselines and ablations are necessary to isolate the contribution of long-horizon integration. The current results section reports aggregate pass rates across 1,068 trajectories but does not include these controls. In the revised manuscript we will add (i) a random-guessing baseline computed over the controlled vocabularies, (ii) performance of local-analysis-only agents that receive only one data modality at a time, and (iii) ablation runs that withhold metadata or auxiliary evidence. These additions will allow readers to quantify how much of the 25.4% ceiling is attributable to the requirement for multi-step reasoning. revision: yes
Referee: [Grading / trajectory rubrics] Grading and trajectory-rubric description: deterministic grading is asserted via 'controlled answer vocabularies,' but no concrete examples of rubric items, edge-case handling, or evidence that multiple biologically plausible paths are accepted are given, which directly affects whether the 25.4% figure measures the intended capability.

Authors: The manuscript states that candidate claims are converted to controlled vocabularies with deterministic grading and trajectory rubrics, but the submitted version indeed omits concrete rubric excerpts. In revision we will append a new subsection under Grading that provides (a) verbatim examples of rubric items for two representative tasks (e.g., CD8 TCR reactivity and cross-species chimera development), (b) explicit edge-case rules (e.g., how partial matches or synonymous terminology are scored), and (c) a statement that any biologically equivalent conclusion supported by the same evidence is accepted, with an illustration of two distinct but valid reasoning paths that both receive full credit. revision: partial
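
The random-guessing baseline promised in the second response can be sketched in a few lines of Python. The sketch assumes each evaluation has a single answer drawn uniformly from its controlled vocabulary, so the chance of passing is one over the vocabulary size; the vocabulary sizes below are placeholders, since neither the abstract nor the rebuttal states them, and the function name random_guess_pass_rate is hypothetical.

```python
from fractions import Fraction


def random_guess_pass_rate(vocab_sizes):
    """Expected pass rate when one answer per evaluation is guessed uniformly.

    vocab_sizes holds the number of allowed answers for each evaluation.
    """
    sizes = list(vocab_sizes)
    if not sizes or any(size < 1 for size in sizes):
        raise ValueError("need at least one evaluation, each with size >= 1")
    return sum(Fraction(1, size) for size in sizes) / len(sizes)


if __name__ == "__main__":
    # Placeholder sizes for four imaginary evaluations, NOT scBench-Long values.
    sizes = [2, 4, 4, 5]
    baseline = random_guess_pass_rate(sizes)
    print(f"random-guess baseline: {float(baseline):.1%}")
    # Compare against the reported 16/63 pass rate from the abstract.
    print(f"reported pass rate:    {16 / 63:.1%}")
```

If an evaluation requires several vocabulary fields to be correct at once, the per-evaluation chance would instead be the product of the per-field chances, which lowers the baseline further.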

Circularity Check

0 steps flagged

No circularity: benchmark introduction is self-contained with no derivation chain

The paper presents scBench-Long as a new external benchmark consisting of 21 evaluations drawn from published single-cell studies. It does not derive predictions, fit parameters, or claim mathematical results from its own equations. No steps match the enumerated circularity patterns (self-definitional, fitted-input-as-prediction, self-citation load-bearing, etc.). The central claim—that agents must recover claims from raw data without prescribed methods—is supported by the benchmark's construction details rather than reducing to any internal fit or self-reference. This is the expected non-finding for a benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no information available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5761 in / 1112 out tokens · 17005 ms · 2026-06-26T02:10:33.156078+00:00 · methodology


[1] [1]

& Pachter, L

Moses, L. & Pachter, L. Museum of spatial transcrip- tomics.Nature Methods19, 534–546 (2022).https: //doi.org/10.1038/s41592-022-01409-2

work page doi:10.1038/s41592-022-01409-2 2022

[2] [2]

G., Lee, H

Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction to spatial transcriptomics for biomedical research.Genome Medicine14, 68 (2022). https://doi.org/10.1186/s13073-022-01075-1

work page doi:10.1186/s13073-022-01075-1 2022

[3] [3]

M., Sistig, A

Dries, R., Chen, J., Del Rossi, N., Khan, M. M., Sistig, A. & Yuan, G. C. Advances in spatial transcriptomic data analysis.Genome Research31, 1706–1718 (2021). https://doi.org/10.1101/gr.275224.121

work page doi:10.1101/gr.275224.121 2021

[4] [4]

et al.In situ multi-modal characterization of pancreatic cancer reveals tumor cell identity as a defining factor of the surround- ing microenvironment.Cell Reports45, 116827 (2026)

Lyubetskaya, A., Rabe, B., Kavran, A., Bai, Y. et al.In situ multi-modal characterization of pancreatic cancer reveals tumor cell identity as a defining factor of the surround- ing microenvironment.Cell Reports45, 116827 (2026). https://doi.org/10.1016/j.celrep.2025.116827

work page doi:10.1016/j.celrep.2025.116827 2026

[5] Ishahak, M., Han, R. H., Annamalai, D., Woodiwiss, T., McCornack, C., Cleary, R. T., DeSouza, P. A., Qu, X., Dahiya, S., Kim, A. H. & Millman, J. R. Genetically engineered brain organoids recapitulate spatial and developmental states of glioblastoma progression. Advanced Science 12, 2410110 (2025). https://doi.org/10.1002/advs.202410110

[6] Jones, M. G., Sun, D., Min, K. H. J., Colgan, W. N., Wang, H., Torok, T., Cardoso, E. C., Tian, L., Weir, J. A., Chen, V. Z., Koblan, L. W., Yost, K. E., Mathey-Andrews, N., D’Souza, E., Russell, A. J. C., Stickels, R. R., Balderrama, K. S., Rideout, W. M., Dai, M., Marrero, G., Kumar, V., Saqi, A., Chen, F., Weissman, J. S., Yosef, N. & Yang, D. Spatiote...

work page doi:10.1101/2024.10.21 2024

[7] Yang, D., Jones, M. G., Naranjo, S., Rideout, W. M., Min, K. H. J., Ho, R., Wu, W., Replogle, J. M., Page, J. L., Quinn, J. J., Horns, F., Qiu, X., Chen, M. Z., Freed-Pastor, W. A., McGinnis, C. S., Patterson, D. M., Gartner, Z. J., Chow, E. D., Bivona, T. G., Chan, M. M., Yosef, N., Jacks, T. & Weissman, J. S. Lineage tracing reveals the phylodynamics,...


[8] DuPage, M., Dooley, A. L. & Jacks, T. Conditional mouse lung cancer models using adenoviral or lentiviral delivery of Cre recombinase. Nature Protocols 4, 1064–1072 (2009). https://doi.org/10.1038/nprot.2009.95

[9] Rodriques, S. G., Stickels, R. R., Goeva, A., Martin, C. A., Murray, E., Vanderburg, C. R., Welch, J., Chen, L. M., Chen, F. & Macosko, E. Z. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019). https://doi.org/10.1126/science.aaw1219

[10] Russell, A. J. C., Weir, J. A., Nadaf, N. M., Shabet, M., Kumar, V., Kambhampati, S., Raichur, R., Marrero, G. J., Liu, S., Balderrama, K. S., Vanderburg, C. R., Shanmugam, V., Tian, L., Iorgulescu, J. B., Yoon, C. H., Wu, C. J., Macosko, E. Z. & Chen, F. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625, 101–109 (...

2024

[11] Groh, J., Feng, R., Yuan, X., Liu, L., Klein, D. et al. Microglia activation orchestrates CXCL10-mediated CD8+ T cell recruitment to promote aging-related white matter degeneration. Nature Neuroscience 28, 1160–1173 (2025). https://doi.org/10.1038/s41593-025-01955-w

[12] Singhal, V., Chou, N., Lee, J., Yue, Y., Liu, J., Chock, W. K., Lin, L., Chang, Y.-C., Teo, E. M. L., Aow, J., Lee, H. K., Chen, K. H. & Prabhakar, S. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nature Genetics 56, 431–441 (2024). https://doi.org/10.1038/s41588-024-01664-3

[13] Varrone, M., Tavernari, D., Santamaria-Martinez, A., Walsh, L. A. et al. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nature Genetics 56, 74–84 (2024). https://doi.org/10.1038/s41588-023-01588-4

[14] Qin, F., Luo, X., Lu, Q., Cai, B., Xiao, F. & Cai, G. Spatial pattern and differential expression analysis with spatial transcriptomic data. Nucleic Acids Research 52, e101 (2024). https://doi.org/10.1093/nar/gkae962

[15] Kleshchevnikov, V., Shmatko, A., Dann, E., Aivazidis, A., King, H. W., Li, T., Elmentaite, R., Lomakin, A., Kedlian, V., Gayoso, A., Jain, M. S., Park, J. S., Ramona, L., Tuck, E., Arutyunyan, A., Vento-Tormo, R., Gerstung, M., James, L., Stegle, O. & Bayraktar, O. A. Cell2location maps fine-grained cell types in spatial transcriptomics. Nature Biotechno... doi:10.1038/s41587-021-01139-4 (2022)

[16] Biancalani, T., Scalia, G., Buffoni, L., Avasthi, R., Lu, Z., Sanger, A., Tokcan, N., Vanderburg, C. R., Segerstolpe, A., Zhang, M., Avraham-Davidi, I. & Regev, A. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nature Methods 18, 1352–1362 (2021). https://doi.org/10.1038/s41592-021-01264-7

[17] Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C. et al. Eleven grand challenges in single-cell data science. Genome Biology 21, 31 (2020). https://doi.org/10.1186/s13059-020-1926-6

[18] Heumos, L., Schaar, A. C., Lance, C., Litinetskaya, A., Drost, F. et al. Best practices for single-cell analysis across modalities. Nature Reviews Genetics 24, 550–572 (2023). https://doi.org/10.1038/s41576-023-00586-w

[19] Hao, Y., Hao, S., Andersen-Nissen, E., Mauck, W. M., Zheng, S. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021). https://doi.org/10.1016/j.cell.2021.04.048

[20] Zheng, G. X. Y., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W. et al. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, 14049 (2017). https://doi.org/10.1038/ncomms14049

[21] Han, A., Glanville, J., Hansmann, L. & Davis, M. M. Linking T-cell receptor sequence to functional phenotype at the single-cell level. Nature Biotechnology 32, 684–692 (2014). https://doi.org/10.1038/nbt.2938

[22] Picelli, S., Faridani, O. R., Björklund, Å. K., Winberg, G., Sagasser, S. & Sandberg, R. Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols 9, 171–181 (2014). https://doi.org/10.1038/nprot.2014.006

[23] Rosenberg, A. B., Roco, C. M., Muscat, R. A., Kuchina, A., Sample, P. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018). https://doi.org/10.1126/science.aam8999

[24] Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nature Protocols 15, 1484–1506 (2020). https://doi.org/10.1038/s41596-020-0292-x

[25] Jin, S., Guerrero-Juarez, C. F., Zhang, L., Chang, I., Ramos, R. et al. Inference and analysis of cell-cell communication using CellChat. Nature Communications 12, 1088 (2021). https://doi.org/10.1038/s41467-021-21246-9

[26] Workman, K., Yang, Z., Muralidharan, H. & Le, H. SpatialBench: Can agents analyze real-world spatial biology data? arXiv arXiv:2512.21907 (2025). https://doi.org/10.48550/arXiv.2512.21907

[27] Workman, K., Yang, Z., Muralidharan, H., Abdulali, A. & Le, H. scBench: Evaluating AI agents on single-cell RNA-seq analysis. arXiv arXiv:2602.09063 (2026). https://doi.org/10.48550/arXiv.2602.09063

[28] Diks, I., Muralidharan, H., Proctor, T. & Workman, K. Verifiable benchmarking of long-horizon spatial biology. arXiv arXiv:2605.28065 (2026). https://doi.org/10.48550/arXiv.2605.28065

[29] Qu, Y., Lu, Y., Tu, X., Zhang, S., She, T., Shaw, A. G., Shih, J.-H., Zhao, B. et al. BiomniBench: Process-level evaluation of LLM agents for real-world biomedical research. bioRxiv (2026). https://doi.org/10.64898/2026.05.12.724604

[30] Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I. & Cobbe, K. Let’s verify step by step. arXiv arXiv:2305.20050 (2023). https://doi.org/10.48550/arXiv.2305.20050

[31] Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E. & Stoica, I. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv arXiv:2306.05685 (2023). https://doi.org/10.48550/arXiv.2306.05685

[32] Wang, P., Li, L., Chen, L., Cai, Z., Zhu, D., Lin, B., Cao, Y., Kong, L., Liu, Q., Liu, T. & Sui, Z. Large language models are not fair evaluators. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 9440–9450 (2024). https://doi.org/10.18653/v1/2024.acl-long.511

[33] Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R. & Zhu, C. G-Eval: NLG evaluation using GPT-4 with better human alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2511–2522 (2023). https://doi.org/10.18653/v1/2023.emnlp-main.153

[34] Laurent, J. M., Janizek, J. D., Ruzo, M., Hinks, M. M., Hammerling, M. J., Narayanan, S., Ponnapati, M., White, A. D. & Rodriques, S. G. LAB-Bench: Measuring capabilities of language models for biology research. arXiv arXiv:2407.10362 (2024). https://doi.org/10.48550/arXiv.2407.10362

[35] Mitchener, L., Laurent, J. M., Andonian, A., Tenmann, B., Narayanan, S., Wellawatte, G. P., White, A., Sani, L. & Rodriques, S. G. BixBench: a comprehensive benchmark for LLM-based agents in computational biology. arXiv arXiv:2503.00096 (2025). https://doi.org/10.48550/arXiv.2503.00096

[36] Nair, S., Gunsalus, L., Orcutt-Jahns, B., Rossen, J., Lal, A., De Donno, C., Celik, M. H., Fletez-Brant, K., Xie, X., Corrada Bravo, H. & Eraslan, G. Agentic systems are adept at solving well-scoped, verifiable problems in computational biology. bioRxiv (2026). https://doi.org/10.64898/2026.04.06.716850

[37] Li, J. & Ho, A. GeneBench: Assessing AI agents for multi-stage inference problems in genomics and quantitative biology. bioRxiv (2026). https://doi.org/10.64898/2026.04.22.720113

[38] Anthropic. Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench. Anthropic Research (2026). anthropic.com/research/BioMysteryBench

[39] Ioannidis, J. P. A. Why most published research findings are false. PLOS Medicine 2, e124 (2005). https://doi.org/10.1371/journal.pmed.0020124

[40] Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10, 712 (2011). https://doi.org/10.1038/nrd3439-c1

[41] Begley, C. G. & Ellis, L. M. Raise standards for preclinical cancer research. Nature 483, 531–533 (2012). https://doi.org/10.1038/483531a

[42] Errington, T. M., Denis, A., Perfito, N., Iorns, E. & Nosek, B. A. Reproducibility in Cancer Biology: Challenges for assessing replicability in preclinical cancer biology. eLife 10, e67995 (2021). https://doi.org/10.7554/eLife.67995

[43] Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E. & Nosek, B. A. Investigating the replicability of preclinical cancer biology. eLife 10, e71601 (2021). https://doi.org/10.7554/eLife.71601

[44] Ibáñez-Molero, S., Veldman, J., Simon Nieto, J., Traets, J. J., George, A., Hoefakker, K. et al. Tumour-reactive heterotypic CD8 T cell clusters from clinical samples. Nature 649, 467–476 (2026). https://doi.org/10.1038/s41586-025-09754-w

[45] Green, W. D., Gomez, A. G., Plotkin, A. L., Pratt, B. M., Merritt, E. F. et al. Enhancer-driven gene regulatory networks reveal transcription factors governing CD8 T cell adaptation and differentiation in the tumor microenvironment. Immunity 58, 1725–1741 (2025). https://doi.org/10.1016/j.immuni.2025.04.030

[46] Tan, T., Wu, J., Si, C., Dai, S., Zhang, Y. et al. Chimeric contribution of human extended pluripotent stem cells to monkey embryos ex vivo. Cell 184, 2020–2032.e14 (2021). https://doi.org/10.1016/j.cell.2021.03.020

[47] Shuldiner, E. G., Karmakar, S., Tsai, M. K., Hebert, J. D., Tang, Y. J. et al. Aging represses oncogenic KRAS-driven lung tumorigenesis and alters tumor suppression. Nature Aging 5, 2263–2278 (2025). https://doi.org/10.1038/s43587-025-00986-z

[48] Melms, J. C., Biermann, J., Huang, H., Wang, Y., Nair, A. et al. A molecular single-cell lung atlas of lethal COVID-19. Nature 595, 114–119 (2021). https://doi.org/10.1038/s41586-021-03569-1