Prototype Guided Post-pretraining for Single-Cell Representation Learning
Pith reviewed 2026-05-11 02:55 UTC · model grok-4.3
The pith
A post-pretraining stage that uses marker-gene sets as priors refines cell embeddings and lifts downstream performance by up to 15 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CellRefine is a post-pretraining method that operates between the pretraining and fine-tuning stages of a single-cell foundation model. It employs a multi-faceted objective that incorporates marker-gene sets as structural priors to guide refinement of the latent embedding manifold of cells. Empirical results across multiple computational biology tasks show that this stage consistently improves downstream performance, yielding gains up to 15 percent.
What carries the argument
Marker-gene sets used as structural priors inside a multi-faceted post-pretraining objective that reshapes the cell latent embedding manifold.
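One way marker-gene sets could act as structural priors is sketched below. This rests on assumptions that are not taken from the paper. First, each cell type's prototype is the mean of its marker genes' embeddings. Second, the prototype term is a cross-entropy over temperature-scaled cosine similarities between a cell embedding and every prototype, which pulls each cell toward its own type's prototype. The quoted method text describes prototypes as ordered marker-gene sequences, so the real CellRefine loss likely differs. The names `build_prototypes`, `prototype_loss`, and `temperature` are hypothetical.

```python
import numpy as np


def build_prototypes(gene_emb, marker_sets):
    """Average each cell type's marker-gene embeddings into one prototype vector.

    Hypothetical construction (an assumption, not CellRefine's definition).
    gene_emb: gene name -> 1-D embedding (e.g. pretrained gene-token vectors).
    marker_sets: cell type -> list of marker gene names.
    Returns (sorted cell-type names, prototype matrix with one row per type).
    Marker genes missing from gene_emb are skipped.
    """
    types = sorted(marker_sets)
    rows = []
    for cell_type in types:
        vecs = [gene_emb[g] for g in marker_sets[cell_type] if g in gene_emb]
        if not vecs:
            raise ValueError(f"no embedded marker genes for {cell_type!r}")
        rows.append(np.mean(vecs, axis=0))
    return types, np.stack(rows)


def prototype_loss(cell_emb, labels, protos, temperature=0.1):
    """Mean cross-entropy of each cell against its own type's prototype.

    Logits are cosine similarities between cell embeddings (n_cells, d) and
    prototypes (n_types, d), divided by `temperature`; `labels` holds row
    indices into `protos`.
    """
    c = cell_emb / np.linalg.norm(cell_emb, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = c @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


if __name__ == "__main__":
    gene_emb = {"CD3E": np.array([1.0, 0.0]), "CD8A": np.array([0.9, 0.1]),
                "MS4A1": np.array([0.0, 1.0]), "CD79A": np.array([0.1, 0.9])}
    markers = {"T cell": ["CD3E", "CD8A"], "B cell": ["MS4A1", "CD79A"]}
    types, protos = build_prototypes(gene_emb, markers)  # types: ['B cell', 'T cell']
    cells = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(prototype_loss(cells, np.array([1, 0]), protos))
```

A cell labeled with the type whose prototype it already points toward gets a small loss. Swapping the labels makes the loss large, which is the pulling effect a prototype prior is meant to have.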
If this is right
- Existing single-cell foundation models can be improved without retraining from scratch by inserting the guided post-pretraining stage.
- Performance gains hold across tasks that suffer from long-tailed cell-type distributions and covariate shifts in gene expression.
- The refined embeddings support more accurate downstream analyses of cellular regulatory logic.
- The approach keeps the original pretraining and fine-tuning pipelines intact while adding only the intermediate refinement step.
Where Pith is reading between the lines
- The method may generalize to other sequence-based biological foundation models if suitable structural priors can be identified for those domains.
- Performance sensitivity to the exact choice of marker genes could be tested by swapping marker sets and measuring downstream variance.
- If the gains persist on very large or noisy datasets, the technique might reduce reliance on extensive labeled data during fine-tuning.
- The refinement could interact with batch-correction methods to further stabilize embeddings under strong covariate shifts.
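The marker-swap sensitivity test suggested above could start from a perturbation routine like this hypothetical sketch. It replaces a chosen fraction of a cell type's markers with random non-marker genes, so that downstream variance can be measured as the fraction grows. The function name, the `frac` parameter, and uniform sampling from the gene pool are assumptions, not part of the paper.

```python
import random
from typing import List, Sequence


def perturb_marker_set(markers: Sequence[str], gene_pool: Sequence[str],
                       frac: float, seed: int = 0) -> List[str]:
    """Replace round(frac * len(markers)) markers with random non-marker genes.

    Hypothetical ablation helper. Positions that are not swapped keep their
    original gene, so the order of the marker list is preserved.
    """
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be in [0, 1]")
    rng = random.Random(seed)
    n_swap = round(frac * len(markers))
    candidates = sorted(set(gene_pool) - set(markers))  # sorted for reproducibility
    if n_swap > len(candidates):
        raise ValueError("gene pool too small for the requested swap fraction")
    swap_positions = set(rng.sample(range(len(markers)), n_swap))
    replacements = iter(rng.sample(candidates, n_swap))  # drawn without replacement
    return [next(replacements) if i in swap_positions else gene
            for i, gene in enumerate(markers)]
```

Preserving positions matters here because the paper describes prototypes as ordered marker-gene sequences. Shuffling the list as well would confound "wrong genes" with "wrong order".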
Load-bearing premise
Marker-gene sets supply reliable structural priors that improve the embedding manifold without introducing selection bias or overfitting to the chosen genes.
What would settle it
Apply CellRefine to an existing single-cell pretrained model on a held-out dataset with established marker genes and compare against direct fine-tuning; if downstream task metrics show no improvement or a decline, the claimed benefit of the guided post-pretraining stage is falsified.
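The comparison described above can be sketched minimally as follows. Per-task scores for direct fine-tuning and for CellRefine followed by fine-tuning are turned into relative gains, and any task without an improvement is flagged. This summary does not say whether the reported "up to 15%" is a relative or an absolute gain. The sketch assumes relative gain on a higher-is-better metric, and the task names and scores are made up. A single score per task ignores run-to-run noise, so in practice each entry would be a mean over seeds or folds.

```python
from typing import Dict, List


def relative_gains(baseline: Dict[str, float], refined: Dict[str, float]) -> Dict[str, float]:
    """Per-task relative gain (refined - baseline) / baseline, higher-is-better metrics."""
    gains = {}
    for task, base in baseline.items():
        if base <= 0:
            raise ValueError(f"baseline score for {task!r} must be positive")
        gains[task] = (refined[task] - base) / base
    return gains


def falsified_tasks(gains: Dict[str, float]) -> List[str]:
    """Tasks where the post-pretraining stage brought no improvement or a decline."""
    return sorted(task for task, gain in gains.items() if gain <= 0)


# Made-up scores for illustration only.
baseline = {"annotation": 0.80, "perturbation": 0.50}
refined = {"annotation": 0.84, "perturbation": 0.49}
gains = relative_gains(baseline, refined)
print(falsified_tasks(gains))  # ['perturbation']
print(max(gains.values()))     # best-case relative gain, about 0.05
```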
Original abstract
Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models in natural language modeling, several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expression data. While fine-tuning is often used to mitigate these issues, we observe that performance remains bounded. To address this challenge, we introduce CellRefine, a post-pretraining method that operates between the pretraining and fine-tuning stages of a single-cell foundation model. CellRefine uses a multi-faceted objective that incorporates marker-gene sets as structural priors to guide post-pretraining and refine the latent embedding manifold of cells. Across multiple computational biology tasks, empirical results show that CellRefine consistently improves downstream performance, yielding gains up to 15%.
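To make the "genes as tokens, cells as sentences" framing concrete, here is a minimal sketch of a rank-based tokenization in the spirit of Geneformer. Each cell's expressed genes are sorted by expression, and the ordered gene identifiers become the cell's "sentence". This is an illustrative simplification, not CellRefine's tokenizer. Geneformer additionally normalizes each gene by its median expression across the pretraining corpus, and scGPT instead bins expression values. The name `cell_to_sentence` and the `max_len` cutoff are hypothetical.

```python
from typing import Dict, List


def cell_to_sentence(expression: Dict[str, float], max_len: int = 2048) -> List[str]:
    """Turn one cell's expression profile into an ordered gene-token sequence.

    Hypothetical sketch: zero-count genes are dropped, the rest are sorted by
    descending expression (ties broken alphabetically for determinism), and
    the sequence is truncated to `max_len` tokens.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in expressed[:max_len]]


if __name__ == "__main__":
    cell = {"CD3E": 5.0, "MS4A1": 0.0, "CD8A": 2.5, "GNLY": 2.5, "LYZ": 0.3}
    print(cell_to_sentence(cell))  # ['CD3E', 'CD8A', 'GNLY', 'LYZ']
```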
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CellRefine, a post-pretraining method inserted between pretraining and fine-tuning of single-cell foundation models. It employs a multi-faceted objective that incorporates marker-gene sets as structural priors to refine the latent embedding manifold of cells. The central claim is that this procedure yields consistent improvements on multiple computational biology downstream tasks, with empirical gains reaching up to 15%.
Significance. If the reported gains prove reproducible and free of leakage from marker-gene selection, the approach could supply a lightweight, targeted refinement stage that mitigates long-tailed cell-type distributions and covariate shifts without requiring full model retraining. This would be a practical addition to the single-cell representation learning toolkit.
major comments (2)
- [Abstract and §4 (Experiments)] The claim of gains of up to 15% is presented without any description of baselines, datasets, statistical tests, ablation studies, or cross-validation protocol, rendering the central empirical result unverifiable from the manuscript.
- [§3 (Method)] The construction and provenance of the marker-gene sets are not specified, including whether the sets are held out from all downstream evaluation splits; without this, the structural-prior interpretation cannot be distinguished from possible prior leakage or selection bias.
minor comments (1)
- [Abstract and §3] The abstract and method sections would benefit from explicit equations for the multi-faceted objective to clarify how the marker-gene priors enter the loss.
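The referee's request for statistical tests could be met in several ways. One simple, assumption-light option is a paired sign-flip permutation test on per-fold (or per-seed) score differences between CellRefine and direct fine-tuning. The sketch below is that generic test, not the paper's protocol. With k folds it enumerates all 2^k sign patterns exactly, so it is only practical for small k. The scores in the example are made up.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def paired_sign_flip_pvalue(treated: Sequence[float], control: Sequence[float]) -> float:
    """Exact one-sided p-value for H0 'no improvement' from paired per-fold scores.

    Under H0 each paired difference is symmetric about zero, so every pattern
    of sign flips is equally likely. The p-value is the share of the 2**k
    patterns whose mean difference is at least the observed mean.
    """
    if len(treated) != len(control) or len(treated) == 0:
        raise ValueError("need the same, nonzero number of paired scores")
    diffs = [t - c for t, c in zip(treated, control)]
    observed = mean(diffs)
    patterns = list(product((1, -1), repeat=len(diffs)))
    hits = sum(
        1 for signs in patterns
        if mean(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12  # float tolerance
    )
    return hits / len(patterns)


# Hypothetical macro-F1 over 5 folds: CellRefine + fine-tuning vs direct fine-tuning.
p = paired_sign_flip_pvalue([0.82, 0.81, 0.84, 0.80, 0.83],
                            [0.80, 0.79, 0.81, 0.78, 0.80])
print(p)  # 0.03125, i.e. 1/32
```

Even when every fold improves, five folds cannot yield a p-value below 1/32. This is one concrete reason the number of folds and runs belongs in the reported protocol.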
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below and will incorporate revisions to improve the verifiability and transparency of the work.
Point-by-point responses
- Referee: [Abstract and §4 (Experiments)] The claim of gains of up to 15% is presented without any description of baselines, datasets, statistical tests, ablation studies, or cross-validation protocol, rendering the central empirical result unverifiable from the manuscript.
Authors: We agree that the abstract's presentation of the gains of up to 15% lacks sufficient context for immediate verification. We will revise the abstract to include a concise description of the evaluation protocol, key baselines, and datasets. In Section 4, we will expand the text to explicitly detail the statistical tests performed, the ablation studies on the multi-faceted objective, and the cross-validation protocol (including number of folds and runs). These changes will ensure all empirical claims are fully verifiable directly from the manuscript. Revision: yes.
- Referee: [§3 (Method)] The construction and provenance of the marker-gene sets are not specified, including whether the sets are held out from all downstream evaluation splits; without this, the structural-prior interpretation cannot be distinguished from possible prior leakage or selection bias.
Authors: We thank the referee for identifying this gap in methodological detail. We will revise Section 3 to specify the construction process and provenance of the marker-gene sets, drawing from established curated databases and literature sources. We will also add an explicit statement and supporting evidence that the sets are held out from all downstream evaluation splits, with no overlap or selection from the fine-tuning or test data. This will confirm that the marker genes function purely as structural priors during post-pretraining and eliminate any possibility of leakage or bias. Revision: yes.
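One way to make the "held out from all downstream evaluation splits" guarantee checkable is to derive any data-driven markers from training cells only. The sketch below is a deliberately simple stand-in for proper differential-expression testing: it ranks genes by mean expression inside a cell type minus mean expression in the remaining training cells. The function name, the scoring rule, and the `top_k` cutoff are assumptions. Markers taken from curated databases would instead need their source datasets checked for overlap with the evaluation data.

```python
import numpy as np


def train_only_markers(expr, labels, genes, train_idx, top_k=5):
    """Pick top_k marker genes per cell type from training cells only.

    expr: (n_cells, n_genes) normalized expression matrix.
    labels: (n_cells,) array of cell-type labels.
    genes: gene names, one per column of expr.
    train_idx: row indices of training cells. Rows outside train_idx are
    never read, so evaluation cells cannot influence which genes are chosen.
    Scoring rule (a simplification, not a DE test): mean expression inside
    the type minus mean expression in all other training cells.
    """
    x, y = expr[train_idx], labels[train_idx]
    markers = {}
    for cell_type in np.unique(y):
        inside = y == cell_type
        if inside.all():
            raise ValueError("training split needs at least two cell types")
        score = x[inside].mean(axis=0) - x[~inside].mean(axis=0)
        top = np.argsort(-score, kind="stable")[:top_k]
        markers[str(cell_type)] = [genes[i] for i in top]
    return markers
```

A cheap leakage check follows directly: overwrite the evaluation rows with arbitrary values and confirm that the selected markers do not change.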
Circularity Check
No circularity in derivation chain; purely empirical method
full rationale
The paper introduces CellRefine as an empirical post-pretraining procedure that incorporates marker-gene sets as structural priors to refine embeddings, with performance gains demonstrated via downstream experiments. No equations, derivations, or self-referential definitions are present that would reduce any claimed result to its inputs by construction. The method description and results do not rely on fitted parameters renamed as predictions, load-bearing self-citations, or imported uniqueness theorems. The central claims remain independent and falsifiable through external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: CellRefine uses a multi-faceted objective that incorporates marker-gene sets as structural priors to guide post-pretraining and refine the latent embedding manifold of cells... L_Total = L_MLM + λ1 L_prototype + λ2 L_lineage + λ3 L_GMVE
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We construct marker gene sets for each cell type... ordered sequence of marker genes as a prototype... prototype-guided regularization loss
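The first passage quoted above ends with the overall objective. It is a weighted sum in which λ1, λ2, λ3 are free hyperparameters. A minimal sketch of assembling it follows. The default weights are placeholders chosen for illustration, not values reported in the paper, and the individual loss terms are assumed to be computed elsewhere and passed in as scalars.

```python
from typing import Mapping

# Placeholder weights (hypothetical); the paper's values for λ1, λ2, λ3 are not given here.
DEFAULT_LAMBDAS = {"prototype": 1.0, "lineage": 0.5, "gmve": 0.1}


def total_loss(mlm: float, aux: Mapping[str, float],
               lambdas: Mapping[str, float] = DEFAULT_LAMBDAS) -> float:
    """L_Total = L_MLM + λ1 L_prototype + λ2 L_lineage + λ3 L_GMVE.

    `aux` maps each auxiliary term name to its scalar loss. Every weighted term
    must have a matching loss, so a missing term raises KeyError instead of
    being silently dropped.
    """
    total = mlm
    for name, weight in lambdas.items():
        if weight < 0:
            raise ValueError(f"negative weight for {name!r}")
        total = total + weight * aux[name]
    return total
```

The function only uses `+` and `*` on the loss values, so the same code should also work when the terms are framework scalar tensors. The weights are the kind of free parameters whose values and sensitivity a reader would want reported.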
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.