Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models
Pith reviewed 2026-05-21 22:48 UTC · model grok-4.3
The pith
Genome-Factory is the first integrated Python library that handles data collection, tuning, inference, benchmarking, and biological interpretation for genomic foundation models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Genome-Factory supplies an automated data pipeline, unified support for full and parameter-efficient fine-tuning, embedding and generation inference, benchmark interfaces, and an open-source sparse auto-encoder interpreter that turns model representations into biological signals, demonstrated on DNABERT-2.
What carries the argument
The Genome-Factory library itself, whose core component is a sparse auto-encoder that maps high-dimensional genomic embeddings to sparse, biologically readable features.
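To make the mechanism concrete, here is a minimal sketch of a generic sparse auto-encoder: a ReLU encoder into a wider feature space, a linear decoder, and a loss combining reconstruction error with an L1 sparsity penalty. This is an assumption about the general technique, not the library's implementation. The class name `SparseAutoencoder`, the dimensions, and the `l1_coeff` value are all hypothetical, and training (gradient descent on the loss) is omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Toy L1-penalized sparse auto-encoder (hypothetical, for illustration only)."""

    def __init__(self, d_in, d_hidden, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_in -> d_hidden; decoder maps d_hidden -> d_in.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_in)]
        self.b_dec = [0.0] * d_in

    def encode(self, x):
        """Return non-negative feature activations: ReLU(W_enc x + b_enc)."""
        pre = [a + b for a, b in zip(matvec(self.w_enc, x), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, features):
        """Reconstruct the embedding: W_dec f + b_dec."""
        return [a + b for a, b in zip(matvec(self.w_dec, features), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Mean squared reconstruction error plus an L1 penalty on the features."""
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        l1 = sum(abs(f) for f in features)
        return mse + l1_coeff * l1


# An 8-dimensional stand-in for a model embedding (made-up values).
embedding = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
sae = SparseAutoencoder(d_in=8, d_hidden=32)
features = sae.encode(embedding)
```

After training, the L1 term pushes most entries of `features` to zero for any given input, which is what makes the individual features candidates for biological annotation.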
If this is right
- Researchers can switch between genomic models and fine-tuning methods without rewriting data or training code.
- Standardized benchmarking becomes possible through the included interfaces and two supplied evaluation suites.
- Interpretability is added as a default step rather than a separate research project.
- Synthetic sequence generation and embedding extraction become routine operations inside one codebase.
Where Pith is reading between the lines
- If the interpreter proves stable across models, it could serve as a common lens for comparing what different genomic foundation models have actually learned.
- The automated data pipeline lowers the barrier for labs that lack large curated sequence collections.
- Open interfaces for new benchmarks could gradually create community standards for evaluating genomic models.
- Integration of generation and interpretation in one tool might enable closed-loop experiments where generated sequences are immediately tested for biological plausibility.
Load-bearing premise
That the sparse auto-encoder delivers reliable biological interpretation. It is validated only on DNABERT-2, so it may not generalize or yield falsifiable predictions on other genomic models.
What would settle it
Apply the same sparse auto-encoder to a different genomic foundation model. If the recovered features fail to match known regulatory motifs, or the resulting biological annotations are not reproducible, the interpreter does not generalize.
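One simple way such a check could be run is to correlate a feature's activation across sequences with the presence of a known motif. The sketch below is an assumption about how the test might look, not the paper's protocol. The sequences, the activation values, and the choice of the TATA-box-like motif `TATAAA` are all made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def motif_indicator(sequences, motif):
    """1.0 where the sequence contains the motif as a substring, else 0.0."""
    return [1.0 if motif in seq else 0.0 for seq in sequences]


# Hypothetical data: four sequences and one feature's activation on each.
sequences = ["ACGTATAAAGC", "GGGCCCGGGCC", "TTTATAAACGA", "CGCGCGCGCGC"]
feature_activations = [0.9, 0.0, 0.7, 0.1]

indicator = motif_indicator(sequences, "TATAAA")
r = pearson(feature_activations, indicator)
```

A correlation near 1 on held-out sequences would support the annotation. A correlation near 0 on a second model, for features that scored high on DNABERT-2, would be the kind of failure described above.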
Original abstract
We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection, model tuning, inference, benchmarking, and interpretability. For data collection, Genome-Factory offers an automated pipeline to download genomic sequences and preprocess them. For model tuning, Genome-Factory supports both full and parameter-efficient fine-tuning across diverse genomic models. For inference, Genome-Factory enables both embedding extraction and DNA sequence generation. For benchmarking, we include two existing benchmarks and provide a flexible interface to incorporate additional benchmarks. For interpretability, Genome-Factory introduces an open-source biological interpreter based on a sparse auto-encoder. We validate the utility of Genome-Factory across three dimensions: (i) Compatibility with diverse models and fine-tuning methods; (ii) Benchmarking downstream performance using two open-source benchmarks; (iii) Biological interpretation of learned representations with DNABERT-2. These results highlight its practical value for real-world genomic analysis. GitHub: https://github.com/WeiminWu2000/Genome_Factory.
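The abstract contrasts full and parameter-efficient fine-tuning without saying what the difference costs. As a worked example, the sketch below uses low-rank adaptation (LoRA), one well-known parameter-efficient method. Whether Genome-Factory uses exactly this formulation is not stated here, so treat it as an assumption. The function names and the 768-wide layer are illustrative choices, not values taken from the paper.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def lora_effective_weight(w, a, b, alpha, rank):
    """Compute W' = W + (alpha / rank) * B @ A with W frozen."""
    scale = alpha / rank
    d_out, d_in = len(w), len(w[0])
    return [
        [w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(d_out)
    ]


# Illustrative square projection layer of width 768 with rank-8 adapters.
full = full_finetune_params(768, 768)
low_rank = lora_params(768, 768, rank=8)

# Tiny numeric example: 2x2 identity weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d_out x rank
a = [[0.5, 0.5]]     # rank x d_in
w_adapted = lora_effective_weight(w, a, b, alpha=2.0, rank=1)
```

For this layer the full update trains 589,824 parameters and the rank-8 adapter trains 12,288, a 48-fold reduction. That gap is why a library would want both modes behind one interface.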
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Genome-Factory, the first integrated Python library for genomic foundation models. It unifies data collection and preprocessing pipelines, full and parameter-efficient fine-tuning across multiple models, inference for embeddings and sequence generation, benchmarking interfaces, and an open-source biological interpreter based on a sparse auto-encoder. Utility is shown via three validation axes: compatibility with diverse models and tuning methods, downstream performance on two existing benchmarks, and biological interpretation of representations from DNABERT-2.
Significance. If the library and its components function as described, the work would offer a practical, unified toolkit that reduces fragmentation in genomic model development workflows. The open-source release, support for both full and PEFT tuning, and inclusion of an interpretability module represent concrete contributions to the community. The GitHub repository further enables reproducibility and extension.
major comments (1)
- The biological interpreter based on the sparse auto-encoder is validated exclusively with DNABERT-2. The manuscript positions the library for compatibility with a range of genomic models (including those used for full/PEFT tuning and embedding extraction), yet provides no results demonstrating that the same interpreter yields reliable or generalizable biological interpretations on other models. This weakens the claim of a unified interpretability workflow.
minor comments (2)
- The abstract asserts that Genome-Factory is 'the first integrated' library but does not reference or compare against prior tools for genomic model handling; a short related-work paragraph would strengthen the novelty claim.
- Quantitative performance numbers, error analysis, or baseline comparisons for the benchmarking and tuning components are not summarized in the abstract; ensure these are clearly reported with tables or figures in the main text.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the single major comment below and will revise the manuscript to better delineate the current scope of the interpretability module.
-
Referee: The biological interpreter based on the sparse auto-encoder is validated exclusively with DNABERT-2. The manuscript positions the library for compatibility with a range of genomic models (including those used for full/PEFT tuning and embedding extraction), yet provides no results demonstrating that the same interpreter yields reliable or generalizable biological interpretations on other models. This weakens the claim of a unified interpretability workflow.
Authors: We agree that the empirical validation of the biological interpreter is currently limited to DNABERT-2. The interpreter is implemented to operate directly on embedding vectors produced by any model supported by the library, making it architecture-agnostic in principle. Nevertheless, we acknowledge that demonstrating reliable biological interpretations on additional models would strengthen the claim of a unified workflow. We will revise the manuscript to (i) explicitly state that the interpreter is designed for general use across supported models and (ii) clarify that comprehensive cross-model validation remains future work. If space allows, we will also include a short additional demonstration with at least one other model (e.g., a different DNABERT variant or Enformer) to illustrate transferability of the approach.
Revision: yes
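The "architecture-agnostic in principle" claim amounts to saying the interpreter depends only on vectors, not on the model that produced them. A minimal sketch of that contract follows. It is an assumption about the design, not the library's API. `Embedder`, `base_composition`, `dinucleotide_counts`, and `collect_embeddings` are hypothetical names, and the two toy embedders stand in for real genomic models.

```python
from itertools import product
from typing import Callable, List, Sequence

# Any function from a DNA string to a fixed-length vector counts as an embedder.
Embedder = Callable[[str], List[float]]

DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def base_composition(seq: str) -> List[float]:
    """Toy 4-dimensional embedder: fraction of A, C, G, T."""
    return [seq.count(base) / len(seq) for base in "ACGT"]


def dinucleotide_counts(seq: str) -> List[float]:
    """Toy 16-dimensional embedder: overlapping dinucleotide counts."""
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [float(counts[d]) for d in DINUCLEOTIDES]


def collect_embeddings(embed: Embedder, sequences: Sequence[str]) -> List[List[float]]:
    """Embed every sequence and check that the model yields one consistent width."""
    vectors = [embed(seq) for seq in sequences]
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("embedder returned vectors of differing lengths")
    return vectors


sequences = ["ACGT", "GGCC", "TATA"]
small = collect_embeddings(base_composition, sequences)
large = collect_embeddings(dinucleotide_counts, sequences)
```

The same downstream code accepts both outputs, but the widths differ (4 versus 16), so an auto-encoder fitted on one model's embeddings cannot simply be reused on another's. That is why interface-level generality does not substitute for the cross-model validation the referee requests.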
Circularity Check
No circularity: software library with external validations, no derivation chain
The manuscript introduces Genome-Factory as an integrated Python library supporting data pipelines, full/PEFT tuning, embedding/generation inference, existing benchmarks, and a sparse-autoencoder interpreter. Validation is reported across compatibility with listed models, two open benchmarks, and biological interpretation performed on DNABERT-2. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims are engineering and empirical (library features plus reported runs on external models/benchmarks), not reductions of outputs to inputs by construction. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "For interpretability, GENOME-FACTORY introduces an open-source biological interpreter based on a sparse auto-encoder. ... links them to interpretable genomic features by regressing on external readouts."
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We validate ... (iii) Biological interpretation of learned representations with DNABERT-2."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Discrete Flow Matching Policy Optimization
DoMinO reformulates discrete flow matching sampling as an MDP for unbiased RL fine-tuning with new TV regularizers, yielding better enhancer activity and naturalness on DNA design tasks.