
arxiv: 2509.12266 · v2 · pith:OKJSSWAZnew · submitted 2025-09-13 · 🧬 q-bio.GN · cs.LG

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

Pith reviewed 2026-05-21 22:48 UTC · model grok-4.3

classification 🧬 q-bio.GN cs.LG
keywords genomic foundation models · Python library · sparse auto-encoder · model interpretability · fine-tuning · DNA sequence analysis · benchmarking · genomic embeddings

The pith

Genome-Factory is the first integrated Python library that handles data collection, tuning, inference, benchmarking, and biological interpretation for genomic foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Genome-Factory as a unified Python library meant to remove friction from the full cycle of genomic foundation model work. It supplies automated pipelines for downloading and preprocessing DNA sequences, supports full and parameter-efficient fine-tuning on multiple models, enables embedding extraction and sequence generation, includes existing benchmarks with an open interface for more, and adds a sparse auto-encoder interpreter for extracting biological meaning from learned representations. Validation shows the library works across models and fine-tuning styles, produces competitive benchmark scores, and yields interpretations when applied to DNABERT-2. If correct, the library would let researchers move from raw sequences to testable biological hypotheses with far less custom engineering.
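The Figure 1 caption names GC normalization and ambiguous base correction as examples of the preprocessing step. This page does not define either operation, so the following is only a minimal sketch of one plausible reading: a GC-content filter stands in for GC normalization, and replacing non-ACGT symbols with `N` stands in for ambiguous base correction. The function names and the 0.3 to 0.7 thresholds are hypothetical and are not taken from the library.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def mask_ambiguous(seq):
    """Uppercase the sequence and replace every non-ACGT symbol with N."""
    return "".join(base if base in "ACGT" else "N" for base in seq.upper())


def filter_by_gc(seqs, low=0.3, high=0.7):
    """Keep sequences whose GC content lies in [low, high] (assumed thresholds)."""
    return [s for s in seqs if low <= gc_content(s) <= high]


if __name__ == "__main__":
    raw = ["acgRYt", "ATAT", "GGCC", "ATGC"]
    cleaned = [mask_ambiguous(s) for s in raw]
    print(cleaned)
    print(filter_by_gc(cleaned))
```

A real pipeline would likely handle IUPAC codes more carefully (for example, resolving `R` to `A` or `G`) rather than masking everything to `N`.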

Core claim

Genome-Factory supplies an automated data pipeline, unified support for full and parameter-efficient fine-tuning, embedding and generation inference, benchmark interfaces, and an open-source sparse auto-encoder interpreter that turns model representations into biological signals, demonstrated on DNABERT-2.

What carries the argument

The Genome-Factory library itself, whose core component is a sparse auto-encoder that maps high-dimensional genomic embeddings to sparse, biologically readable features.
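To make the idea of mapping embeddings to sparse features concrete, here is a minimal forward-pass sketch of a generic sparse auto-encoder: a ReLU encoder into a wider hidden layer, a linear decoder, and a loss that adds an L1 sparsity penalty to the reconstruction error. This is a textbook formulation, not the library's interpreter; the class name `SparseAutoencoder`, the layer sizes, and the `l1_coeff` value are all assumptions made for illustration.

```python
import random


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class SparseAutoencoder:
    """Toy sparse auto-encoder (forward pass only, randomly initialized)."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_hidden, decoder maps d_hidden -> d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """ReLU keeps feature activations non-negative, which the L1 term then pushes toward zero."""
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, z):
        out = matvec(self.w_dec, z)
        return [o + b for o, b in zip(out, self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Return (mean squared reconstruction error + L1 penalty, feature activations)."""
        z = self.encode(x)
        x_hat = self.decode(z)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(z), z


if __name__ == "__main__":
    sae = SparseAutoencoder(d_model=8, d_hidden=32)
    total, features = sae.loss([0.5] * 8)
    print(total, sum(f > 0 for f in features))
```

In practice the input `x` would be an embedding vector from a genomic model, and the weights would be trained so that only a few features are active per input; interpretation then consists of asking which sequences activate a given feature.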

If this is right

  • Researchers can switch between genomic models and fine-tuning methods without rewriting data or training code.
  • Standardized benchmarking becomes possible through the included interfaces and two supplied evaluation suites.
  • Interpretability is added as a default step rather than a separate research project.
  • Synthetic sequence generation and embedding extraction become routine operations inside one codebase.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the interpreter proves stable across models, it could serve as a common lens for comparing what different genomic foundation models have actually learned.
  • The automated data pipeline lowers the barrier for labs that lack large curated sequence collections.
  • Open interfaces for new benchmarks could gradually create community standards for evaluating genomic models.
  • Integration of generation and interpretation in one tool might enable closed-loop experiments where generated sequences are immediately tested for biological plausibility.

Load-bearing premise

The premise is that the sparse auto-encoder delivers reliable biological interpretation. It has been validated only on DNABERT-2, so it may not generalize or yield falsifiable predictions on other genomic models.

What would settle it

Applying the same sparse auto-encoder to a different genomic foundation model and finding that recovered features fail to match known regulatory motifs or produce non-reproducible biological annotations would show the interpreter does not generalize.
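One simple way to operationalize such a check is to ask whether the sequences that most strongly activate a feature contain a known motif more often than the background does. The sketch below uses exact substring matching as a deliberately crude stand-in for a position weight matrix scan; the function names, the toy sequences, and the choice of `TATAAA` as the motif are illustrative assumptions, not material from the paper.

```python
def motif_fraction(sequences, motif):
    """Fraction of sequences that contain the motif as an exact substring."""
    if not sequences:
        return 0.0
    return sum(motif in s for s in sequences) / len(sequences)


def top_k_enrichment(sequences, activations, motif, k):
    """Return (motif fraction among the k highest-activation sequences, background fraction)."""
    ranked = sorted(zip(activations, sequences), key=lambda pair: pair[0], reverse=True)
    top = [seq for _, seq in ranked[:k]]
    return motif_fraction(top, motif), motif_fraction(sequences, motif)


if __name__ == "__main__":
    seqs = ["GGTATAAAGG", "CCCCGGGG", "ATATAAAT", "GCGCGCGC"]
    acts = [0.9, 0.1, 0.8, 0.0]  # hypothetical activations of one feature
    print(top_k_enrichment(seqs, acts, "TATAAA", k=2))  # (1.0, 0.5)
```

A feature whose top-activating sequences show no enrichment over background for any known motif, or whose enrichment does not reproduce across data splits, would count as evidence against the interpreter on that model.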

Figures

Figures reproduced from arXiv: 2509.12266 by Han Liu, Jerry Yao-Chieh Hu, Qinjie Lin, Weimin Wu, Xuefeng Song, Yibo Wen, Zhihan Zhou, Zhong Wang.

Figure 1: Overview of GENOME-FACTORY. The framework consists of six components. Genome Collector acquires genomic sequences from public repositories and performs preprocessing (e.g., GC normalization, ambiguous base correction). Model Loader supports major genomic models (e.g., GenomeOcean, EVO, DNABERT-2, HyenaDNA, Caduceus, Nucleotide Transformer) and their tokenizers. Model Trainer configures workflows, adapts …
Figure 2: Trade-off between tuning efficiency and performance. The figure shows memory usage in gigabytes (GB), throughput in kilotokens per second (KTok/s), and averaged scores on the GUE benchmark for three models: DNABERT-2, HyenaDNA-160k, and Nucleotide Transformer-500M. We report results for full-tuning (Full), low-rank adaptation (LoRA), and adapter-based fine-tuning (Adapter). The results highlight the trade-…
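The memory side of the trade-off in Figure 2 follows from how few parameters LoRA trains. For a weight matrix of shape d_out × d_in, LoRA freezes the original weights and learns a rank-r update B·A, with B of shape d_out × r and A of shape r × d_in. The sketch below just does that arithmetic; the 768 × 768 layer and rank 8 are illustrative values, not figures reported by the paper.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a rank-`rank` LoRA update: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 768, 768, 8  # assumed layer shape and rank
    full = full_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(full, lora, full // lora)  # 589824 12288 48
```

For this single layer, LoRA trains 48 times fewer parameters than full tuning, which reduces gradient and optimizer-state memory accordingly; the effect on benchmark scores is an empirical question that the figure addresses.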
Original abstract

We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection, model tuning, inference, benchmarking, and interpretability. For data collection, Genome-Factory offers an automated pipeline to download genomic sequences and preprocess them. For model tuning, Genome-Factory supports both full and parameter-efficient fine-tuning across diverse genomic models. For inference, Genome-Factory enables both embedding extraction and DNA sequence generation. For benchmarking, we include two existing benchmarks and provide a flexible interface to incorporate additional benchmarks. For interpretability, Genome-Factory introduces an open-source biological interpreter based on a sparse auto-encoder. We validate the utility of Genome-Factory across three dimensions: (i) Compatibility with diverse models and fine-tuning methods; (ii) Benchmarking downstream performance using two open-source benchmarks; (iii) Biological interpretation of learned representations with DNABERT-2. These results highlight its practical value for real-world genomic analysis. GitHub: https://github.com/WeiminWu2000/Genome_Factory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces Genome-Factory, the first integrated Python library for genomic foundation models. It unifies data collection and preprocessing pipelines, full and parameter-efficient fine-tuning across multiple models, inference for embeddings and sequence generation, benchmarking interfaces, and an open-source biological interpreter based on a sparse auto-encoder. Utility is shown via three validation axes: compatibility with diverse models and tuning methods, downstream performance on two existing benchmarks, and biological interpretation of representations from DNABERT-2.

Significance. If the library and its components function as described, the work would offer a practical, unified toolkit that reduces fragmentation in genomic model development workflows. The open-source release, support for both full and PEFT tuning, and inclusion of an interpretability module represent concrete contributions to the community. The GitHub repository further enables reproducibility and extension.

major comments (1)
  1. The biological interpreter based on the sparse auto-encoder is validated exclusively with DNABERT-2. The manuscript positions the library for compatibility with a range of genomic models (including those used for full/PEFT tuning and embedding extraction), yet provides no results demonstrating that the same interpreter yields reliable or generalizable biological interpretations on other models. This weakens the claim of a unified interpretability workflow.
minor comments (2)
  1. The abstract asserts that Genome-Factory is 'the first integrated' library but does not reference or compare against prior tools for genomic model handling; a short related-work paragraph would strengthen the novelty claim.
  2. Quantitative performance numbers, error analysis, or baseline comparisons for the benchmarking and tuning components are not summarized in the abstract; ensure these are clearly reported with tables or figures in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below and will revise the manuscript to better delineate the current scope of the interpretability module.

Point-by-point responses
  1. Referee: The biological interpreter based on the sparse auto-encoder is validated exclusively with DNABERT-2. The manuscript positions the library for compatibility with a range of genomic models (including those used for full/PEFT tuning and embedding extraction), yet provides no results demonstrating that the same interpreter yields reliable or generalizable biological interpretations on other models. This weakens the claim of a unified interpretability workflow.

    Authors: We agree that the empirical validation of the biological interpreter is currently limited to DNABERT-2. The interpreter is implemented to operate directly on embedding vectors produced by any model supported by the library, making it architecture-agnostic in principle. Nevertheless, we acknowledge that demonstrating reliable biological interpretations on additional models would strengthen the claim of a unified workflow. We will revise the manuscript to (i) explicitly state that the interpreter is designed for general use across supported models and (ii) clarify that comprehensive cross-model validation remains future work. If space allows, we will also include a short additional demonstration with at least one other model (e.g., a different DNABERT variant or Enformer) to illustrate transferability of the approach.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: software library with external validations, no derivation chain

Full rationale

The manuscript introduces Genome-Factory as an integrated Python library supporting data pipelines, full/PEFT tuning, embedding/generation inference, existing benchmarks, and a sparse-autoencoder interpreter. Validation is reported across compatibility with listed models, two open benchmarks, and biological interpretation performed on DNABERT-2. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims are engineering and empirical (library features plus reported runs on external models/benchmarks), not reductions of outputs to inputs by construction. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no mathematical axioms, free parameters, or new physical entities; its contribution is an engineering artifact whose correctness depends on standard software assumptions such as correct implementation of existing fine-tuning algorithms and faithful reproduction of benchmark datasets.




Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Discrete Flow Matching Policy Optimization

    cs.LG · 2026-04 · unverdicted · novelty 7.0

    DoMinO reformulates discrete flow matching sampling as an MDP for unbiased RL fine-tuning with new TV regularizers, yielding better enhancer activity and naturalness on DNA design tasks.
