
arxiv: 2602.02014 · v2 · pith:WU3AYYJBnew · submitted 2026-02-02 · cs.CV · cs.AI · cs.CL · cs.LG

Rethinking Genomic Modeling Through Optical Character Recognition

classification: cs.CV, cs.AI, cs.CL, cs.LG
keywords: genomic, optical, dna, visual, character, compression, document, effective, encoder

Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally misaligned with sparse and discontinuous genomic semantics, leading to wasted computation on low-information background and preventing understanding-driven compression for long contexts. Here, we present OpticalDNA, a vision-based framework that reframes genomic modeling as Optical Character Recognition (OCR)-style document understanding. OpticalDNA renders DNA into structured visual layouts and trains an OCR-capable vision-language model with a visual DNA encoder and a document decoder, where the encoder produces compact, reconstructible visual tokens for high-fidelity compression. Building on this representation, OpticalDNA defines prompt-conditioned objectives over core genomic primitives (reading, region grounding, subsequence retrieval, and masked span completion), thereby learning layout-aware DNA representations that retain fine-grained genomic information under a reduced effective token budget. Across diverse genomic benchmarks, OpticalDNA consistently outperforms recent baselines; on sequences up to 450k bases, it achieves the best overall performance with nearly 20× fewer effective tokens, and surpasses models with up to 985× more activated parameters while tuning only 256k trainable parameters.
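
To make the idea of a "structured layout" concrete, the following is a minimal text-only sketch of wrapping a DNA string into document-like rows with a coordinate gutter, and of reading it back losslessly. It is an illustration only, not the rendering used by OpticalDNA: the function names `render_layout` and `read_back`, the row width, and the grouping size are all hypothetical, and the paper's actual layouts are images consumed by a visual encoder rather than strings.

```python
def render_layout(seq: str, width: int = 60, group: int = 10) -> list[str]:
    """Wrap a DNA string into rows of `width` bases.

    Each row starts with the 1-based coordinate of its first base, and
    bases are spaced into blocks of `group` for readability.
    (Hypothetical layout; the paper does not specify these values.)
    """
    rows = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        rows.append(f"{start + 1:>8} " + " ".join(blocks))
    return rows


def read_back(rows: list[str]) -> str:
    """Invert render_layout: drop the coordinate gutter and the spacing."""
    return "".join("".join(row.split()[1:]) for row in rows)


seq = "ACGT" * 40  # 160 bases of toy sequence
page = render_layout(seq)
for row in page:
    print(row)
```

The round trip `read_back(render_layout(seq)) == seq` mirrors, in a very loose sense, the abstract's requirement that the compact representation remain reconstructible.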
