JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

Alexander W. Charney; Amit Bleiweiss; Ariel Larey; Dan Dominissini; Dan Ofer; Dung Hoang; Elay Dahan; Gideon Rechavi; Guy Leib; Marissa Wirth

arxiv: 2602.17162 · v2 · pith:NXFTGE4Wnew · submitted 2026-02-19 · 💻 cs.AI · q-bio.GN

Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, Nicole Bussola, Simon Lee, Shane O'Connell, Dung Hoang, Marissa Wirth, Alexander W. Charney, Nati Daniel, Yoli Shavit

keywords: generative · jepa-dna · genomic · models · latent · architecture · foundation · framework

Genomic Foundation Models (GFMs) typically rely on Masked Language Modeling (MLM) or Next-Token Prediction (NTP) to learn the "Laws of Nature". While effective at capturing local syntax, these generative paradigms prioritize token-level reconstruction over high-level functional context. We introduce JEPA-DNA, a model-agnostic continual training framework that integrates a Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. By supervising global sequence embeddings in a latent space, JEPA-DNA forces models to predict the functional representations of masked genomic segments, shifting the learning signal from token recovery to semantic alignment. We evaluate JEPA-DNA on 17 diverse genomic benchmark tasks, demonstrating consistent gains in linear probing and zero-shot performance regardless of the underlying GFM architecture or generative objective. Our framework establishes a new state-of-the-art for GFMs, surpassing the best existing models by bridging generative precision with latent semantic grounding. Through extensive ablation studies, we further characterize the synergistic interplay between generative and latent objectives. Our code is publicly available at https://github.com/NVIDIA-Digital-Bio/JEPA-DNA.
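
The abstract describes pairing a token-level generative objective with a latent objective on sequence embeddings. A minimal sketch of what such a combined loss could look like follows. It is an illustration under stated assumptions, not the authors' implementation: the cosine-distance latent loss, the `latent_weight` parameter, and all function names are hypothetical and do not come from the paper or its repository.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def combined_loss(token_logits, token_targets, predicted_emb, target_emb,
                  latent_weight=1.0):
    """Generative (token-level) loss plus a weighted latent (embedding-level) loss.

    Assumption: the latent term is a cosine distance between a predicted
    embedding and a target embedding; the paper may use a different distance.
    """
    # Mean cross-entropy over the masked (or next-token) positions.
    generative = sum(
        softmax_cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    # Distance between predicted and target embeddings of the masked segment.
    latent = cosine_distance(predicted_emb, target_emb)
    return generative + latent_weight * latent, generative, latent


# Toy usage: two masked positions over a 4-letter nucleotide vocabulary.
logits = [[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]]
targets = [0, 2]
total, gen, lat = combined_loss(logits, targets, [0.9, 0.1], [1.0, 0.0],
                                latent_weight=0.5)
```

In this sketch, setting `latent_weight` to zero recovers a purely generative objective, which is one simple way to picture the generative-versus-latent ablations the abstract mentions.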

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ProteinJEPA: Latent prediction complements protein language models
cs.LG · 2026-05 · unverdicted · novelty 7.0

Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.

AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models
cs.LG · 2026-05 · unverdicted · novelty 6.0

AURORA is a representation learning framework that uses contextual orthogonalization and relational alignment to create disentangled, geometrically interpretable latent spaces in healthcare foundation models.

Event Fields: Learning Latent Event Structure for Waveform Foundation Models
cs.LG · 2026-05 · unverdicted · novelty 6.0

Event-centric waveform foundation models are learned via self-supervised consistency on latent event structures and interactions, yielding improved performance and label efficiency over sequence-based baselines on phy...

Uncertainty-Aware Foundation Models for Clinical Data
cs.LG · 2026-04 · unverdicted · novelty 6.0

The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised ob...

WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records
cs.LG · 2026-05 · unverdicted · novelty 5.0

WISTERIA learns robust clinical representations from noisy EHR labels by enforcing consistency across multiple weak supervision views plus ontology regularization.