Combining masked language modeling (MLM) with JEPA latent prediction at masked positions outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M-parameter protein models, whereas JEPA-only pretraining fails.
LLM-JEPA: Large language models meet joint embedding predictive architectures
7 Pith papers cite this work.
Years: 2026 (7 papers)
Citation roles: background (2)
Citation polarities: background (2)

Representative citing papers:
Applying STP at consecutive semantic reasoning steps achieves 168× more accurate multi-step latent prediction on ProcessBench than frozen baselines, with trajectories forming smooth curves best captured by non-linear predictors.
Crys-JEPA introduces a joint embedding predictive architecture that creates an energy-aware latent space, enabling embedding-based stability screening and a refinement pipeline that yields up to 72.7% gains on the V.S.U.N. metric for crystal generation.
Clin-JEPA is a multi-phase co-training framework for JEPA pretraining on EHR data that achieves convergent latent rollouts and improved multi-task AUROC on MIMIC-IV ICU records.
Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.
DLLM-JEPA pairs JEPA with masked diffusion LMs to enable single-pass self-supervised fine-tuning that improves task accuracy, lowers held-out loss, and preserves base-model performance.
An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries visibly alter hidden-state geometry.