M3-JEPA: Multimodal Alignment via Multi-Gate MoE Based on the Joint-Embedding Predictive Architecture

Hongyang Lei, Xiaolong Cheng, Dan Wang, Kun Fan, Qi Qin, Huazhen Huang, Yetao Wu, Qingqing Gu, Zhonglin Jiang, Yong Chen, et al. · 2024 · arXiv 2409.05929

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · novelty 5.0

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.

citing papers explorer

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring · cs.CV · 2026-05-06 · unverdicted · none · ref 27
Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey · cs.LG · 2026-05-22 · accept · none · ref 6
