M²-REPA decouples modality-specific features from diffusion intermediates and aligns them to complementary expert foundation models via a multi-modal alignment loss and modality-specific decoupling regularization for improved multimodal video generation.
In: International conference on machine learning
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models
M²-REPA decouples modality-specific features from diffusion intermediates and aligns them to complementary expert foundation models via a multi-modal alignment loss and modality-specific decoupling regularization for improved multimodal video generation.
- Do Vision Models Truly Forget? New Findings from Representation-Level Certification of Visual Unlearning in Vertical Federated Learning
Mirage auditing framework reveals that VFL unlearning methods passing output-level certification retain substantial class structure in representations, with no method achieving high utility plus both output and representation forgetting, plus class-sample asymmetry in residual traces.
- Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization
HILBERT uses joint-centric dual contrastive learning with CKA and mutual information regularizers to align long-sequence audio-text embeddings while preserving structure and balancing modalities.