SparseMM: Head sparsity emerges from visual concept responses in MLLMs
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model
  AIA loss teaches unified multimodal models task-specific cross-modal attention patterns to reduce conflicts between image understanding and generation without architecture decoupling.
- Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke
  A tri-modal model with LLM-generated text from MRIs and a vision-guided dual alignment fusion module achieves state-of-the-art performance on real-world ischemic stroke prognosis prediction.
- HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference
  HybridKV reduces KV cache memory by up to 7.9x and speeds decoding by 1.52x in MLLMs with almost no performance loss by classifying heads into static and dynamic types and compressing them differently.