MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Wulin Xie, Yi-Fan Zhang, Chaoyou Fu, Yang Shi, Bingyan Nie, Hongkai Chen, Zhang Zhang, Liang Wang, Tieniu Tan · 2025 · arXiv 2504.03641

3 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning objectives are across modalities.

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

Unified multimodal models exhibit pseudo-unification due to modality-asymmetric entropy encoding and pattern-split responses between text and image generation.

LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

LatentRouter routes image-question queries to the best MLLM by predicting counterfactual performance via latent communication between learned query capsules and model capability tokens.

citing papers explorer

Showing 3 of 3 citing papers.

Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models
cs.CV · 2026-04-27 · unverdicted · none · ref 39

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
cs.CV · 2026-04-13 · unverdicted · none · ref 71

LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?
cs.AI · 2026-05-11 · unverdicted · none · ref 37
