pith. sign in

MMAR: towards lossless multi-modal auto-regressive probabilistic modeling

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

citation-role summary

baseline 1

citation-polarity summary

fields

cs.CV 3

years

2025 3

verdicts

UNVERDICTED 3

roles

baseline 1

polarities

baseline 1

representative citing papers

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

citing papers explorer

Showing 3 of 3 citing papers.

  • MARRS: Masked Autoregressive Unit-based Reaction Synthesis cs.CV · 2025-05-16 · unverdicted · none · ref 38

    MARRS synthesizes fine-grained reaction motions via unit-distinguished VAE, masked action-conditioned fusion, mutual unit modulation, and compact MLP diffusion predictors.

  • Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation cs.CV · 2025-05-08 · unverdicted · none · ref 90

    Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

  • Show-o2: Improved Native Unified Multimodal Models cs.CV · 2025-06-18 · unverdicted · none · ref 134

    Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.