Interleaved scene graphs for interleaved text-and-image generation assessment

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1 · method 1

citation-polarity summary

fields

cs.CV 3

years

2026 2 · 2025 1

verdicts

UNVERDICTED 3

representative citing papers

Emu3.5: Native Multimodal Models are World Learners

cs.CV · 2025-10-30 · unverdicted · novelty 6.0

Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.

Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation

cs.CV · 2026-06-29 · unverdicted · novelty 4.0

ILLUME-X is a unified multimodal model that generates free-form interleaved text-image sequences via an expanded data pipeline, progressive self-adaptive training, and ILScore evaluation, claiming outperformance over prior unified models on style transfer, image decomposition, and storytelling.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Emu3.5: Native Multimodal Models are World Learners · cs.CV · 2025-10-30 · unverdicted · none · ref 14