pith. sign in

Open-sora: Democratizing efficient video production for all

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

citation-role summary

baseline 1

citation-polarity summary

fields

cs.CV 1 cs.RO 1

years

2026 1 2024 1

verdicts

UNVERDICTED 2

roles

baseline 1

polarities

baseline 1

representative citing papers

MotuBrain: An Advanced World Action Model for Robot Control

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

MotuBrain jointly models video and action via a three-stream Mixture-of-Transformers UniDiffuser to reach 95.8-96.1% success on RoboTwin 2.0 benchmarks, top EWMScore, and fast 11 Hz inference while adapting to new robots with 50-100 trajectories.

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

citing papers explorer

Showing 2 of 2 citing papers.

  • MotuBrain: An Advanced World Action Model for Robot Control cs.RO · 2026-04-30 · unverdicted · none · ref 39

    MotuBrain jointly models video and action via a three-stream Mixture-of-Transformers UniDiffuser to reach 95.8-96.1% success on RoboTwin 2.0 benchmarks, top EWMScore, and fast 11 Hz inference while adapting to new robots with 50-100 trajectories.

  • Emu3: Next-Token Prediction is All You Need cs.CV · 2024-09-27 · unverdicted · none · ref 107

    Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.