Canonical reference

Unified autoregressive visual generation and understanding with continuous tokens

Canonical reference. 100% of citing Pith papers cite this work as background.

7 Pith papers citing it
Background 100% of classified citations

citation-role summary

background 6

citation-polarity summary

years

2026: 4 · 2025: 3

roles

background 5

polarities

background 5

representative citing papers

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

Image Diffusion Preview with Consistency Solver

cs.LG · 2025-12-15 · unverdicted · novelty 6.0

ConsistencySolver enables high-quality low-step diffusion previews by adapting general linear multistep methods into a lightweight RL-optimized solver, matching multistep DPM-Solver FID with 47% fewer steps and cutting user interaction time by nearly 50%.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

Step1X-Edit: A Practical Framework for General Image Editing

cs.CV · 2025-04-24 · unverdicted · novelty 4.0

Step1X-Edit integrates a multimodal LLM with a diffusion decoder, trained on a custom high-quality dataset, to deliver image editing performance that surpasses open-source baselines and approaches proprietary models on the new GEdit-Bench.
