pith.

Canonical reference

Unified multimodal discrete diffusion

Canonical reference. 80% of citing Pith papers cite this work as background.

13 Pith papers citing it
Background 80% of classified citations


citation-role summary: background 4, baseline 1


citing papers by year: 2026 (7), 2025 (6)


representative citing papers

AsyncPatch Diffusion: spatially-flexible image generation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0


DVD: Discrete Voxel Diffusion for 3D Generation and Editing

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs


TBD-VLA: Temporal Block Diffusion Vision Language Action Model

cs.CV · 2026-06-05 · unverdicted · novelty 5.0


Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0


citing papers explorer

Showing 6 of 6 citing papers after filters.

  • AsyncPatch Diffusion: spatially-flexible image generation cs.CV · 2026-06-05 · unverdicted · none · ref 46

    AsyncPatch Diffusion introduces asynchronous per-region noise levels in diffusion models, proves a valid ELBO, and uses a controlled sampler to support spatially adaptive generation and native inpainting.

  • DVD: Discrete Voxel Diffusion for 3D Generation and Editing cs.CV · 2026-05-08 · unverdicted · none · ref 27 · 2 links

    DVD applies discrete diffusion directly to voxel occupancy for 3D generation, uncertainty estimation via entropy, and single-round editing via block perturbation fine-tuning.

  • Bridging Video Understanding and Generation in a Unified Framework cs.CV · 2026-06-30 · unverdicted · none · ref 50

    Vega unifies video understanding and generation via shared vocabulary and hybrid autoregressive-diffusion architecture, reporting strong results on VBench and VideoMME.

  • TBD-VLA: Temporal Block Diffusion Vision Language Action Model cs.CV · 2026-06-05 · unverdicted · none · ref 15

    TBD-VLA partitions action sequences into temporal blocks and performs masked discrete diffusion within blocks and autoregressive generation across blocks, unifying parallel decoding with temporal coherence in discrete VLA models.

  • MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset cs.CV · 2026-05-20 · unverdicted · none · ref 88

    MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.

  • Show-o2: Improved Native Unified Multimodal Models cs.CV · 2025-06-18 · unverdicted · none · ref 98

    Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.