hub Mixed citations

InstructPix2Pix: Learning to Follow Image Editing Instructions

Mixed citation behavior. Most common role is background (60%).

15 Pith papers citing it
Background 60% of classified citations

citation-role summary

background: 4 · other: 1

citation-polarity summary

fields

cs.CV: 14 · cs.AI: 1

years

2026: 14 · 2025: 1

polarities

background: 3 · unclear: 2

representative citing papers

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM

cs.CV · 2025-05-23 · unverdicted · novelty 6.0

Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.

SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

SWEET is a one-shot sparse visual planning framework that progressively generates manipulation keyframes via image editing conditioned on language and spatial guidance, then converts them to actions with a diffusion predictor, showing better fidelity and lower cost than video models on DROID and Rob

citing papers explorer

Showing 15 of 15 citing papers.