pith. sign in

hub Canonical reference

InstructPix2Pix: Learning to Follow Image Editing Instructions

Canonical reference. 71% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background 71% of classified citations

hub tools

citation-role summary

background 5 baseline 1 method 1

citation-polarity summary

clear filters

representative citing papers

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

Delta Rectified Flow Sampling for Text-to-Image Editing

cs.CV · 2025-09-01 · unverdicted · novelty 7.0

DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.

Visual Instruction Tuning

cs.CV · 2023-04-17 · unverdicted · novelty 7.0

LLaVA is trained on GPT-4 generated visual instruction data to achieve 85.1% relative performance to GPT-4 on synthetic multimodal tasks and 92.53% accuracy on Science QA.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Stylistic Attribute Control in Latent Diffusion Models

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.

Toward Native Multimodal Modeling: A Roadmap

cs.CV · 2026-05-25 · unverdicted · novelty 3.0

A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.

citing papers explorer

Showing 11 of 11 citing papers after filters.