pith. sign in

Video-lavit: Unified video-language pre-training with decoupled visual-motional tokenization

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 3

citation-polarity summary

fields

cs.CV 7 cs.IR 1

roles

background 3

polarities

background 3

representative citing papers

Semantic Generative Tuning for Unified Multimodal Models

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Semantic Generative Tuning uses image segmentation as a generative proxy to align misaligned representation spaces in unified multimodal models and improve both perception and generative layout fidelity.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

citing papers explorer

Showing 8 of 8 citing papers.