BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

13 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (10), 2025 (3)

roles: background (2)

polarities: background (2)

representative citing papers

Learning to Track Instance from Single Nature Language Description

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

Tracker is a self-supervised VL tracker that uses a Dynamic Token Aggregation Module to learn instance tracking from single language descriptions in unlabeled videos and outperforms prior self-supervised methods.

PLUME: Latent Reasoning Based Universal Multimodal Embedding

cs.CV · 2026-04-02 · unverdicted · novelty 7.0

PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.
