pith. sign in

hub

Generative multimodal models are in-context learners.CoRR, abs/2312.13286, 2023a

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

hub tools

citation-role summary

background 3 baseline 1

citation-polarity summary

fields

cs.CV 9 cs.CL 1

representative citing papers

MMaDA: Multimodal Large Diffusion Language Models

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

MMaDA is a unified multimodal diffusion model using mixed chain-of-thought fine-tuning and a new UniGRPO reinforcement learning algorithm that outperforms specialized models in reasoning, understanding, and text-to-image tasks.

CogVLM: Visual Expert for Pretrained Language Models

cs.CV · 2023-11-06 · conditional · novelty 6.0

CogVLM adds a trainable visual expert inside frozen language model layers for deep vision-language fusion and reports state-of-the-art results on ten cross-modal benchmarks while preserving NLP performance.

PaliGemma: A versatile 3B VLM for transfer

cs.CV · 2024-07-10 · unverdicted · novelty 4.0

PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.

citing papers explorer

Showing 10 of 10 citing papers.