pith. sign in

hub Mixed citations

Ocrbench: on the hidden mystery of ocr in large multimodal models.Science China Information Sciences, 67(12):220102

Mixed citation behavior. Most common role is background (60%).

10 Pith papers citing it
Background 60% of classified citations

hub tools

citation-role summary

background 3 dataset 2

citation-polarity summary

fields

cs.CV 9 cs.GR 1

years

2026 7 2025 3

clear filters

representative citing papers

Let ViT Speak: Generative Language-Image Pre-training

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

GenLIP pretrains ViTs to generate language tokens from images via LM objective without contrastive batches or extra decoders, matching baselines on less data and improving on OCR after multi-resolution continued pretraining.

DeepEyesV2: Toward Agentic Multimodal Model

cs.CV · 2025-11-07 · unverdicted · novelty 6.0

DeepEyesV2 uses a two-stage cold-start plus reinforcement learning pipeline to produce an agentic multimodal model that adaptively invokes tools and outperforms direct RL on real-world reasoning benchmarks.

Seed1.5-VL Technical Report

cs.CV · 2025-05-11 · unverdicted · novelty 4.0

Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

citing papers explorer

Showing 9 of 9 citing papers after filters.