pith. sign in

super hub Mixed citations

PaliGemma: A versatile 3B VLM for transfer

Mixed citation behavior. Most common role is background (59%).

109 Pith papers citing it
Background 59% of classified citations
abstract

PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.

hub tools

citation-role summary

background 19 method 6 baseline 5 dataset 2

citation-polarity summary

claims ledger

  • abstract PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.

authors

co-cited works

clear filters

representative citing papers

Koshur Pixel: a large-scale synthetic ocr dataset for kashmiri

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Koshur Pixel is the first large-scale synthetic OCR dataset for Kashmiri with 613,078 image-text pairs generated via SynthOCR-Gen from the KS-PRET-5M corpus across multiple fonts and granularities with 25+ augmentations.

Large Language Model Selection with Limited Annotations

cs.CL · 2026-05-24 · unverdicted · novelty 7.0

SELECT-LLM is the first active model selection framework for LLMs that uses expected information gain from pairwise output similarities to minimize required annotations, reporting up to 84.78% cost reduction across 23 datasets and 156 models.

Dynamic Execution Commitment of Vision-Language-Action Models

cs.CV · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

A3 reframes dynamic action chunk commitment in VLA models as self-speculative prefix verification, accepting the longest continuous sequence of actions that satisfies consensus-ordered conditional invariance and prefix-closed sequential consistency.

DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

DSCA turns concept isolation into an architectural property by dynamically creating orthogonal subspaces for non-interfering lifelong edits in vision-language models, sustaining over 95% success after 1000 sequential edits.

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models cs.CV · 2026-04-02 · conditional · none · ref 44 · internal anchor

    UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.

  • ZAYA1-VL-8B Technical Report cs.CV · 2026-05-08 · unverdicted · none · ref 143 · internal anchor

    ZAYA1-VL-8B is a new MoE vision-language model with vision-specific LoRA adapters and bidirectional image attention that reports competitive performance against several 3B-4B models on image, reasoning, and counting benchmarks.