pith. machine review for the scientific record.

Learning transferable visual models from natural language supervision

2 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CV (2)

years: 2026 (2)

representative citing papers

Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks

cs.CV · 2026-04-13 · accept · novelty 7.0

Boxes2Pixels distills noisy SAM pseudo-masks into a compact DINOv2-based student with auxiliary localization and one-sided self-correction, delivering +6.97 anomaly mIoU and +9.71 binary IoU gains over baselines on wind turbine data with 80% fewer parameters.

Do Vision Language Models Need to Process Image Tokens?

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

Visual representations in VLMs converge quickly to stable low-complexity forms while text continues evolving, with task-dependent needs for sustained image token access.

citing papers explorer

  • Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks cs.CV · 2026-04-13 · accept · none · ref 23

    Boxes2Pixels distills noisy SAM pseudo-masks into a compact DINOv2-based student with auxiliary localization and one-sided self-correction, delivering +6.97 anomaly mIoU and +9.71 binary IoU gains over baselines on wind turbine data with 80% fewer parameters.

  • Do Vision Language Models Need to Process Image Tokens? cs.CV · 2026-04-10 · unverdicted · none · ref 4

    Visual representations in VLMs converge quickly to stable low-complexity forms while text continues evolving, with task-dependent needs for sustained image token access.