pith. sign in

Lawrence Zitnick, and Devi Parikh

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

background 2 dataset 1

citation-polarity summary

fields

cs.CL 2 cs.CV 2

polarities

background 3

representative citing papers

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

cs.CV · 2026-04-28 · conditional · novelty 6.0 · 2 refs

SIEVES improves selective prediction coverage by up to 3x on OOD VQA benchmarks by training a selector to score the quality of visual evidence produced by reasoner models, generalizing across benchmarks and proprietary models without internal access or per-task retraining.

Multilingual Vision-Language Models, A Survey

cs.CL · 2025-09-26 · accept · novelty 3.0

The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.

citing papers explorer

Showing 4 of 4 citing papers.

  • MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark cs.CL · 2024-09-04 · accept · none · ref 4

    MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.

  • Evaluating Object Hallucination in Large Vision-Language Models cs.CV · 2023-05-17 · accept · none · ref 4

    Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.

  • SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring cs.CV · 2026-04-28 · conditional · none · ref 2 · 2 links

    SIEVES improves selective prediction coverage by up to 3x on OOD VQA benchmarks by training a selector to score the quality of visual evidence produced by reasoner models, generalizing across benchmarks and proprietary models without internal access or per-task retraining.

  • Multilingual Vision-Language Models, A Survey cs.CL · 2025-09-26 · accept · none · ref 6

    The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.