Title resolution pending

Hu, Y · 2022 · arXiv 2211.09699

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0 · 2 refs

GEASS is a logit-level gating module that selectively trusts generated captions in VLMs per query by combining clean-path confidence, entropy reduction, and pathway disagreement, improving results on POPE and HallusionBench across four models.

TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

cs.AI · 2026-04-04 · conditional · novelty 7.0

TableVision benchmark shows explicit spatial grounding recovers MLLM reasoning on hierarchical tables, delivering 12.3% accuracy improvement through a decoupled perception-reasoning framework.

ViperGPT: Visual Inference via Python Execution for Reasoning

cs.CV · 2023-03-14 · unverdicted · novelty 7.0

ViperGPT generates executable Python code to compose pre-trained vision-and-language modules into programs that answer visual queries, reaching state-of-the-art results with no additional training.

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning

cs.CV · 2026-04-12 · unverdicted · novelty 6.0 · 3 refs

Visual replay module and adaptive depth scaling improve multimodal latent reasoning, reaching SOTA benchmarks with faster inference than explicit chain-of-thought methods.

BLINK: Multimodal Large Language Models Can See but Not Perceive

cs.CV · 2024-04-18 · accept · novelty 6.0

BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

cs.CV · 2023-03-20 · unverdicted · novelty 6.0

MM-REACT uses textual prompts to let ChatGPT collaborate with external vision experts for zero-shot multimodal reasoning and action on advanced visual tasks.

REPLUG: Retrieval-Augmented Black-Box Language Models

cs.CL · 2023-01-30 · conditional · novelty 6.0

REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.

citing papers explorer

Showing 1 of 1 citing paper after filters.

BLINK: Multimodal Large Language Models Can See but Not Perceive cs.CV · 2024-04-18 · accept · none · ref 38
BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer