Language–image consistency augmentation and distillation network for visual grounding. Pattern Recognition, 166:111663

Xiao Ke, Peirong Xu, Wenzhong Guo · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

cs.CV · 2025-05-21 · unverdicted · novelty 6.0 · cites this work as ref 19

Chain-of-Focus enables VLMs to adaptively search for and zoom in on important image regions via a two-stage SFT and RL pipeline trained on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.

