In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

Zhou, B · 2017

3 Pith papers cite this work.

representative citing papers

AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

AdaVFM integrates neural architecture search into vision foundation model backbones and uses a cloud multimodal LLM agent to enable runtime-adaptive lightweight subnet execution, delivering up to 7.9% higher accuracy and 77.9% lower FLOPs than fixed-size baselines on edge devices.

Accelerating Vision Foundation Models with Drop-in Depthwise Convolution

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Replacing selected attention heads in pretrained ViTs with depthwise convolutions (heads chosen by simple selection strategies, with accuracy recovered via fine-tuning) delivers a 17-20% inference speedup on image tasks with minimal accuracy loss.

Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions

cs.CV · 2025-09-17 · unverdicted · novelty 5.0

STEP combines dynamic superpatch merging via dCTS with early token exits, cutting token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-resolution segmentation, with at most a 2% accuracy drop and 40% of tokens halted early.

citing papers explorer

AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution · cs.CV · 2026-04-17 · unverdicted · none · ref 77
Accelerating Vision Foundation Models with Drop-in Depthwise Convolution · cs.CV · 2026-05-21 · unverdicted · none · ref 50
Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions · cs.CV · 2025-09-17 · unverdicted · none · ref 65
