Dynamicvit: Efficient vision transformers with dynamic token sparsification.Advances in neural information processing systems, 34:13937–13949

Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh · 2021

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

EventPrune prunes 80% of visual tokens in Video-LLMs using event camera motion cues, yielding 1.89x speedup, 52% fewer GFLOPs, and slightly higher accuracy than full-token baselines on first-person dynamic spatial reasoning.

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.

ASAP: Attention Sink Anchored Pruning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

ASAP prunes tokens in ViTs by anchoring on attention sinks modeled as lazy random walks, using cumulative transition matrices and radial diffusion clustering to compress redundancy while preserving accuracy.

citing papers explorer

Showing 3 of 3 citing papers.

EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning cs.CV · 2026-05-19 · unverdicted · none · ref 27
EventPrune prunes 80% of visual tokens in Video-LLMs using event camera motion cues, yielding 1.89x speedup, 52% fewer GFLOPs, and slightly higher accuracy than full-token baselines on first-person dynamic spatial reasoning.
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? cs.CV · 2026-05-09 · unverdicted · none · ref 36
LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.
ASAP: Attention Sink Anchored Pruning cs.LG · 2026-05-21 · unverdicted · none · ref 8
ASAP prunes tokens in ViTs by anchoring on attention sinks modeled as lazy random walks, using cumulative transition matrices and radial diffusion clustering to compress redundancy while preserving accuracy.

Dynamicvit: Efficient vision transformers with dynamic token sparsification.Advances in neural information processing systems, 34:13937–13949

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer