EventPrune prunes 80% of visual tokens in Video-LLMs using event camera motion cues, yielding 1.89x speedup, 52% fewer GFLOPs, and slightly higher accuracy than full-token baselines on first-person dynamic spatial reasoning.
Snapkv: Llm knows what you are looking for before generation
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4representative citing papers
FrameVGGT replaces token-level KV retention with frame-level segments and prototypes to bound memory while preserving geometric coherence in streaming VGGT.
Attention-Shifting uses importance-aware suppression on unlearning data and retention enhancement on retained data via dual-loss optimization to achieve selective unlearning with better utility preservation than prior methods.
CompactAttention accelerates chunked-prefill attention via Block-Union KV Selection, delivering up to 2.72x speedup at 128K context on LLaMA-3.1-8B while matching dense accuracy on RULER.
citing papers explorer
-
EventPrune: Cascaded Event-Assisted Token Pruning for Efficient First-Person Dynamic Spatial Reasoning
EventPrune prunes 80% of visual tokens in Video-LLMs using event camera motion cues, yielding 1.89x speedup, 52% fewer GFLOPs, and slightly higher accuracy than full-token baselines on first-person dynamic spatial reasoning.
-
FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT
FrameVGGT replaces token-level KV retention with frame-level segments and prototypes to bound memory while preserving geometric coherence in streaming VGGT.
-
Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting
Attention-Shifting uses importance-aware suppression on unlearning data and retention enhancement on retained data via dual-loss optimization to achieve selective unlearning with better utility preservation than prior methods.
-
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
CompactAttention accelerates chunked-prefill attention via Block-Union KV Selection, delivering up to 2.72x speedup at 128K context on LLaMA-3.1-8B while matching dense accuracy on RULER.