Memory-efficient streaming VideoLLMs for real-time procedural video understanding

Dibyadip Chatterjee, Edoardo Remelli, Yale Song, Bugra Tekin, Abhay Mittal, Bharat Bhatnagar, Necati Cihan Camgöz, Shreyas Hampali, Eric Sauser, Shugao Ma, et al. · 2025 · arXiv 2504.13915

3 Pith papers cite this work.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Personal Visual Context Learning in Large Multimodal Models

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · ref 11

Introduces a formalization of Personal Visual Context Learning (Personal VCL) and a benchmark that reveals gaps in how LMMs use personal visual context, plus an Agentic Context Bank baseline that improves personalized visual reasoning.

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

cs.CV · 2026-05-21 · conditional · novelty 5.0 · ref 4

MuKV adds multi-grained KV cache compression at the patch, frame, and segment levels, plus semi-hierarchical retrieval, to raise accuracy and reduce memory use in long video question answering.

Decouple and Cache: KV Cache Construction for Streaming Video Understanding

cs.CV · 2026-05-03 · unverdicted · novelty 5.0 · ref 2

DSCache decouples the cumulative past KV cache from the instant KV cache and uses position-agnostic encoding to adapt offline VideoVLLMs to streaming video, delivering a 2.5% average accuracy gain on QA benchmarks.
