SAVEMem improves streaming video understanding scores by adding semantic awareness to memory compression and query-adaptive retrieval without any model training.
Videollm-online: Online video large language model for streaming video
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Introduces the first dedicated benchmark for live multi-modal LLM task guidance with mistake detection and a streaming baseline model.
Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows data scaling yields 30% gains on existing tests, and demonstrates a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.
citing papers explorer
-
Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding
SAVEMem improves streaming video understanding scores by adding semantic awareness to memory compression and query-adaptive retrieval without any model training.
-
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Introduces the first dedicated benchmark for live multi-modal LLM task guidance with mistake detection and a streaming baseline model.
-
Cambrian-S: Towards Spatial Supersensing in Video
Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows data scaling yields 30% gains on existing tests, and demonstrates a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.
- Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback