StreamChat: Chatting with Streaming Video

Liu, J · 2024 · arXiv 2412.08646

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Channel fusion gives better semantic grounding and QA performance in full-duplex LLM dialogue but is vulnerable to context corruption during interruptions, while cross-attention routing is more robust at the cost of weaker integration.

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?

cs.CV · 2025-11-27 · unverdicted · novelty 7.0

Introduces the first dedicated benchmark for live multi-modal LLM task guidance with mistake detection and a streaming baseline model.

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

Stream3D-VLM adds autoregressive streaming control, VSFI geometry integration, GAVC compression, and a 1M-pair benchmark to enable real-time 3D VLM performance that beats prior models on 29 online and offline tasks.

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

ProactiveLLM enables active interaction in streaming LLMs by learning semantic sufficiency cues from partial inputs through mask-based modeling and synchronized privileged self-distillation without external supervision.

StreamOV: Streaming Omni-Video Understanding via Evidence-Guided Memory and Response Triggering

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

StreamOV proposes evidence-guided long-short term memory and a hidden-state-driven trigger for efficient online audio-visual reasoning in streaming videos, along with the SOVBench benchmark for multi-turn evaluation.

CodecSight: Leveraging Video Codec Signals for Efficient Streaming VLM Inference

cs.DC · 2026-04-07 · unverdicted · novelty 6.0

CodecSight reuses video codec signals for online patch pruning before the vision transformer and selective KV-cache refresh in the LLM, delivering up to 3x higher throughput and 87% lower GPU compute than prior baselines with 0-8% F1 drop.

LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

cs.RO · 2026-04-21 · unverdicted · novelty 5.0

LiveVLN enables smoother vision-language navigation by overlapping action execution with ongoing observation processing, preserving benchmark scores while cutting real-world waiting time by up to 77.7 percent.

citing papers explorer

Showing 1 of 1 citing paper after filters.

LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

cs.RO · 2026-04-21 · unverdicted · none · ref 28

LiveVLN enables smoother vision-language navigation by overlapping action execution with ongoing observation processing, preserving benchmark scores while cutting real-world waiting time by up to 77.7 percent.
