ProactiveLLM enables active interaction in streaming LLMs by learning semantic sufficiency cues from partial inputs through mask-based modeling and synchronized privileged self-distillation without external supervision.
arXiv preprint arXiv:2603.02872 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
ViCoStream is a new coordinated pipeline framework for streaming VideoLLMs that achieves 134 FPS video throughput and less than 50 ms TTFT on A100 while keeping accuracy near full-history baselines.
A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.
citing papers explorer
-
ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference
ViCoStream is a new coordinated pipeline framework for streaming VideoLLMs that achieves 134 FPS video throughput and less than 50 ms TTFT on A100 while keeping accuracy near full-history baselines.
-
Toward Native Multimodal Modeling: A Roadmap
A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.