A scalable training-free pipeline using video segmentation, filtering, and off-the-shelf multimodal models creates DenseStep2M, a dataset of 100K videos and 2M detailed instructional steps that improves dense captioning, step grounding, and cross-modal retrieval.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Blink enables CPU-free LLM inference via SmartNIC offload and persistent GPU kernel, delivering up to 8.47x lower P99 TTFT, 3.4x lower P99 TPOT, 2.1x higher decode throughput, and 48.6% lower energy per token while remaining stable under CPU interference.
DFLOP is a data-driven framework that profiles data-induced computation variance and uses predictive scheduling to balance workloads in multimodal LLM training pipelines, claiming up to 3.6x faster training than existing frameworks.
citing papers explorer
-
DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation
A scalable training-free pipeline using video segmentation, filtering, and off-the-shelf multimodal models creates DenseStep2M, a dataset of 100K videos and 2M detailed instructional steps that improves dense captioning, step grounding, and cross-modal retrieval.
-
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
Blink enables CPU-free LLM inference via SmartNIC offload and persistent GPU kernel, delivering up to 8.47x lower P99 TTFT, 3.4x lower P99 TPOT, 2.1x higher decode throughput, and 48.6% lower energy per token while remaining stable under CPU interference.
-
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
DFLOP is a data-driven framework that profiles data-induced computation variance and uses predictive scheduling to balance workloads in multimodal LLM training pipelines, claiming up to 3.6x faster training than existing frameworks.