HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields: cs.CV (1), cs.LG (1)

years: 2026 (2)

verdicts: unverdicted (2)

citing papers explorer

Showing 2 of 2 citing papers.

  • Bernini: Latent Semantic Planning for Video Diffusion cs.CV · 2026-05-21 · unverdicted · none · ref 52

    Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.

  • VDCook: DIY video data cook your MLLMs cs.LG · 2026-03-04 · unverdicted · none · ref 15

    VDCook is an automated, self-evolving platform for generating in-domain video datasets for MLLMs via natural language queries, retrieval-synthesis, and multi-dimensional metadata.