Frozen in time: A joint video and image encoder for end-to-end retrieval

Bain et al. · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning

cs.CV · 2026-01-21 · conditional · novelty 7.0

LFS learns to select temporally diverse and event-aware frames for video captioning by using direct feedback from frozen video-LLMs, yielding gains up to 2% on VDC and over 4% on the new ICH-CC benchmark.

citing papers explorer

Showing 1 of 1 citing paper.

LFS: Learnable Frame Selector for Event-Aware and Temporally Diverse Video Captioning

cs.CV · 2026-01-21 · conditional · none · ref 2
