pith. sign in

Clip4clip: An empirical study of clip for end to end video clip retrieval and captioning.Neurocomputing, 508:293–304, 2022

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CV 3

years

2026 2 2025 1

verdicts

UNVERDICTED 3

representative citing papers

InstrAct: Towards Action-Centric Understanding in Instructional Videos

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

InstrAction pretrains video foundation models using action-centric data filtering, hard negatives, an Action Perceiver module, DTW-Align, and Masked Action Modeling to reduce static bias and outperform prior models on a new InstrAct Bench for semantic, procedural, and retrieval tasks.

Calibrated Multimodal Representation Learning with Missing Modalities

cs.CV · 2025-11-15 · unverdicted · novelty 6.0

CalMRL mitigates anchor shift in multimodal representation learning by calibrating incomplete alignments through representation-level imputation of missing modalities using priors and a bi-step optimization with closed-form shared latent posteriors.

citing papers explorer

Showing 3 of 3 citing papers.

  • TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions cs.CV · 2026-04-30 · unverdicted · none · ref 18

    TransVLM formalizes Shot Transition Detection as identifying full temporal transition segments rather than single cut points and introduces a VLM that injects optical flow as a motion prior via simple feature fusion, plus a synthetic data engine and benchmark.

  • InstrAct: Towards Action-Centric Understanding in Instructional Videos cs.CV · 2026-04-09 · unverdicted · none · ref 19

    InstrAction pretrains video foundation models using action-centric data filtering, hard negatives, an Action Perceiver module, DTW-Align, and Masked Action Modeling to reduce static bias and outperform prior models on a new InstrAct Bench for semantic, procedural, and retrieval tasks.

  • Calibrated Multimodal Representation Learning with Missing Modalities cs.CV · 2025-11-15 · unverdicted · none · ref 21

    CalMRL mitigates anchor shift in multimodal representation learning by calibrating incomplete alignments through representation-level imputation of missing modalities using priors and a bi-step optimization with closed-form shared latent posteriors.