IF-VidCap: Can video caption models follow instructions?

2025 · arXiv 2510.18726

2 Pith papers cite this work.

representative citing papers

VCIFBench: Evaluating Complex Instruction Following for Video Understanding

cs.CL · 2026-06-03 · unverdicted · novelty 5.0 · ref 2

VCIFBench provides 306 test instructions, a 540-pair DPO dataset, and a conflict diagnostic set for evaluating complex constraint satisfaction in video MLLMs. It finds the benchmark challenging for these models and shows that DPO training helps.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0 · ref 100

This survey frames video MLLM research through a human-view formulation built on perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.
