pith. sign in

Video-r2: Reinforcing consistent and grounded reasoning in multimodal language models,

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.CV 2

years

2026 2

verdicts

UNVERDICTED 2

clear filters

representative citing papers

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0

This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning cs.CV · 2026-06-04 · unverdicted · none · ref 22

    VTI-CoT proposes a visual-textual interleaved chain-of-thought method for video reasoning, built via automated annotation and OCR compression, claiming SOTA performance and better training efficiency on same-scale models.

  • Watch, Remember, Reason: Human-View Video Understanding with MLLMs cs.CV · 2026-06-05 · unverdicted · none · ref 231

    This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.