Mixed citations

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Mixed citation behavior. Most common role is background (60%).

19 Pith papers citing it
Background: 60% of classified citations

citation-role summary

background: 4, baseline: 1

fields

cs.CV: 18, cs.AI: 1

years

2026: 13, 2025: 6

representative citing papers

MetaphorVU: Towards Metaphorical Video Understanding

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Introduces the first benchmark for metaphorical video understanding, identifies MLLM weaknesses in cross-domain mapping, and proposes an inference-time enhancement using a knowledge graph.

AdaTooler-V: Adaptive Tool-Use for Images and Videos

cs.CV · 2025-12-18 · conditional · novelty 6.0

AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.

VISD: Enhancing Video Reasoning via Structured Self-Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 5.0 · 4 refs

VISD proposes structured self-distillation with a multi-dimensional judge model and direction-magnitude decoupling to improve token-level credit assignment and convergence speed in VideoLLM reasoning training.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0

A survey that frames video MLLM research through a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.

citing papers explorer

Showing 13 of 13 citing papers after filters.