Temporal grounding of activities using multimodal large language models

Young Chol Song · 2024 · arXiv 2407.06157

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

DynFrame introduces tokenized learnable span-density retrieval and Segment-Decoupled GRPO in video MLLMs, achieving competitive or SOTA results on six benchmarks with 4B and 8B models.

citing papers explorer

Showing 1 of 1 citing paper.

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

cs.CV · 2026-05-26 · unverdicted · none · ref 27
