GroundingGPT: Language enhanced multi-modal grounding model

Li, Z · 2024 · arXiv 2401.06071

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DART: Difficulty-Adaptive Routing for Zero-Shot Video Temporal Grounding

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

DART routes zero-shot video temporal grounding queries by difficulty using DPP entropy, achieving up to 3.5 mIoU gains with 7x fewer frames on Charades-STA and ActivityNet Captions.

Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning

cs.CV · 2025-07-01 · unverdicted · novelty 4.0

A pipeline of chain-of-thought data synthesis, LoRA-based supervised fine-tuning, rejection sampling, and rule-based reinforcement learning raises multi-image grounding accuracy by 9.04% on MIG-Bench and 4.41% on average across seven other benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

DART: Difficulty-Adaptive Routing for Zero-Shot Video Temporal Grounding · cs.CV · 2026-07-01 · unverdicted · none · ref 34
DART routes zero-shot video temporal grounding queries by difficulty using DPP entropy, achieving up to 3.5 mIoU gains with 7x fewer frames on Charades-STA and ActivityNet Captions.
Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning · cs.CV · 2025-07-01 · unverdicted · none · ref 20
A pipeline of chain-of-thought data synthesis, LoRA-based supervised fine-tuning, rejection sampling, and rule-based reinforcement learning raises multi-image grounding accuracy by 9.04% on MIG-Bench and 4.41% on average across seven other benchmarks.
