ReMoT: Reinforcement Learning with Motion Contrast Triplets

2026 · cs.CV · arXiv 2603.00461

1 Pith paper cites this work.

abstract

We present ReMoT, a unified training paradigm that systematically addresses the fundamental shortcomings of vision-language models (VLMs) in spatio-temporal consistency, a critical failure point in navigation, robotics, and autonomous driving. ReMoT integrates two core components: (1) a rule-based automatic framework that generates ReMoT-16K, a large-scale motion-contrast dataset of 16.5K triplets derived from video meta-annotations, as an alternative to costly manual or model-based generation; and (2) Group Relative Policy Optimization (GRPO), which we empirically find yields the best performance and data efficiency for learning this contrastive reasoning, far exceeding standard Supervised Fine-Tuning (SFT). We also construct the first benchmark for fine-grained motion contrast triplets, which measures a VLM's ability to discriminate subtle motion attributes (e.g., opposing directions). The resulting model achieves state-of-the-art performance on our new benchmark and on multiple standard VLM benchmarks, including a 25.1% performance gain on spatio-temporal reasoning tasks.
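The abstract names Group Relative Policy Optimization (GRPO) as the training algorithm but gives no implementation details. As general background, and not the ReMoT authors' code, GRPO samples a group of responses to the same prompt, scores each one, and uses each reward normalized by the group's mean and standard deviation as that response's advantage, so no learned value model is needed. The sketch below shows only this advantage step. The binary exact-match reward, the multiple-choice motion question, and the names `exact_match_reward` and `group_relative_advantages` are hypothetical; the clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
import statistics
from typing import List


def exact_match_reward(response: str, answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the chosen option matches the key, else 0.0."""
    return 1.0 if response.strip().upper() == answer.strip().upper() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps).

    Uses the population standard deviation of the group (an assumption;
    implementations differ on sample vs. population std).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One hypothetical prompt, e.g. "Which clip moves in the opposite direction? A/B/C",
# with a group of G = 4 sampled answers and answer key "B".
responses = ["B", "A", "B", "C"]
rewards = [exact_match_reward(r, "B") for r in responses]
advantages = group_relative_advantages(rewards)
print(rewards, [round(a, 3) for a in advantages])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage relative to their own group. If every response in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.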

representative citing papers

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

ProSR adds a Counterfactual Invariance Penalty and a Tail Drift Penalty to shape VLM reasoning trajectories for better visual dependence and stability on spatial tasks.
