RS-EoT uses a SocraticAgent self-play system and two-stage RL to train VLMs for genuine iterative reasoning and visual inspection on remote sensing VQA and grounding tasks, achieving SOTA results.
Easyr1: An efficient, scalable, multi-modality rl training framework.https:// github.com/hiyouga/EasyR1
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
method 1