Training lan- guage models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

cs.CL · 2025-03-10 · unverdicted · novelty 6.0

A two-stage RL framework first boosts text reasoning in 3B LMMs then adapts it to multimodal inputs, producing modest benchmark gains of 4.5-4.8%.

Visual-RFT: Visual Reinforcement Fine-Tuning

cs.CV · 2025-03-03 · conditional · novelty 6.0

Visual-RFT applies reinforcement learning with verifiable perception rewards to improve large vision-language models on fine-grained classification, few-shot detection, and grounding tasks.

citing papers explorer

Showing 2 of 2 citing papers.

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL cs.CL · 2025-03-10 · unverdicted · none · ref 56
A two-stage RL framework first boosts text reasoning in 3B LMMs then adapts it to multimodal inputs, producing modest benchmark gains of 4.5-4.8%.
Visual-RFT: Visual Reinforcement Fine-Tuning cs.CV · 2025-03-03 · conditional · none · ref 25
Visual-RFT applies reinforcement learning with verifiable perception rewards to improve large vision-language models on fine-grained classification, few-shot detection, and grounding tasks.

Training lan- guage models to follow instructions with human feedback

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer