2512.06201

5 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (4) · 2025 (1)

verdicts: unverdicted (5)

roles: background (2)

polarities: background (1) · unclear (1)

representative citing papers

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.
