Title resolution pending

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

years

2026: 3 · 2025: 1

verdicts

UNVERDICTED: 4

representative citing papers

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF, including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.
