hub Canonical reference

In: Findings of the Association for Computational Linguistics: ACL 2023

Canonical reference. 83% of citing Pith papers cite this work as background.

12 Pith papers citing it
Background 83% of classified citations


citation-role summary

background: 5 · other: 1

citation-polarity summary

background: 5 · unclear: 1

representative citing papers

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.
