hub Canonical reference

Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30

Canonical reference. 80% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background: 80% of classified citations

citation-role summary

background 4 · method 1

representative citing papers

Alignment Dynamics in LLM Fine-Tuning

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

The paper introduces a dynamical model that decomposes alignment updates in LLM fine-tuning into rebound and driving forces and predicts a rehearsal priming effect.

Cat-DPO: Category-Adaptive Safety Alignment

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Cat-DPO applies per-category adaptive safety margins during direct preference optimization to reduce variance in safety across harm categories.

POPI: Personalizing LLMs via Optimized Natural Language Preference Inference

cs.CL · 2025-10-17 · unverdicted · novelty 5.0

POPI distills user preferences into reusable natural-language summaries via a shared inference model and conditions a generator on them, trained jointly with RL to improve personalization quality while cutting context length by up to 10x on benchmarks.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
