pith. machine review for the scientific record.

Training language models to follow instructions with human feedback

7 Pith papers cite this work. Polarity classification is still in progress.

7 Pith papers citing it

fields

cs.LG · 7

representative citing papers

Leveraging RAG for Training-Free Alignment of LLMs

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with offline methods across five LLMs.

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

RouteLLM: Learning to Route LLMs with Preference Data

cs.LG · 2024-06-26 · unverdicted · novelty 6.0

Router models trained on preference data dynamically select between strong and weak LLMs, cutting inference costs by more than 2x on benchmarks with no quality loss and showing transfer to new model pairs.
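The routing idea summarized above can be sketched in a few lines. This is a hypothetical illustration, not RouteLLM's actual implementation: it assumes a scorer (here a toy stand-in for a model trained on preference data) that predicts the probability the weak model's answer would be preferred, and routes to the strong model only when that probability falls below a threshold.

```python
# Minimal sketch of preference-score-based LLM routing (hypothetical names).
# A learned scorer estimates how likely the weak model suffices for a query;
# low-confidence queries are escalated to the strong (expensive) model.

def route(query: str, win_prob, threshold: float = 0.5) -> str:
    """Return "weak" or "strong" for the model that should answer `query`.

    win_prob: callable mapping a query to the predicted probability that
    the weak model's answer would be preferred over the strong model's.
    """
    return "weak" if win_prob(query) >= threshold else "strong"

def toy_win_prob(query: str) -> float:
    # Toy stand-in for a router trained on preference data: assume short,
    # simple queries are handled well by the weak model.
    return 1.0 if len(query.split()) < 10 else 0.2

assert route("What is 2 + 2?", toy_win_prob) == "weak"
assert route(
    "Prove that every bounded monotone real sequence converges, with full detail.",
    toy_win_prob,
) == "strong"
```

Raising the threshold trades cost for quality: more queries escalate to the strong model, which is the knob behind the cost/quality curves the paper reports.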
