Contrastive Preference Learning: Learning from Human Feedback without RL

Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh · 2024 · arXiv 2310.13639

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

LOGICA adds context to pretrained biological LMs via logit-space contrastive alignment with gated adapters, improving AUC on held-out drug-resistance mutation ranking from ~0.55 to ~0.65 while preserving token likelihoods.

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

cs.RO · 2026-06-10 · unverdicted · novelty 6.0

UniIntervene uses future-conditioned action-value estimation and a temporal value-risk critic to trigger memory-based recovery interventions, reporting 8.6% higher success rates and 57% fewer human interventions than prior HiL-RL methods on real manipulation tasks.

A Regret Minimization Framework on Preference Learning in Large Language Models

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

RePO reframes RLHF through regret minimization by modeling preferences as behavior-conditioned relative suboptimality assessments and reports performance gains on reasoning and preference benchmarks.

Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach

cs.LG · 2024-11-01 · unverdicted · novelty 6.0

DIPPER uses bi-level optimization and DPO to train the higher-level policy from stationary preference comparisons and value regularization, claiming up to 40% gains on robotic navigation and manipulation tasks while introducing metrics for non-stationarity and infeasible subgoals.

citing papers explorer

Showing 1 of 1 citing paper after filters.

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

cs.RO · 2026-06-10 · unverdicted · none · ref 15
