RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Mixed citation behavior. Most common role is background (64%).

24 Pith papers citing it
Background 64% of classified citations

abstract

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem, where generative models are fine-tuned with RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently enhancing the model by fine-tuning on these filtered samples. Our studies show that RAFT can effectively improve the model performance in both reward learning and other automated metrics in both large language models and diffusion models.
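
The selection step described in the abstract (sample several candidates, score them with a reward model, keep the best, fine-tune on what remains) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `select_top_samples`, the parameters `k` and `keep`, and the toy generator and reward are all hypothetical stand-ins, and the fine-tuning step itself is left as a comment.

```python
import random
from typing import Callable, List, Tuple


def select_top_samples(
    prompts: List[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    k: int = 8,
    keep: int = 1,
) -> List[Tuple[str, str]]:
    """For each prompt, draw k candidates and keep the `keep` highest-reward ones.

    `generate` and `reward` are assumed interfaces: any sampler and any
    scalar scorer can be plugged in.
    """
    selected: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Sample k candidate responses from the current model.
        candidates = [generate(prompt) for _ in range(k)]
        # Rank candidates by reward, highest first.
        ranked = sorted(candidates, key=lambda c: reward(prompt, c), reverse=True)
        # Discard the low-reward candidates; keep only the top ones.
        selected.extend((prompt, c) for c in ranked[:keep])
    return selected


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-ins (assumptions): a "model" that emits a random-length
    # string and a "reward model" that prefers longer responses.
    def toy_generate(prompt: str) -> str:
        return prompt + " " + "x" * rng.randint(1, 10)

    def toy_reward(prompt: str, response: str) -> float:
        return float(len(response))

    batch = select_top_samples(["hello", "world"], toy_generate, toy_reward, k=4)
    # In a full loop, `batch` would be passed to a supervised fine-tuning
    # step, and sampling would then repeat with the updated model.
    print(batch)
```

Keeping `generate` and `reward` as plain callables means the same filtering logic applies whether the candidates are text completions or generated images, which matches the abstract's claim that the approach covers both language and diffusion models.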

citation-role summary

background 6 · method 3 · baseline 1 · other 1

representative citing papers

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

DISA: Offline Importance Sampling for Distribution-Matching LLM-RL

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

DISA decouples partition function estimation using offline importance sampling for distribution-matching LLM-RL, matching or exceeding online baselines like FlowRL on math and code benchmarks while retaining more strategy diversity.

AlignCultura: Towards Culturally Aligned Large Language Models?

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

AlignCultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.

Bias at the End of the Score

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.

Improving Video Generation with Human Feedback

cs.CV · 2025-01-23 · unverdicted · novelty 6.0

A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

Reinforced Self-Training (ReST) for Language Modeling

cs.CL · 2023-08-17 · unverdicted · novelty 6.0

ReST improves LLM translation quality on benchmarks via offline RL on self-generated data, achieving gains in a compute-efficient way compared to typical RLHF.

Goal-Conditioned Supervised Learning for LLM Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

Failure Modes of Maximum Entropy RLHF

cs.LG · 2025-09-24 · unverdicted · novelty 5.0

Derives SimPO from MaxEnt RL and reports that MaxEnt RL in online RLHF exhibits frequent overoptimization and unstable KL dynamics across scales, unlike stable KL-constrained baselines.

TrustLLM: Trustworthiness in Large Language Models

cs.CL · 2024-01-10 · unverdicted · novelty 5.0

TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

citing papers explorer

Showing 24 of 24 citing papers.

  • Flow-GRPO: Training Flow Matching Models via Online RL cs.CV · 2025-05-08 · unverdicted · none · ref 37 · internal anchor

    Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

  • DISA: Offline Importance Sampling for Distribution-Matching LLM-RL cs.LG · 2026-05-17 · unverdicted · none · ref 26 · internal anchor

    DISA decouples partition function estimation using offline importance sampling for distribution-matching LLM-RL, matching or exceeding online baselines like FlowRL on math and code benchmarks while retaining more strategy diversity.

  • TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching cs.CL · 2026-05-12 · unverdicted · none · ref 106 · 2 links · internal anchor

    TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.

  • Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs cs.CV · 2026-05-10 · unverdicted · none · ref 7 · internal anchor

    PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.

  • Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation cs.IR · 2026-05-06 · conditional · none · ref 13 · internal anchor

    BLADE uses Bayesian list-wise alignment with dynamic estimation to create a self-evolving target that overcomes limitations of static references in LLM-based recommendation, yielding sustained gains in ranking and complex metrics.

  • Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards cs.CV · 2026-03-01 · unverdicted · none · ref 15 · internal anchor

    SOLACE improves text-to-image generation by using intrinsic self-confidence rewards from noise reconstruction accuracy during reinforcement learning post-training without external supervision.

  • CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators cs.AI · 2026-05-09 · unverdicted · none · ref 42 · internal anchor

    CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.

  • Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 48 · internal anchor

    Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

  • AlignCultura: Towards Culturally Aligned Large Language Models? cs.CL · 2026-04-21 · unverdicted · none · ref 20 · internal anchor

    AlignCultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.

  • Bias at the End of the Score cs.CV · 2026-04-14 · unverdicted · none · ref 13 · internal anchor

    Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.

  • Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training cs.LG · 2025-09-03 · unverdicted · none · ref 6 · internal anchor

    PROF curates RL training data via PRM-ORM consistency to improve both final-answer accuracy and intermediate reasoning quality while reducing reliance on strong process reward models.

  • Improving Video Generation with Human Feedback cs.CV · 2025-01-23 · unverdicted · none · ref 14 · internal anchor

    A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

  • Directly Fine-Tuning Diffusion Models on Differentiable Rewards cs.CV · 2023-09-29 · conditional · none · ref 6 · internal anchor

    DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.

  • Reinforced Self-Training (ReST) for Language Modeling cs.CL · 2023-08-17 · unverdicted · none · ref 7 · internal anchor

    ReST improves LLM translation quality on benchmarks via offline RL on self-generated data, achieving gains in a compute-efficient way compared to typical RLHF.

  • Goal-Conditioned Supervised Learning for LLM Fine-Tuning cs.LG · 2026-05-08 · unverdicted · none · ref 6 · internal anchor

    GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.

  • ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 67 · internal anchor

    ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

  • Failure Modes of Maximum Entropy RLHF cs.LG · 2025-09-24 · unverdicted · none · ref 14 · internal anchor

    Derives SimPO from MaxEnt RL and reports that MaxEnt RL in online RLHF exhibits frequent overoptimization and unstable KL dynamics across scales, unlike stable KL-constrained baselines.

  • TrustLLM: Trustworthiness in Large Language Models cs.CL · 2024-01-10 · unverdicted · none · ref 94 · internal anchor

    TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.

  • Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment cs.AI · 2023-08-10 · accept · none · ref 40 · internal anchor

    Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

  • Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation cs.IR · 2026-04-07 · unverdicted · none · ref 3 · internal anchor

    Curr-RLCER applies curriculum reinforcement learning with coherence-driven rewards to align generated explanations with predicted ratings in explainable recommendation systems.

  • A Survey on LLM-as-a-Judge cs.CL · 2024-11-23 · unverdicted · none · ref 29 · internal anchor

    A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

  • A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 115 · internal anchor

    A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

  • Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey cs.CR · 2024-09-26 · unverdicted · none · ref 36 · internal anchor

    Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.

  • A Comprehensive Overview of Large Language Models cs.CL · 2023-07-12 · unverdicted · none · ref 169 · internal anchor

    A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.