Title resolution pending

Training language models with language feedback · 2023 · arXiv 2204.14146

3 Pith papers cite this work. Polarity classification is still being indexed.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

cs.LG · 2026-01-07 · unverdicted · novelty 6.0

R³L combines reflect-then-retry exploration, pivotal credit assignment, and positive amplification in RL for LLMs, reporting 5-52% relative gains on agentic and reasoning tasks with stable training.

Aligning Text-to-Image Models using Human Feedback

cs.LG · 2023-02-23 · unverdicted · novelty 6.0

A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.

Self-Refine: Iterative Refinement with Self-Feedback

cs.CL · 2023-03-30 · unverdicted · novelty 5.0

Self-Refine boosts LLM outputs by ~20% on average across seven tasks by having the same model iteratively generate, critique, and refine its own responses.

citing papers explorer

Showing 3 of 3 citing papers.

R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

cs.LG · 2026-01-07 · unverdicted · none · ref 3

R³L combines reflect-then-retry exploration, pivotal credit assignment, and positive amplification in RL for LLMs, reporting 5-52% relative gains on agentic and reasoning tasks with stable training.

Aligning Text-to-Image Models using Human Feedback

cs.LG · 2023-02-23 · unverdicted · none · ref 17

A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.

Self-Refine: Iterative Refinement with Self-Feedback

cs.CL · 2023-03-30 · unverdicted · none · ref 38

Self-Refine boosts LLM outputs by ~20% on average across seven tasks by having the same model iteratively generate, critique, and refine its own responses.
