Ultrafeedback: Boosting language models with high-quality feedback

Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

DataComp-LM: In search of the next generation of training sets for language models

cs.LG · 2024-06-17 · unverdicted · novelty 6.0

DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

cs.LG · 2026-04-03 · unverdicted · novelty 5.0

Random sampling matches active preference learning on win-rate gains in online DPO yet both degrade benchmark performance, making active selection's overhead hard to justify.

InternLM2 Technical Report

cs.CL · 2024-03-26 · unverdicted · novelty 5.0

InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

citing papers explorer

Showing 3 of 3 citing papers.

DataComp-LM: In search of the next generation of training sets for language models cs.LG · 2024-06-17 · unverdicted · none · ref 52
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs cs.LG · 2026-04-03 · unverdicted · none · ref 11
Random sampling matches active preference learning on win-rate gains in online DPO yet both degrade benchmark performance, making active selection's overhead hard to justify.
InternLM2 Technical Report cs.CL · 2024-03-26 · unverdicted · none · ref 30
InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

Ultrafeedback: Boosting language models with high-quality feedback

fields

years

verdicts

representative citing papers

citing papers explorer