pith. sign in

The refinedweb dataset for falcon llm: Outperforming curated corpora with web data, and web data only

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.CL 1 cs.LG 1

years

2024 1 2023 1

roles

background 1

polarities

background 1

representative citing papers

Zephyr: Direct Distillation of LM Alignment

cs.LG · 2023-10-25 · accept · novelty 6.0

Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.

citing papers explorer

Showing 2 of 2 citing papers.

  • The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale cs.CL · 2024-06-25 · unverdicted · none · ref 51

    FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.

  • Zephyr: Direct Distillation of LM Alignment cs.LG · 2023-10-25 · accept · none · ref 74

    Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.