Ext5: Towards extreme multi-task scaling for transfer learning

Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q Tran, Dara Bahri, Jianmo Ni, et al · 2021 · arXiv 2111.10952

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

cs.LG · 2025-05-21 · unverdicted · novelty 6.0

Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

cs.LG · 2024-04-06 · conditional · novelty 6.0

Length-controlled AlpacaEval applies regression adjustment to remove length bias from LLM auto-evaluations, raising Spearman correlation with Chatbot Arena from 0.94 to 0.98.

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

cs.AI · 2023-01-31 · conditional · novelty 6.0

The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

PaLM 2 Technical Report

cs.CL · 2023-05-17 · unverdicted · novelty 5.0

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0 · 2 refs

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

citing papers explorer

Showing 6 of 6 citing papers.

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning cs.LG · 2025-05-21 · unverdicted · none · ref 3
Entropy minimization on self-generated outputs elicits strong reasoning in pretrained LLMs, matching or exceeding supervised RL methods on benchmarks.
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators cs.LG · 2024-04-06 · conditional · none · ref 139
Length-controlled AlpacaEval applies regression adjustment to remove length bias from LLM auto-evaluations, raising Spearman correlation with Chatbot Arena from 0.94 to 0.98.
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning cs.AI · 2023-01-31 · conditional · none · ref 2
The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 210
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
PaLM 2 Technical Report cs.CL · 2023-05-17 · unverdicted · none · ref 299
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
Galactica: A Large Language Model for Science cs.CL · 2022-11-16 · unverdicted · none · ref 68 · 2 links
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Ext5: Towards extreme multi-task scaling for transfer learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer