pith. machine review for the scientific record.

arxiv: 2301.13688 · v2 · pith:YWXZK3YDnew · submitted 2023-01-31 · 💻 cs.AI · cs.CL · cs.LG

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

classification 💻 cs.AI cs.CL cs.LG
keywords flan instruction methods tuning collection settings tasks available

We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2.
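
The mixed prompt settings named in the abstract (zero-shot, few-shot, and chain-of-thought) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the template wording, the function names `format_example` and `sample_setting`, and the mixture weights in `MIX` are hypothetical and are not taken from the paper or from the released Flan 2022 code.

```python
import random

# Hypothetical mixture weights over prompt settings (assumed, not from the paper).
MIX = {"zero_shot": 0.5, "few_shot": 0.4, "chain_of_thought": 0.1}


def format_example(question, answer, setting, exemplars=(), rationale=None):
    """Render one (prompt, target) pair under the given prompt setting.

    The template strings are illustrative placeholders only.
    """
    if setting == "zero_shot":
        return f"Q: {question}\nA:", answer
    if setting == "few_shot":
        # Prepend solved exemplars before the actual question.
        shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
        return f"{shots}Q: {question}\nA:", answer
    if setting == "chain_of_thought":
        prompt = f"Q: {question}\nLet's think step by step.\nA:"
        # The target carries the rationale when one is available.
        target = f"{rationale} So the answer is {answer}." if rationale else answer
        return prompt, target
    raise ValueError(f"unknown setting: {setting}")


def sample_setting(weights, rng):
    """Draw one prompt setting in proportion to its mixture weight."""
    settings = list(weights)
    return rng.choices(settings, weights=[weights[s] for s in settings], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    setting = sample_setting(MIX, rng)
    prompt, target = format_example(
        "What is 2 + 3?",
        "5",
        setting,
        exemplars=[("What is 1 + 1?", "2")],
        rationale="2 plus 3 equals 5.",
    )
    print(setting)
    print(prompt)
    print(target)
```

In this sketch each training example is rendered under a setting drawn at random, so a single finetuning mixture contains all three prompt formats. The actual templates and mixing proportions are in the repository linked in the abstract.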

This paper has not been read by Pith yet.


Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Static Personas: Situational Personality Steering for Large Language Models

    cs.CL 2026-04 unverdicted novelty 7.0

    IRIS is a neuron-based Identify-Retrieve-Steer method for situational personality control in LLMs that outperforms baselines on PersonalityBench and the new SPBench.

  2. QLoRA: Efficient Finetuning of Quantized LLMs

    cs.LG 2023-05 conditional novelty 7.0

    QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

  3. WizardLM: Empowering large pre-trained language models to follow complex instructions

    cs.CL 2023-04 conditional novelty 7.0

    WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.

  4. Language Is Not All You Need: Aligning Perception with Language Models

    cs.CL 2023-02 conditional novelty 7.0

    Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.

  5. MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

    cs.LG 2026-05 unverdicted novelty 6.0

    MetaMoE unifies domain-specialized experts into a single MoE via diversity-aware public proxy selection that approximates private data distributions for router training and expert alignment.

  6. Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

    cs.CV 2025-05 unverdicted novelty 6.0

    Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interlea...

  7. The Falcon Series of Open Language Models

    cs.CL 2023-11 conditional novelty 6.0

    Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

  8. Kosmos-2: Grounding Multimodal Large Language Models to the World

    cs.CL 2023-06 unverdicted novelty 6.0

    Kosmos-2 grounds text to image regions by encoding referring expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text pairs.

  9. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    cs.CL 2023-06 accept novelty 6.0

    GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.

  10. Scaling Data-Constrained Language Models

    cs.CL 2023-05 conditional novelty 6.0

    Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

  11. Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

    cs.CL 2023-05 conditional novelty 6.0

    UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

  12. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

    cs.AI 2023-03 conditional novelty 6.0

    CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

  13. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    cs.CL 2023-03 unverdicted novelty 6.0

    HuggingGPT is an agent system where ChatGPT plans and orchestrates calls to Hugging Face models to solve complex multi-modal AI tasks.

  14. Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

    cs.LG 2026-04 unverdicted novelty 5.0

    ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.

  15. AppAgent: Multimodal Agents as Smartphone Users

    cs.CV 2023-12 unverdicted novelty 5.0

    AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.

  16. PaLM 2 Technical Report

    cs.CL 2023-05 unverdicted novelty 5.0

    PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

  17. A Survey on Knowledge Distillation of Large Language Models

    cs.CL 2024-02 accept novelty 3.0

    A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.

  18. A Survey of Large Language Models

    cs.CL 2023-03 accept novelty 3.0

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.