IRIS is a neuron-based Identify-Retrieve-Steer method for situational personality control in LLMs that outperforms baselines on PersonalityBench and the new SPBench.
hub Mixed citations
The flan collection: Designing data and methods for effective instruction tuning
Mixed citation behavior. Most common role is background (43%).
abstract
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.
Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.
MetaMoE unifies domain-specialized experts into a single MoE via diversity-aware public proxy selection that approximates private data distributions for router training and expert alignment.
Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
Kosmos-2 grounds text to image regions by encoding refer expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text pairs.
GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.
HuggingGPT is an agent system where ChatGPT plans and orchestrates calls to Hugging Face models to solve complex multi-modal AI tasks.
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.
citing papers explorer
-
Beyond Static Personas: Situational Personality Steering for Large Language Models
IRIS is a neuron-based Identify-Retrieve-Steer method for situational personality control in LLMs that outperforms baselines on PersonalityBench and the new SPBench.
-
QLoRA: Efficient Finetuning of Quantized LLMs
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
-
WizardLM: Empowering large pre-trained language models to follow complex instructions
WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.
-
Language Is Not All You Need: Aligning Perception with Language Models
Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
-
Scaling Data-Constrained Language Models
Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.
-
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
MetaMoE unifies domain-specialized experts into a single MoE via diversity-aware public proxy selection that approximates private data distributions for router training and expert alignment.
-
Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation
Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.
-
The Falcon Series of Open Language Models
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
-
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2 grounds text to image regions by encoding refer expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text pairs.
-
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.
-
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
-
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.
-
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
HuggingGPT is an agent system where ChatGPT plans and orchestrates calls to Hugging Face models to solve complex multi-modal AI tasks.
-
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
-
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
-
AppAgent: Multimodal Agents as Smartphone Users
AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
A Survey on Knowledge Distillation of Large Language Models
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
-
A Survey of Large Language Models
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
-
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.