Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
Do prompt-based models really understand the meaning of their prompts? arXiv preprint arXiv:2109.01247
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
APE generates instruction candidates via LLM and selects the best by zero-shot performance of a second LLM, matching or beating human prompts on 19 of 24 NLP tasks.
Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.
citing papers explorer
-
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
-
Large Language Models Are Human-Level Prompt Engineers
APE generates instruction candidates via LLM and selects the best by zero-shot performance of a second LLM, matching or beating human prompts on 19 of 24 NLP tasks.
-
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.