ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text scenarios.
citing papers explorer
-
ART: Automatic multi-step reasoning and tool-use for large language models
ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
-
Aligning AI With Shared Human Values
Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text scenarios.