Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al · 1901

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

TransitLM is a large-scale dataset and benchmark for training LLMs to generate structurally valid map-free transit routes from origin-destination pairs.

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.

Jailbroken: How Does LLM Safety Training Fail?

cs.LG · 2023-07-05 · unverdicted · novelty 6.0

LLM safety training fails due to competing objectives and mismatched generalization, enabling new jailbreaks that succeed on all unsafe prompts from red-teaming sets in GPT-4 and Claude.

Large Language Models for Sequential Decision-Making: Improving In-Context Learning via Supervised Fine-Tuning

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Supervised fine-tuning of pretrained LLMs on offline trajectories yields better few-shot sequential decision-making than in-context-only baselines, with a theoretical suboptimality bound derived for linear MDPs by interpreting attention as Q-function estimation.

citing papers explorer

Showing 4 of 4 citing papers.

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation cs.CL · 2026-05-21 · unverdicted · none · ref 4
TransitLM is a large-scale dataset and benchmark for training LLMs to generate structurally valid map-free transit routes from origin-destination pairs.
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning cs.LG · 2026-05-11 · unverdicted · none · ref 25
METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.
Jailbroken: How Does LLM Safety Training Fail? cs.LG · 2023-07-05 · unverdicted · none · ref 12
LLM safety training fails due to competing objectives and mismatched generalization, enabling new jailbreaks that succeed on all unsafe prompts from red-teaming sets in GPT-4 and Claude.
Large Language Models for Sequential Decision-Making: Improving In-Context Learning via Supervised Fine-Tuning cs.LG · 2026-05-09 · unverdicted · none · ref 6
Supervised fine-tuning of pretrained LLMs on offline trajectories yields better few-shot sequential decision-making than in-context-only baselines, with a theoretical suboptimality bound derived for linear MDPs by interpreting attention as Q-function estimation.

Language models are few-shot learners

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer