A survey of generalisation in deep reinforcement learning. arXiv preprint arXiv:2111.09794, 2023

Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel · 2023 · arXiv 2111.09794

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

cs.RO · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.

citing papers explorer

Showing 2 of 2 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution · cs.CL · 2023-09-28 · unverdicted · none · ref 180
MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving · cs.RO · 2026-05-13 · unverdicted · none · ref 25 · 2 links
