Improving generalization in meta reinforcement learning using learned objectives

Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber · 2019 · arXiv 1910.04098

3 Pith papers cite this work.

representative citing papers

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

cs.AI · 2024-08-12 · unverdicted · novelty 8.0

The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

cs.CL · 2025-09-17 · unverdicted · novelty 6.0

ShinkaEvolve improves sample efficiency in LLM-driven program evolution through parent sampling, code-novelty rejection sampling, and bandit-based selection over an ensemble of LLMs, achieving a new state-of-the-art circle-packing result with 150 samples and gains on math reasoning and competitive programming tasks.

An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning

cs.LG · 2025-10-27 · unverdicted · novelty 5.0

The work establishes OOD generalization bounds for meta-supervised learning and meta-RL that exploit MDP structure, then analyzes a gradient-based meta-RL algorithm.

citing papers explorer

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery · cs.AI · 2024-08-12 · ref 52
ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution · cs.CL · 2025-09-17 · ref 94
An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning · cs.LG · 2025-10-27 · ref 8