pith. machine review for the scientific record.

arxiv: 1710.03641 · v2 · pith:EEGH5BPBnew · submitted 2017-10-10 · 💻 cs.LG · cs.AI

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

classification 💻 cs.LG cs.AI
keywords adaptation · continuous · meta-learning · competitive · environments · learn · nonstationary · ability

Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control

    eess.SY 2026-01 unverdicted novelty 7.0

    SODACER uses fast and slow buffers with adaptive clustering for experience replay in safe RL, integrated with control barrier functions (CBFs) and the Sophia optimizer, to achieve faster convergence and safety on nonlinear systems such as HPV transmission.

  2. An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning

    cs.LG 2025-10 unverdicted novelty 5.0

    The work establishes OOD generalization bounds for meta-supervised learning and meta-RL that exploit MDP structure, then analyzes a gradient-based meta-RL algorithm.