MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
Pith reviewed 2026-05-23 03:13 UTC · model grok-4.3
The pith
MAGELLAN equips LLM agents with online metacognitive predictions of learning progress to master large evolving goal spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAGELLAN is a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, MAGELLAN improves LP prediction efficiency and goal prioritization, being the only method allowing the agent to fully master a large and evolving goal space.
What carries the argument
MAGELLAN, the metacognitive framework that trains the LLM agent to predict its own competence and learning progress by exploiting semantic relationships among goals.
If this is right
- Goal prioritization becomes more efficient because the agent avoids goals with low predicted progress.
- The agent adapts its curriculum automatically when new goals appear without requiring expert re-grouping.
- Learning progress estimation requires fewer environment samples than traditional methods.
- Full mastery of the entire goal space becomes achievable where other approaches plateau.
- Curriculum learning scales to open-ended, high-dimensional goal spaces.
Where Pith is reading between the lines
- The same semantic-prediction idea could be tested on non-LLM agents that have access to goal embeddings.
- If semantic generalization works here, it may reduce the need for hand-designed task taxonomies in other exploration settings.
- One could measure whether the metacognitive module itself improves when the agent is allowed to update its predictions after each episode.
- The approach raises the question of how robust the predictions remain when the underlying LLM is swapped for a different model.
Load-bearing premise
Semantic relationships inside the LLM can be used to predict the agent's actual competence and learning progress accurately enough to guide prioritization without needing extensive new samples or expert groupings.
What would settle it
Run the agent with MAGELLAN in the same interactive environment; if it still fails to fully master the goal space or if its predicted learning progress shows no reliable correlation with measured progress, the central claim does not hold.
Figures
read the original abstract
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM agents trained with online RL in high-dimensional and evolving goal spaces, a key challenge for LP prediction is modeling one's own competence, a form of metacognitive monitoring. Traditional approaches either require extensive sampling or rely on brittle expert-defined goal groupings. We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, we show that MAGELLAN improves LP prediction efficiency and goal prioritization, being the only method allowing the agent to fully master a large and evolving goal space. These results demonstrate how augmenting LLM agents with a metacognitive ability for LP predictions can effectively scale curriculum learning to open-ended goal spaces.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MAGELLAN, a metacognitive framework for LLM-based autotelic agents that learns to predict its own competence and learning progress (LP) online. By leveraging semantic relationships between goals encoded in the LLM, the method enables sample-efficient LP estimation and dynamic prioritization in large, evolving goal spaces without relying on extensive sampling or expert-defined groupings. Experiments in an interactive learning environment demonstrate improved LP prediction efficiency and goal prioritization, with the claim that MAGELLAN is the only method allowing the agent to fully master the goal space.
Significance. If the empirical results on mastery and sample efficiency hold under rigorous controls, the work would provide a concrete demonstration of how metacognitive monitoring can scale curriculum learning for open-ended LLM agents, addressing a key bottleneck in autotelic exploration. The approach of online competence prediction via LLM semantics, if validated, could influence designs for agents operating in high-dimensional goal spaces.
major comments (2)
- [Abstract, §4] Abstract and §4 (Experiments): The claim that MAGELLAN is 'the only method allowing the agent to fully master a large and evolving goal space' is load-bearing for the central contribution, yet the abstract provides no quantitative details on mastery metrics (e.g., fraction of goals mastered), sample complexity curves, baseline failure modes, or statistical tests. Without these, it is impossible to verify whether the mastery gap arises from the metacognitive predictor or from other implementation differences.
- [§3, §4.2] §3 (MAGELLAN framework) and §4.2 (LP prediction): The key assumption that LLM semantic embeddings reliably encode competence-relevant similarities (rather than surface-level semantics) for generalization to new goals is stated but not directly tested. No ablation on embedding validation, nearest-neighbor analysis, or out-of-distribution goal performance is referenced, leaving the sample-efficiency advantage ungrounded.
minor comments (2)
- [Abstract] The abstract uses 'metacognitive monitoring' and 'LP prediction' without a brief definition or pointer to the formalization in §2; adding one sentence would improve accessibility.
- [Abstract] No mention of environment details (state space, goal generation process, or reward structure) appears in the abstract; these should be summarized in one sentence for context.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate the corresponding revisions.
read point-by-point responses
-
Referee: [Abstract, §4] Abstract and §4 (Experiments): The claim that MAGELLAN is 'the only method allowing the agent to fully master a large and evolving goal space' is load-bearing for the central contribution, yet the abstract provides no quantitative details on mastery metrics (e.g., fraction of goals mastered), sample complexity curves, baseline failure modes, or statistical tests. Without these, it is impossible to verify whether the mastery gap arises from the metacognitive predictor or from other implementation differences.
Authors: We agree that the abstract would be strengthened by quantitative support for the mastery claim. Section 4 reports that MAGELLAN reaches 100% goal mastery while all baselines plateau below 70%, with the difference attributable to the metacognitive predictor as shown by the ablations in §4.3. We will revise the abstract to include the mastery fractions, a reference to the sample-complexity results, and a note on the statistical comparisons performed in §4.4. revision: yes
-
Referee: [§3, §4.2] §3 (MAGELLAN framework) and §4.2 (LP prediction): The key assumption that LLM semantic embeddings reliably encode competence-relevant similarities (rather than surface-level semantics) for generalization to new goals is stated but not directly tested. No ablation on embedding validation, nearest-neighbor analysis, or out-of-distribution goal performance is referenced, leaving the sample-efficiency advantage ungrounded.
Authors: The framework builds on the documented ability of LLM embeddings to capture goal semantics, which is reflected in the measured improvement in LP prediction sample efficiency. We acknowledge that an explicit validation would further ground the claim and will therefore add a nearest-neighbor analysis of embedding clusters together with their predicted competence values, plus results on a held-out out-of-distribution goal set, to the revised §4.2. revision: partial
Circularity Check
No circularity; empirical claims rest on environment interaction results
full rationale
The abstract presents MAGELLAN as a new metacognitive framework that learns competence and LP predictions online by leveraging LLM semantic relationships for generalization in evolving goal spaces. No equations, fitted parameters renamed as predictions, or self-citations are shown that would reduce the central result to its own inputs by construction. The claim of being the only method to fully master the space is presented as an empirical outcome from interactive learning experiments rather than a definitional or self-referential derivation. The derivation chain is therefore self-contained against external benchmarks of agent performance.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
-
[2]
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Gopalakrishnan, K., Hausman, K., Herzog, A., Ho, D., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jang, E., Ruano, R. M. J., Jeffrey, K., Jesmonth, S., Joshi, N. J., Julian, R. C., Kalashnikov, D., Kuang, Y., Lee, K.-H., Levine, S., Lu, Y., Luu, L., Parada, C., Pastor, P., Quiamb...
work page 2022
-
[3]
Grounding language to autonomously-acquired skills via goal generation
Akakzia, A., Colas, C., Oudeyer, P.-Y., Chetouani, M., and Sigaud, O. Grounding language to autonomously-acquired skills via goal generation. In International Conference on Learning Representations, 2021
work page 2021
-
[4]
Baldassarre, G. and Mirolli, M. Intrinsically motivated learning systems: an overview. Intrinsically motivated learning in natural and artificial systems, pp.\ 1--14, 2012
work page 2012
-
[5]
Baranes, A. and Oudeyer, P.-Y. R- IAC : Robust intrinsically motivated exploration and active learning. IEEE Transactions on Autonomous Mental Development , 1 0 (3): 0 155--169, 2009. ISSN 1943-0612. doi:10.1109/TAMD.2009.2037513. Conference Name: IEEE Transactions on Autonomous Mental Development
-
[6]
Baranes, A. and Oudeyer, P.-Y. Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, 61 0 (1): 0 49--73, 2013. ISSN 0921-8890. doi:10.1016/j.robot.2012.05.008
-
[7]
Berlyne, D. E. A theory of human curiosity. British Journal of Psychology, 1954
work page 1954
-
[8]
Control what you can: Intrinsically motivated task-planning agent
Blaes, S., Vlastelica Pogančić, M., Zhu, J., and Martius, G. Control what you can: Intrinsically motivated task-planning agent. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019
work page 2019
-
[9]
Grounding large language models in interactive environments with online reinforcement learning
Carta, T., Romac, C., Wolf, T., Lamprier, S., Sigaud, O., and Oudeyer, P.-Y. Grounding large language models in interactive environments with online reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning, pp.\ 3676--3713. PMLR , 2023. ISSN : 2640-3498
work page 2023
-
[10]
Stein variational goal generation for adaptive exploration in multi-goal reinforcement learning
Castanet, N., Sigaud, O., and Lamprier, S. Stein variational goal generation for adaptive exploration in multi-goal reinforcement learning. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA , volume 202 of Proceedings o...
work page 2023
-
[11]
Chevalier-Boisvert, M., Bahdanau, D., Lahlou, S., Willems, L., Saharia, C., Nguyen, T. H., and Bengio, Y. Babyai: A platform to study the sample efficiency of grounded language learning. In International Conference on Learning Representations, 2019
work page 2019
-
[12]
Multi-armed bandits for intelligent tutoring systems
Clement, B., Roy, D., Oudeyer, P.-Y., and Lopes, M. Multi-armed bandits for intelligent tutoring systems. Journal of Educational Data Mining, 7 0 (2), 2015
work page 2015
-
[13]
CURIOUS : intrinsically motivated modular multi-goal reinforcement learning
Colas, C., Fournier, P., Chetouani, M., Sigaud, O., and Oudeyer, P.-Y. CURIOUS : intrinsically motivated modular multi-goal reinforcement learning. In International conference on machine learning, pp.\ 1331--1340. PMLR, 2019
work page 2019
-
[14]
Colas, C., Karch, T., Lair, N., Dussoux, J.-M., Moulin-Frier, C., Dominey, P. F., and Oudeyer, P.-Y. Language as a cognitive tool to imagine goals in curiosity-driven exploration. arXiv :2002.09253 [cs] , 2020
-
[15]
Language and culture internalization for human-like autotelic ai
Colas, C., Karch, T., Moulin-Frier, C., and Oudeyer, P.-Y. Language and culture internalization for human-like autotelic ai. Nature Machine Intelligence, 4 0 (12): 0 1068--1076, 2022 a
work page 2022
-
[16]
Colas, C., Karch, T., Sigaud, O., and Oudeyer, P.-Y. Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning: a short survey. Journal of Artificial Intelligence Research, 74: 0 1159--1199, 2022 b
work page 2022
-
[17]
Emergent complexity and zero-shot transfer via unsupervised environment design
Dennis, M., Jaques, N., Vinitsky, E., Bayen, A., Russell, S., Critch, A., and Levine, S. Emergent complexity and zero-shot transfer via unsupervised environment design. Advances in Neural Information Processing Systems, 33, 2020
work page 2020
-
[18]
QL o RA : Efficient finetuning of quantized LLM s
Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. QL o RA : Efficient finetuning of quantized LLM s. In Thirty-seventh Conference on Neural Information Processing Systems, 2023
work page 2023
-
[19]
Where’s the reward? a review of reinforcement learning for instructional sequencing
Doroudi, S., Aleven, V., and Brunskill, E. Where’s the reward? a review of reinforcement learning for instructional sequencing. International Journal of Artificial Intelligence in Education, 29: 0 568--620, 2019
work page 2019
-
[20]
Open r1: A fully open reproduction of deepseek-r1, January 2025
Face, H. Open r1: A fully open reproduction of deepseek-r1, January 2025. URL https://github.com/huggingface/open-r1
work page 2025
-
[21]
Forestier, S. and Oudeyer, P.-Y. Modular active curiosity-driven discovery of tool use. In Proceedings of the 2016 IEEE / RSJ International Conference on Intelligent Robots and Systems , Proceedings of the 2016 IEEE / RSJ International Conference on Intelligent Robots and Systems, 2016
work page 2016
-
[22]
Intrinsically motivated goal exploration processes with automatic curriculum learning
Forestier, S., Portelas, R., Mollard, Y., and Oudeyer, P.-Y. Intrinsically motivated goal exploration processes with automatic curriculum learning. Journal of Machine Learning Research, 23 0 (1), January 2022. ISSN 1532-4435
work page 2022
-
[23]
Accuracy-based curriculum learning in deep reinforcement learning, 2018
Fournier, P., Sigaud, O., Chetouani, M., and Oudeyer, P.-Y. Accuracy-based curriculum learning in deep reinforcement learning, 2018
work page 2018
-
[24]
Sac-glam: Improving online rl for llm agents with soft actor-critic and hindsight relabeling, 2024
Gaven, L., Romac, C., Carta, T., Lamprier, S., Sigaud, O., and Oudeyer, P.-Y. Sac-glam: Improving online rl for llm agents with soft actor-critic and hindsight relabeling, 2024
work page 2024
-
[25]
Gottlieb, J. and Oudeyer, P.-Y. Towards a neuroscience of active sampling and curiosity. Nature Reviews Neuroscience, 19 0 (12): 0 758--770, 2018
work page 2018
-
[26]
Benchmarking the spectrum of agent capabilities
Hafner, D. Benchmarking the spectrum of agent capabilities. In International Conference on Learning Representations, 2022
work page 2022
-
[27]
Reasoning with language model is planning with world model
Hao, S., Gu, Y., Ma, H., Hong, J., Wang, Z., Wang, D., and Hu, Z. Reasoning with language model is planning with world model. In Bouamor, H., Pino, J., and Bali, K. (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp.\ 8154--8173, Singapore, 2023. Association for Computational Linguistics. doi:10.18653/v1/202...
-
[28]
Automatic goal generation for reinforcement learning agents
Held, D., Geng, X., Florensa, C., and Abbeel, P. Automatic goal generation for reinforcement learning agents. In International Conference on Machine Learning, 2017
work page 2017
-
[29]
J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W
Hu, E. J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. Lo RA : Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022
work page 2022
-
[30]
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents
Huang, W., Abbeel, P., Pathak, D., and Mordatch, I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International conference on machine learning, pp.\ 9118--9147. PMLR, 2022
work page 2022
-
[31]
Wordcraft: An environment for benchmarking commonsense agents
Jiang, M., Luketina, J., Nardelli, N., Minervini, P., Torr, P., Whiteson, S., and Rockt \"a schel, T. Wordcraft: An environment for benchmarking commonsense agents. In Language in Reinforcement Learning Workshop at ICML 2020, 2020
work page 2020
-
[32]
Replay-guided adversarial environment design
Jiang, M., Dennis, M., Parker-Holder, J., Foerster, J., Grefenstette, E., and Rocktäschel, T. Replay-guided adversarial environment design. In Proceedings of the 35th International Conference on Neural Information Processing Systems , NIPS '21, pp.\ 1884--1897, Red Hook, NY, USA, 2021. Curran Associates Inc. ISBN 978-1-71384-539-3
work page 2021
-
[33]
General intelligence requires rethinking exploration
Jiang, M., Rockt \"a schel, T., and Grefenstette, E. General intelligence requires rethinking exploration. Royal Society Open Science, 10 0 (6): 0 230539, 2023
work page 2023
-
[34]
The malmo platform for artificial intelligence experimentation
Johnson, M., Hofmann, K., Hutton, T., and Bignell, D. The malmo platform for artificial intelligence experimentation. In Ijcai, volume 16, pp.\ 4246--4247, 2016
work page 2016
-
[35]
Kanitscheider, I., Huizinga, J., Farhi, D., Guss, W. H., Houghton, B., Sampedro, R., Zhokhov, P., Baker, B., Ecoffet, A., Tang, J., Klimov, O., and Clune, J. Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft, 2021
work page 2021
-
[36]
Kaplan, F. and Oudeyer, P.-Y. In search of the neural circuits of intrinsic motivation. Frontiers in neuroscience, 1: 0 9, 2007
work page 2007
-
[37]
Kidd, C. and Hayden, B. Y. The psychology and neuroscience of curiosity. Neuron, 88 0 (3): 0 449--460, 2015
work page 2015
-
[38]
Grimgep: Learning progress for robust goal sampling in visual deep reinforcement learning
Kovač, G., Laversanne-Finot, A., and Oudeyer, P.-Y. Grimgep: Learning progress for robust goal sampling in visual deep reinforcement learning. IEEE Transactions on Cognitive and Developmental Systems, 15 0 (3): 0 1396--1407, 2023. doi:10.1109/TCDS.2022.3216911
-
[39]
Kumar, N., Silver, T., McClinton, W., Zhao, L., Proulx, S., Lozano-Pérez, T., Kaelbling, L. P., and Barry, J. Practice makes perfect: Planning to learn skill parameter policies. In Robotics: Science and Systems (RSS), 2024
work page 2024
-
[40]
Curiosity driven exploration of learned disentangled goal spaces
Laversanne-Finot, A., Pere, A., and Oudeyer, P.-Y. Curiosity driven exploration of learned disentangled goal spaces. In Billard, A., Dragan, A., Peters, J., and Morimoto, J. (eds.), Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pp.\ 487--504. PMLR, 29--31 Oct 2018
work page 2018
-
[41]
Leonard, J. A., Cordrey, S. R., Liu, H. Z., and Mackey, A. P. Young children calibrate effort based on the trajectory of their performance. Developmental Psychology, 59 0 (3): 0 609, 2023
work page 2023
-
[42]
Lopes, M. and Oudeyer, P.-Y. The strategic student approach for life-long exploration and learning. In 2012 IEEE international conference on development and learning and epigenetic robotics (ICDL), pp.\ 1--8. IEEE, 2012 a
work page 2012
-
[43]
Lopes, M. and Oudeyer, P.-Y. The strategic student approach for life-long exploration and learning. In 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics ( ICDL ) , pp.\ 1--8, 2012 b . doi:10.1109/DevLrn.2012.6400807. ISSN : 2161-9476
-
[44]
Teacher–student curriculum learning, 2020
Matiisen, T., Oliver, A., Cohen, T., and Schulman, J. Teacher–student curriculum learning, 2020
work page 2020
-
[45]
Matthews, M., Beukman, M., Lu, C., and Foerster, J. Kinetix: Investigating the training of general agents through open-ended physics-based control tasks, 2024
work page 2024
-
[46]
Moulin-Frier, C. and Oudeyer, P.-Y. Exploration strategies in developmental robotics: A unified probabilistic framework. In 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics ( ICDL ) , pp.\ 1--6, 2013. doi:10.1109/DevLrn.2013.6652535. ISSN : 2161-9476
-
[47]
Moulin-Frier, C., Nguyen, S. M., and Oudeyer, P.-Y. Self-organization of early vocal development in infants and machines: the role of intrinsic motivation. Frontiers in Psychology, 4, 2014. ISSN 1664-1078. doi:10.3389/fpsyg.2013.01006. Publisher: Frontiers
-
[48]
Oudeyer, P.-Y. and Smith, L. B. How evolution may work through curiosity-driven developmental process. Topics in Cognitive Science, 8 0 (2): 0 492--502, 2016
work page 2016
-
[49]
Oudeyer, P.-Y., Kaplan, F., and Hafner, V. V. Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation , 11 0 (2): 0 265--286, 2007. ISSN 1941-0026. doi:10.1109/TEVC.2006.890271. Conference Name: IEEE Transactions on Evolutionary Computation
-
[50]
Maximum Entropy Gain Exploration for Long Horizon Multi -goal Reinforcement Learning
Pitis, S., Chan, H., Zhao, S., Stadie, B., and Ba, J. Maximum Entropy Gain Exploration for Long Horizon Multi -goal Reinforcement Learning . In Proceedings of the 37th International Conference on Machine Learning , pp.\ 7750--7761. PMLR, November 2020. ISSN: 2640-3498
work page 2020
-
[51]
Poli, F., Meyer, M., Mars, R. B., and Hunnius, S. Exploration in 4-year-old children is guided by learning progress and novelty. Child Development, 2024 a
work page 2024
-
[52]
Poli, F., O’Reilly, J. X., Mars, R. B., and Hunnius, S. Curiosity and the dynamics of optimal exploration. Trends in Cognitive Sciences, 28 0 (5): 0 441--453, 2024 b
work page 2024
-
[53]
H., Dalal, M., Lin, S., Nair, A., Bahl, S., and Levine, S
Pong, V. H., Dalal, M., Lin, S., Nair, A., Bahl, S., and Levine, S. Skew-fit: State-covering self-supervised reinforcement learning. In International Conference on Machine Learning, 2019
work page 2019
-
[54]
Teacher algorithms for curriculum learning of deep rl in continuously parameterized environments
Portelas, R., Colas, C., Hofmann, K., and Oudeyer, P.-Y. Teacher algorithms for curriculum learning of deep rl in continuously parameterized environments. In Conference on Robot Learning, pp.\ 835--853. PMLR, 2020 a
work page 2020
-
[55]
Automatic curriculum learning for deep rl: A short survey
Portelas, R., Colas, C., Weng, L., Hofmann, K., and Oudeyer, P.-Y. Automatic curriculum learning for deep rl: A short survey. In International Joint Conference on Artificial Intelligence, 2020 b
work page 2020
-
[56]
ACES : Generating a diversity of challenging programming puzzles with autotelic generative models
Pourcel, J., Colas, C., Molinaro, G., Oudeyer, P.-Y., and Teodorescu, L. ACES : Generating a diversity of challenging programming puzzles with autotelic generative models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
work page 2024
-
[57]
Qwen, :, Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., Lin, H., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Lin, J., Dang, K., Lu, K., Bao, K., Yang, K., Yu, L., Li, M., Xue, M., Zhang, P., Zhu, Q., Men, R., Lin, R., Li, T., Tang, T., Xia, T., Ren, X., Ren, X., Fan, Y., Su, Y., Zhang, Y., Wa...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[58]
Automated curriculum generation through setter-solver interactions
Racaniere, S., Lampinen, A., Santoro, A., Reichert, D., Firoiu, V., and Lillicrap, T. Automated curriculum generation through setter-solver interactions. In International Conference on Learning Representations, 2020
work page 2020
-
[59]
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21 0 (140): 0 1--67, 2020. ISSN 1533-7928
work page 2020
-
[60]
TeachMyAgent : a benchmark for automatic curriculum learning in deep RL
Romac, C., Portelas, R., Hofmann, K., and Oudeyer, P.-Y. TeachMyAgent : a benchmark for automatic curriculum learning in deep RL . In International Conference on Machine Learning, pp.\ 9052--9063. PMLR , 2021. ISSN : 2640-3498
work page 2021
-
[61]
Learning progress mediates the link between cognitive effort and task engagement
Sayal , C., Heling, E., and Cools, R. Learning progress mediates the link between cognitive effort and task engagement. Cognition, 236: 0 105418, 2023
work page 2023
-
[62]
Schmidhuber, J. PowerPlay : Training an increasingly general problem solver by continually searching for the simplest still unsolvable problem. Frontiers in Psychology, 4, 2013. ISSN 1664-1078. doi:10.3389/fpsyg.2013.00313. Publisher: Frontiers
-
[63]
Reflexion: language agents with verbal reinforcement learning
Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. Reflexion: language agents with verbal reinforcement learning. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp.\ 8634--8652. Curran Associates, Inc., 2023
work page 2023
-
[64]
J., Perrin-Gilbert, N., and Santucci, V
Sigaud, O., Baldassarre, G., Colas, C., Doncieux, S., Duro, R. J., Perrin-Gilbert, N., and Santucci, V. G. A definition of open-ended learning problems for goal-conditioned agents. ArXiv, abs/2311.00344, 2023
-
[65]
Stout, A. and Barto, A. G. Competence progress intrinsic motivation. In 2010 IEEE 9th International Conference on Development and Learning , pp.\ 257--262, 2010. doi:10.1109/DEVLRN.2010.5578835. ISSN : 2161-9476
-
[66]
Humans monitor learning progress in curiosity-driven exploration
Ten, A., Kaushik, P., Oudeyer, P.-Y., and Gottlieb, J. Humans monitor learning progress in curiosity-driven exploration. Nature Communications, 12 0 (1): 0 5972, 2021. ISSN 2041-1723. doi:10.1038/s41467-021-26196-w. Publisher: Nature Publishing Group
-
[67]
van der Maaten, L. and Hinton, G. Visualizing data using t-sne. Journal of Machine Learning Research, 9 0 (86): 0 2579--2605, 2008
work page 2008
-
[68]
Voyager: An open-ended embodied agent with large language models, 2024
Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., and Anandkumar, A. Voyager: An open-ended embodied agent with large language models, 2024. ISSN 2835-8856
work page 2024
-
[69]
V., Kulkarni, T., Ionescu, C., Hansen, S., and Mnih, V
Warde-Farley, D., de Wiele, T. V., Kulkarni, T., Ionescu, C., Hansen, S., and Mnih, V. Unsupervised control through non-parametric discriminative rewards. In International Conference on Learning Representations, 2019
work page 2019
-
[70]
Entropy-regularized token-level policy optimization for large language models, 2024 a
Wen, M., Deng, C., Wang, J., Zhang, W., and Wen, Y. Entropy-regularized token-level policy optimization for large language models, 2024 a
work page 2024
-
[71]
Reinforcing LLM agents via policy optimization with action decomposition, 2024 b
Wen, M., Wan, Z., Wang, J., Zhang, W., and Wen, Y. Reinforcing LLM agents via policy optimization with action decomposition, 2024 b
work page 2024
-
[72]
React: Synergizing reasoning and acting in language models
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2022
work page 2022
-
[73]
OMNI : Open-endedness via models of human notions of interestingness
Zhang, J., Lehman, J., Stanley, K., and Clune, J. OMNI : Open-endedness via models of human notions of interestingness. In The Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[74]
A r CH er: Training language model agents via hierarchical multi-turn RL
Zhou, Y., Zanette, A., Pan, J., Levine, S., and Kumar, A. A r CH er: Training language model agents via hierarchical multi-turn RL . In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learni...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.