arxiv: 2502.04512 · v3 · submitted 2025-02-06 · 💻 cs.AI

Safety Must Precede the Deployment of Open-Ended AI

Pith reviewed 2026-05-23 03:24 UTC · model grok-4.3

classification 💻 cs.AI
keywords: open-ended AI, AI safety, emergent misalignment, predictability loss, AI control, preemptive safety, self-evolving agents

The pith

Open-ended AI systems pose unique safety challenges that existing methods cannot address and must be tackled before deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper claims that open-ended AI, where agents autonomously create new behaviors indefinitely, brings safety problems unlike those in fixed-task AI. These include losing the ability to predict what the system will do next, misalignment that appears as it evolves, and trouble keeping control once it goes beyond original plans. Because these issues are different in kind, standard safety tools will not work, so the risks need study now rather than after systems are widely used. The paper maps out the problems and urges joint efforts to develop solutions.

Core claim

The defining properties of open-ended AI systems introduce a distinct and underexplored class of safety challenges, including loss of predictability, emergent misalignment, and difficulties in maintaining effective control as systems evolve beyond their initial design assumptions, that must be addressed preemptively. These challenges differ qualitatively from those associated with task-bounded or static models and are unlikely to be addressed by existing safety frameworks alone, which is why these risks must be examined proactively, before large-scale deployment.

What carries the argument

Open-endedness: the property whereby AI agents autonomously and indefinitely generate novel behaviors, representations, or solutions. This property is what drives the safety concerns.

If this is right

  • The safety of open-ended AI must be addressed before any large-scale deployment.
  • Current safety approaches for static models will not suffice for open-ended systems.
  • Research must focus on new methods to handle loss of predictability and control.
  • Coordinated action across the field is needed for responsible development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Self-evolving agents in long-horizon tasks may amplify these control issues over time.
  • Without preemptive work, deployment could lead to unintended emergent behaviors that are hard to correct after the fact.
  • Testing frameworks might need to simulate indefinite evolution to check safety.

Load-bearing premise

The safety challenges of open-ended AI are qualitatively different from those of task-specific models and cannot be solved by adapting existing safety methods.

What would settle it

A demonstration that existing safety frameworks can maintain predictability and control over indefinitely evolving open-ended AI systems would undermine the position.

Figures

Figures reproduced from arXiv: 2502.04512 by Ivaxi Sheth, Jan Wehner, Mario Fritz, Ruta Binkyte, Sahar Abdelnabi.

Figure 1: Open-Ended (OE) AI generates increasingly novel artifacts over time and can be promising to co-evolve with their environments and societal values, hopefully leading to creative solutions, discoveries, and advances for humanity. However, this position paper argues that due to unpredictability, difficulty to control, and cascading misalignment, they can result in catastrophic risks that are harmful and thr…
Figure 2: The Impossible Triangle of OE AI shows that safety, speed of generating artifacts and novelty cannot be satisfied simultaneously; one has to be capped depending on the application.
resources to evaluate. Unlike traditional ML models, OE AI requires more continuous evaluation without clear guarantees of utility. OE AI is run for a longer time before producing useful results since it involves much explorat…
Original abstract

AI advancements have been significantly driven by a combination of foundation models and curiosity-driven learning aimed at increasing capability and adaptability. Within this landscape, open-endedness, where AI agents autonomously and indefinitely generate novel behaviors, representations, or solutions, has gained increasing interest. This has become relevant in the context of self-evolving agents and long-horizon discovery. This position paper argues that the defining properties of open-ended AI systems introduce a distinct and underexplored class of safety challenges, including loss of predictability, emergent misalignment, and difficulties in maintaining effective control as systems evolve beyond their initial design assumptions, that must be addressed preemptively. These challenges differ qualitatively from those associated with task-bounded or static models and are unlikely to be addressed by existing safety frameworks alone, which is why these risks must be examined proactively, before large-scale deployment. The paper outlines key challenges, discusses research opportunities, and calls for coordinated action to support the safe and responsible development of open-ended AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. This position paper argues that open-ended AI systems—defined by autonomous, indefinite generation of novel behaviors, representations, or solutions—introduce a qualitatively distinct class of safety challenges (loss of predictability, emergent misalignment, and loss of effective control as systems evolve beyond initial assumptions) that differ from those in task-bounded or static models and cannot be adequately addressed by existing safety frameworks, necessitating preemptive research and coordinated action prior to large-scale deployment.

Significance. If the asserted qualitative distinction holds, the paper would usefully flag an underexplored risk category for self-evolving agents and long-horizon discovery systems, potentially spurring targeted safety research; the call for proactive examination before deployment is a clear advocacy contribution.

major comments (1)
  1. [Abstract] The central claim that the listed challenges 'differ qualitatively' from those of task-bounded models and 'are unlikely to be addressed by existing safety frameworks alone' is asserted without any explicit comparison, counterexample, or analysis of specific frameworks (e.g., alignment techniques or control methods) and why they fail for open-ended evolution; this assertion is load-bearing for the preemptive-action recommendation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the recommendation for major revision. We address the single major comment below.

  1. Referee: [Abstract] The central claim that the listed challenges 'differ qualitatively' from those of task-bounded models and 'are unlikely to be addressed by existing safety frameworks alone' is asserted without any explicit comparison, counterexample, or analysis of specific frameworks (e.g., alignment techniques or control methods) and why they fail for open-ended evolution; this assertion is load-bearing for the preemptive-action recommendation.

    Authors: We acknowledge that the abstract asserts the qualitative distinction and limitations of existing frameworks without explicit comparisons or counterexamples. The manuscript body motivates these claims through discussion of predictability loss under indefinite evolution, emergent misalignment beyond initial training distributions, and control erosion as agent behaviors diverge from design assumptions. As a position paper, the core contribution is to flag this underexplored category rather than provide exhaustive framework analysis. To strengthen the manuscript in response to this comment, we will revise the abstract to reference the key distinctions briefly and add a short subsection in the main text with targeted comparisons (e.g., why RLHF and constitutional AI may not scale to open-ended self-modification). This will better ground the preemptive-action recommendation.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; position paper with independent advocacy claims

The paper is a position paper whose central argument—that open-ended AI introduces qualitatively distinct safety challenges (loss of predictability, emergent misalignment, control difficulties) not addressed by existing frameworks—is presented as a premise motivating preemptive research rather than derived from any formal chain, equations, or fitted parameters. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear; the distinction from task-bounded systems is asserted directly from general properties of open-endedness without looping back to the paper's own inputs. The claim of insufficiency of existing frameworks is an explicit advocacy stance, not a hidden derivation that reduces to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The position depends on domain assumptions about the behavior of open-ended systems and the inadequacy of prior safety methods without new supporting evidence.

axioms (2)
  • domain assumption: Open-ended AI systems autonomously and indefinitely generate novel behaviors, representations, or solutions.
    Stated as the core defining property in the abstract.
  • ad hoc to paper: Existing safety frameworks will not suffice for open-ended systems.
    Asserted without detailed evidence or comparison in the provided text.

pith-pipeline@v0.9.0 · 5704 in / 1335 out tokens · 52011 ms · 2026-05-23T03:24:01.912960+00:00 · methodology
