Safety Must Precede the Deployment of Open-Ended AI

Ivaxi Sheth; Jan Wehner; Mario Fritz; Ruta Binkyte; Sahar Abdelnabi

arxiv: 2502.04512 · v3 · submitted 2025-02-06 · 💻 cs.AI

Pith reviewed 2026-05-23 03:24 UTC · model grok-4.3

classification 💻 cs.AI

keywords open-ended AI, AI safety, emergent misalignment, predictability loss, AI control, preemptive safety, self-evolving agents

The pith

Open-ended AI systems pose unique safety challenges that existing methods cannot address and must be tackled before deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper claims that open-ended AI, where agents autonomously create new behaviors indefinitely, brings safety problems unlike those in fixed-task AI. These include losing the ability to predict what the system will do next, misalignment that appears as it evolves, and trouble keeping control once it goes beyond original plans. Because these issues are different in kind, standard safety tools will not work, so the risks need study now rather than after systems are widely used. The paper maps out the problems and urges joint efforts to develop solutions.

Core claim

The defining properties of open-ended AI systems introduce a distinct and underexplored class of safety challenges, including loss of predictability, emergent misalignment, and difficulties in maintaining effective control as systems evolve beyond their initial design assumptions, that must be addressed preemptively. These challenges differ qualitatively from those associated with task-bounded or static models and are unlikely to be addressed by existing safety frameworks alone, which is why these risks must be examined proactively, before large-scale deployment.

What carries the argument

Open-endedness, the property where AI agents autonomously and indefinitely generate novel behaviors, representations, or solutions, which drives the safety concerns.

If this is right

Open-ended AI must have safety addressed prior to any large-scale deployment.
Current safety approaches for static models will not suffice for open-ended systems.
Research must focus on new methods to handle loss of predictability and control.
Coordinated action across the field is needed for responsible development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Self-evolving agents in long-horizon tasks may amplify these control issues over time.
Without preemptive work, deployment could lead to unintended emergent behaviors that are hard to correct after the fact.
Testing frameworks might need to simulate indefinite evolution to check safety.
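The last extension above, simulating evolution to check safety, can be made concrete with a toy harness. The following is a minimal sketch and is not taken from the paper: a single number stands in for an agent's behavior, novelty is the distance to the nearest archived behavior, and the "safe envelope" is a fixed interval assumed to be set at design time. The names `run_open_ended_loop`, `step_scale`, `safe_limit`, and `novelty_threshold` are hypothetical and chosen only for illustration.

```python
import random


def run_open_ended_loop(generations, step_scale, safe_limit,
                        novelty_threshold=0.05, seed=0):
    """Toy novelty-driven search with a fixed safety envelope.

    Returns (archive, first_violation), where first_violation is the
    generation at which an archived behavior first left the interval
    [-safe_limit, safe_limit], or None if that never happened.
    All quantities are illustrative assumptions, not the paper's method.
    """
    rng = random.Random(seed)
    archive = [0.0]          # behaviors discovered so far (starts at the design point)
    first_violation = None
    for gen in range(1, generations + 1):
        parent = rng.choice(archive)                 # build on any earlier artifact
        child = parent + rng.gauss(0.0, step_scale)  # random variation
        novelty = min(abs(child - b) for b in archive)
        if novelty > novelty_threshold:              # keep only sufficiently novel artifacts
            archive.append(child)
            if first_violation is None and abs(child) > safe_limit:
                first_violation = gen                # monitor: envelope left for the first time
    return archive, first_violation


if __name__ == "__main__":
    # Compare a short run with a longer one under the same settings.
    for horizon in (20, 2000):
        archive, violation = run_open_ended_loop(horizon, step_scale=0.5, safe_limit=3.0)
        print(horizon, len(archive), violation)
```

The point of the sketch is only that a check passed over a short horizon says little about a longer one, because the archive keeps spreading as long as novelty is rewarded. A real testing framework would need far richer behavior descriptors and monitors than this one-dimensional stand-in.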

Load-bearing premise

The safety challenges of open-ended AI are qualitatively different from those of task-specific models and cannot be solved by adapting existing safety methods.

What would settle it

A demonstration that existing safety frameworks can maintain predictability and control over indefinitely evolving open-ended AI systems would undermine the position.

Figures

Figures reproduced from arXiv: 2502.04512 by Ivaxi Sheth, Jan Wehner, Mario Fritz, Ruta Binkyte, Sahar Abdelnabi.

**Figure 1.** Open-Ended (OE) AI generates increasingly novel artifacts over time and can be promising to co-evolve with their environments and societal values, hopefully leading to creative solutions, discoveries, and advances for humanity. However, this position paper argues that due to unpredictability, difficulty to control, and cascading misalignment, they can result in catastrophic risks that are harmful and thr…

**Figure 2.** The Impossible Triangle of OE AI shows that safety, speed of generating artifacts and novelty cannot be satisfied simultaneously; one has to be capped depending on the application.

Original abstract

AI advancements have been significantly driven by a combination of foundation models and curiosity-driven learning aimed at increasing capability and adaptability. Within this landscape, open-endedness, where AI agents autonomously and indefinitely generate novel behaviors, representations, or solutions, has gained increasing interest. This has become relevant in the context of self-evolving agents and long-horizon discovery. This position paper argues that the defining properties of open-ended AI systems introduce a distinct and underexplored class of safety challenges, including loss of predictability, emergent misalignment, and difficulties in maintaining effective control as systems evolve beyond their initial design assumptions, that must be addressed preemptively. These challenges differ qualitatively from those associated with task-bounded or static models and are unlikely to be addressed by existing safety frameworks alone, which is why these risks must be examined proactively, before large-scale deployment. The paper outlines key challenges, discusses research opportunities, and calls for coordinated action to support the safe and responsible development of open-ended AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a position paper restating that open-ended AI needs preemptive safety work, without new evidence or comparisons to show why existing approaches fall short.

This paper argues that open-ended AI systems, which autonomously generate novel behaviors over time, bring safety issues like loss of predictability, emergent misalignment, and control problems that current frameworks won't cover, so we need to address them before any large-scale deployment. It focuses on self-evolving agents and long-horizon discovery tasks as the context where these properties matter most. The authors outline the challenges and push for coordinated research to handle them proactively.

That part is straightforward and useful for setting priorities in the subfield. The paper does synthesize concerns around adaptability in foundation models combined with curiosity-driven learning, and it flags research opportunities without overclaiming technical fixes.

The soft spot is the central claim itself. It asserts that the problems differ qualitatively from task-bounded systems and can't be handled by existing safety methods, but offers no comparisons, examples, or analysis to show why standard techniques would fail here. The argument stays at the level of logical distinctions drawn from general properties of open-ended systems. No data or formal breakdown supports the insufficiency point.

This is mainly for AI safety researchers already working on autonomous agents and long-term behavior. Readers looking for new methods, derivations, or empirical tests will not find them. The reasoning is clear and engages the literature directly without contradictions or circular definitions. It deserves peer review in a venue that takes position papers, since the topic is relevant and the call for action could benefit from community feedback even if the evidence base stays thin.

Referee Report

1 major / 0 minor

Summary. This position paper argues that open-ended AI systems—defined by autonomous, indefinite generation of novel behaviors, representations, or solutions—introduce a qualitatively distinct class of safety challenges (loss of predictability, emergent misalignment, and loss of effective control as systems evolve beyond initial assumptions) that differ from those in task-bounded or static models and cannot be adequately addressed by existing safety frameworks, necessitating preemptive research and coordinated action prior to large-scale deployment.

Significance. If the asserted qualitative distinction holds, the paper would usefully flag an underexplored risk category for self-evolving agents and long-horizon discovery systems, potentially spurring targeted safety research; the call for proactive examination before deployment is a clear advocacy contribution.

major comments (1)

[Abstract] The central claim that the listed challenges 'differ qualitatively' from those of task-bounded models and 'are unlikely to be addressed by existing safety frameworks alone' is asserted without any explicit comparison, counterexample, or analysis of specific frameworks (e.g., alignment techniques or control methods) and why they fail for open-ended evolution; this assertion is load-bearing for the preemptive-action recommendation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the recommendation for major revision. We address the single major comment below.

Referee: [Abstract] The central claim that the listed challenges 'differ qualitatively' from those of task-bounded models and 'are unlikely to be addressed by existing safety frameworks alone' is asserted without any explicit comparison, counterexample, or analysis of specific frameworks (e.g., alignment techniques or control methods) and why they fail for open-ended evolution; this assertion is load-bearing for the preemptive-action recommendation.

Authors: We acknowledge that the abstract asserts the qualitative distinction and limitations of existing frameworks without explicit comparisons or counterexamples. The manuscript body motivates these claims through discussion of predictability loss under indefinite evolution, emergent misalignment beyond initial training distributions, and control erosion as agent behaviors diverge from design assumptions. As a position paper, the core contribution is to flag this underexplored category rather than provide exhaustive framework analysis. To strengthen the manuscript in response to this comment, we will revise the abstract to reference the key distinctions briefly and add a short subsection in the main text with targeted comparisons (e.g., why RLHF and constitutional AI may not scale to open-ended self-modification). This will better ground the preemptive-action recommendation.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; position paper with independent advocacy claims

The paper is a position paper whose central argument—that open-ended AI introduces qualitatively distinct safety challenges (loss of predictability, emergent misalignment, control difficulties) not addressed by existing frameworks—is presented as a premise motivating preemptive research rather than derived from any formal chain, equations, or fitted parameters. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear; the distinction from task-bounded systems is asserted directly from general properties of open-endedness without looping back to the paper's own inputs. The claim of insufficiency of existing frameworks is an explicit advocacy stance, not a hidden derivation that reduces to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The position depends on domain assumptions about the behavior of open-ended systems and the inadequacy of prior safety methods without new supporting evidence.

axioms (2)

domain assumption: Open-ended AI systems autonomously and indefinitely generate novel behaviors, representations, or solutions.
Stated as the core defining property in the abstract.
ad hoc to paper: Existing safety frameworks will not suffice for open-ended systems.
Asserted without detailed evidence or comparison in the provided text.


Reference graph

Works this paper leans on

95 extracted references · 95 canonical work pages

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[3]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page
[4]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page
[5]

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page
[6]

L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. Gpt-4 technical report. arXiv, 2023

work page 2023
[7]

Llm-poet: Evolving complex environments using large language models

Aki, F., Ikeda, R., Saito, T., Regan, C., and Oka, M. Llm-poet: Evolving complex environments using large language models. In the Genetic and Evolutionary Computation Conference Companion, 2024

work page 2024
[8]

Evolutionary optimization of model merging recipes

Akiba, T., Shing, M., Tang, Y., Sun, Q., and Ha, D. Evolutionary optimization of model merging recipes. Nature Machine Intelligence, pp.\ 1--10, 2025

work page 2025
[9]

and Bengio, Y

Alain, G. and Bengio, Y. Understanding intermediate layers using linear classifier probes. arXiv, 2018

work page 2018
[10]

E., Fort, S., Lanham, T., Telleen-Lawton, T., Conerly, T., Henighan, T., Hume, T., Bowman, S

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., Kerr, J., Mueller, J., Ladish, J., Landau, J., Ndousse, K., Lukosuite, K., Lovitt, L., Sellitto, M., Elhage, N., Schiefer, N., ...

work page 2022
[11]

Tell me about yourself: Llms are aware of their learned behaviors

Betley, J., Bao, X., Soto, M., Sztyber-Betley, A., Chua, J., and Evans, O. Tell me about yourself: Llms are aware of their learned behaviors. arXiv, 2025

work page 2025
[12]

A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv, 2021

work page 2021
[13]

B., Zhang, J., Oostermeijer, K., Bellagente, M., Clune, J., Stanley, K., Schott, G., and Lehman, J

Bradley, H., Dai, A., Teufel, H. B., Zhang, J., Oostermeijer, K., Bellagente, M., Clune, J., Stanley, K., Schott, G., and Lehman, J. Quality-diversity through AI feedback. In Second Agent Learning in Open-Endedness Workshop, 2023

work page 2023
[14]

Brant, J. C. and Stanley, K. O. Minimal criterion coevolution: a new approach to open-ended search. In the Genetic and Evolutionary Computation Conference, 2017

work page 2017
[15]

Video generation models as world simulators

Brooks, T., Peebles, B., Holmes, C., DePue, W., Guo, Y., Jing, L., Schnurr, D., Taylor, J., Luhman, T., Luhman, E., et al. Video generation models as world simulators. [LINK] https://openai. com/research/video-generation-modelsas-world-simulators, 2024

work page 2024
[16]

D., Edwards, A., Parker-Holder, J., Shi, Y., Hughes, E., Lai, M., Mavalankar, A., Steigerwald, R., Apps, C., et al

Bruce, J., Dennis, M. D., Edwards, A., Parker-Holder, J., Shi, Y., Hughes, E., Lai, M., Mavalankar, A., Steigerwald, R., Apps, C., et al. Genie: Generative interactive environments. In ICML, 2024

work page 2024
[17]

H., Baker, B., Gao, L., Aschenbrenner, L., Chen, Y., Ecoffet, A., Joglekar, M., Leike, J., et al

Burns, C., Izmailov, P., Kirchner, J. H., Baker, B., Gao, L., Aschenbrenner, L., Chen, Y., Ecoffet, A., Joglekar, M., Leike, J., et al. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision. arXiv, 2023

work page 2023
[18]

W., Lopez-Lopez, E., Hechtlinger, S., Rahwan, Z., Aeschbach, S., Bakker, M

Burton, J. W., Lopez-Lopez, E., Hechtlinger, S., Rahwan, Z., Aeschbach, S., Bakker, M. A., Becker, J. A., Berditchevskaia, A., Berger, J., Brinkmann, L., et al. How large language models can reshape collective intelligence. Nature human behaviour, 8 0 (9): 0 1643--1655, 2024

work page 2024
[19]

Real-x--robot open-ended autonomous learning architectures: Achieving truly end-to-end sensorimotor autonomous learning systems

Cartoni, E., Montella, D., Triesch, J., and Baldassarre, G. Real-x--robot open-ended autonomous learning architectures: Achieving truly end-to-end sensorimotor autonomous learning systems. arXiv, 2020

work page 2020
[20]

Real-x—robot open-ended autonomous learning architecture: Building truly end-to-end sensorimotor autonomous learning systems

Cartoni, E., Montella, D., Triesch, J., and Baldassarre, G. Real-x—robot open-ended autonomous learning architecture: Building truly end-to-end sensorimotor autonomous learning systems. Transactions on Cognitive and Developmental Systems, 15 0 (4): 0 2014--2030, 2023

work page 2014
[21]

and Chavan, P

Chavan, P. and Chavan, P. Automation of ad-ohc dashbord and monitoring of cloud resources using genrative ai to reduce costing and enhance performance. In the IEEE International Conference on Innovations and Challenges in Emerging Technologies (ICICET), 2024

work page 2024
[22]

Gamegen-x: Interactive open-world game video generation

Che, H., He, X., Liu, Q., Jin, C., and Chen, H. Gamegen-x: Interactive open-world game video generation. arXiv, 2024

work page 2024
[23]

Supervising strong learners by amplifying weak experts

Christiano, P., Shlegeris, B., and Amodei, D. Supervising strong learners by amplifying weak experts. arXiv, 2018

work page 2018
[24]

Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artificial intelligence

Clune, J. Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv, 2019

work page 2019
[25]

Sparse autoencoders find highly interpretable features in language models

Cunningham, H., Ewart, A., Riggs, L., Huben, R., and Sharkey, L. Sparse autoencoders find highly interpretable features in language models. arXiv, 2023

work page 2023
[26]

A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks

Dawood, M., Shokry, A., and Bennewitz, M. A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks. arXiv, 2024

work page 2024
[27]

Y., Bhardwaj, R., and Poria, S

Deep Pala, T., Toh, V. Y., Bhardwaj, R., and Poria, S. Ferret: Faster and effective automated red teaming with reward-based scoring technique. arXiv, 2024

work page 2024
[28]

K., Togelius, J., and Soros, L

Dharna, A., Hoover, A. K., Togelius, J., and Soros, L. Transfer dynamics in emergent evolutionary curricula, 2022

work page 2022
[29]

L., Koch, J., Sharkey, L

Di Langosco, L. L., Koch, J., Sharkey, L. D., Pfau, J., and Krueger, D. Goal misgeneralization in deep reinforcement learning. In ICML, 2022

work page 2022
[30]

Open questions in creating safe open-ended ai: tensions between control and creativity

Ecoffet, A., Clune, J., and Lehman, J. Open questions in creating safe open-ended ai: tensions between control and creativity. In Artificial Life Conference Proceedings 32, pp.\ 27--35. MIT Press, 2020

work page 2020
[31]

Clha: A simple yet effective contrastive learning framework for human alignment

Fang, F., Zhu, L., Feng, X., Hou, J., Zhao, Q., Li, C., Hu, X., Xu, R., and Yang, M. Clha: A simple yet effective contrastive learning framework for human alignment. In the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024

work page 2024
[32]

F., Akametalu, A

Fisac, J. F., Akametalu, A. K., Zeilinger, M. N., Kaynama, S., Gillula, J., and Tomlin, C. J. A general safety framework for learning-based control in uncertain robotic systems. Transactions on Automatic Control, 64 0 (7): 0 2737--2752, 2018

work page 2018
[33]

and Cully, A

Flageat, M. and Cully, A. Uncertain quality-diversity: evaluation methodology and new methods for quality-diversity in uncertain domains. Transactions on Evolutionary Computation, 2023

work page 2023
[34]

Exploring the performance-reproducibility trade-off in quality-diversity

Flageat, M., Janmohamed, H., Lim, B., and Cully, A. Exploring the performance-reproducibility trade-off in quality-diversity. arXiv, 2024

work page 2024
[35]

Evaluating superhuman models with consistency checks

Fluri, L., Paleka, D., and Tram \`e r, F. Evaluating superhuman models with consistency checks. In SaTML, 2024

work page 2024
[36]

and Fern \'a ndez, F

Garc a, J. and Fern \'a ndez, F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16 0 (1): 0 1437--1480, 2015

work page 2015
[37]

Y., Joglekar, M., Wallace, E., Jain, S., Barak, B., Helyar, A., Dias, R., Vallone, A., Ren, H., Wei, J., Chung, H

Guan, M. Y., Joglekar, M., Wallace, E., Jain, S., Barak, B., Helyar, A., Dias, R., Vallone, A., Ren, H., Wei, J., Chung, H. W., Toyer, S., Heidecke, J., Beutel, A., and Glaese, A. Deliberative alignment: Reasoning enables safer language models, 2025

work page 2025
[38]

Han, V. T. Y., Bhardwaj, R., and Poria, S. Ruby teaming: Improving quality diversity search with memory for automated red teaming. arXiv, 2024

work page 2024
[39]

Unsolved problems in ml safety

Hendrycks, D., Carlini, N., Schulman, J., and Steinhardt, J. Unsolved problems in ml safety. arXiv, 2021

work page 2021
[40]

An overview of catastrophic ai risks

Hendrycks, D., Mazeika, M., and Woodside, T. An overview of catastrophic ai risks. arXiv, 2023

work page 2023
[41]

Adaptive preference scaling for reinforcement learning with human feedback

Hong, I., Li, Z., Bukharin, A., Li, Y., Jiang, H., Yang, T., and Zhao, T. Adaptive preference scaling for reinforcement learning with human feedback. In NeurPS, 2024

work page 2024
[42]

and Clune, J

Hu, S. and Clune, J. Thought cloning: Learning to think while acting by imitating human thinking. NeurIPS, 2024

work page 2024
[43]

S., Yu, A

Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., and Zhou, D. Large language models cannot self-correct reasoning yet. In ICLR, 2024

work page 2024
[44]

Open-endedness is essential for artificial superhuman intelligence

Hughes, E., Dennis, M., Parker-Holder, J., Behbahani, F., Mavalankar, A., Shi, Y., Schaul, T., and Rocktaschel, T. Open-endedness is essential for artificial superhuman intelligence. ICML, 2024

work page 2024
[45]

Reward learning from human preferences and demonstrations in atari

Ibarz, B., Leike, J., Pohlen, T., Irving, G., Legg, S., and Amodei, D. Reward learning from human preferences and demonstrations in atari. In NeurIPS, 2018

work page 2018
[46]

Ai safety via debate

Irving, G., Christiano, P., and Amodei, D. Ai safety via debate. arXiv, 2018

work page 2018
[47]

Y., Dai, J., Pan, X., O'Gara, A., Lei, Y., Xu, H., Tse, B., Fu, J., McAleer, S., Yang, Y., Wang, Y., Zhu, S.-C., Guo, Y., and Gao, W

Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Zhou, J., Zhang, Z., Zeng, F., Ng, K. Y., Dai, J., Pan, X., O'Gara, A., Lei, Y., Xu, H., Tse, B., Fu, J., McAleer, S., Yang, Y., Wang, Y., Zhu, S.-C., Guo, Y., and Gao, W. Ai alignment: A comprehensive survey, 2024

work page 2024
[48]

General intelligence requires rethinking exploration

Jiang, M., Rockt \"a schel, T., and Grefenstette, E. General intelligence requires rethinking exploration. Royal Society Open Science, 10 0 (6): 0 230539, 2023

work page 2023
[49]

The automotive standard iso 26262, the innovative driver for enhanced safety assessment & technology for motor cars

Kafka, P. The automotive standard iso 26262, the innovative driver for enhanced safety assessment & technology for motor cars. Procedia Engineering, 45: 0 2--10, 2012

work page 2012
[50]

When can llms actually correct their own mistakes? a critical survey of self-correction of llms

Kamoi, R., Zhang, Y., Zhang, N., Han, J., and Zhang, R. When can llms actually correct their own mistakes? a critical survey of self-correction of llms. Transactions of the Association for Computational Linguistics, 12: 0 1417--1440, 2024

work page 2024
[51]

R., Rocktäschel, T., and Perez, E

Khan, A., Hughes, J., Valentine, D., Ruis, L., Sachan, K., Radhakrishnan, A., Grefenstette, E., Bowman, S. R., Rocktäschel, T., and Perez, E. Debating with more persuasive llms leads to more truthful answers. In ICML, 2024

work page 2024
[52]

Causal reasoning and large language models: Opening a new frontier for causality

K c man, E., Ness, R., Sharma, A., and Tan, C. Causal reasoning and large language models: Opening a new frontier for causality. TMLR, 2024

work page 2024
[53]

Penalizing side effects using stepwise relative reachability

Krakovna, V., Orseau, L., Kumar, R., Martic, M., and Legg, S. Penalizing side effects using stepwise relative reachability. arXiv, 2018

work page 2018
[54]

Specification gaming: the flip side of ai ingenuity

Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., and Legg, S. Specification gaming: the flip side of ai ingenuity. [LINK] https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/, 2020

work page 2020
[55]

and Stanley, K

Lehman, J. and Stanley, K. O. Revising the evolutionary computation abstraction: minimal criteria novelty search. In the 12th annual conference on Genetic and evolutionary computation, 2010

work page 2010
[56]

and Stanley, K

Lehman, J. and Stanley, K. O. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary computation, 19 0 (2): 0 189--223, 2011

work page 2011
[57]

Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., and Stanley, K. O. Evolution through large models. In Handbook of Evolutionary Machine Learning, pp.\ 331--366. Springer, 2023

work page 2023
[58]

Leveson, N. G. Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press, 2012

work page 2012
[59]

Li, D., Zhang, C., Dong, K., Deik, D. G. X., Tang, R., and Liu, Y. Aligning crowd feedback via distributional preference reward modeling. arXiv, 2024

work page 2024
[60]

Large language models as evolutionary optimizers

Liu, S., Chen, C., Qu, X., Tang, K., and Ong, Y.-S. Large language models as evolutionary optimizers. In 2024 IEEE Congress on Evolutionary Computation (CEC), pp.\ 1--8. IEEE, 2024

work page 2024
[61]

T., Foerster, J., Clune, J., and Ha, D

Lu, C., Lu, C., Lange, R. T., Foerster, J., Clune, J., and Ha, D. The ai scientist: Towards fully automated open-ended scientific discovery. arXiv, 2024

work page 2024
[62]

R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., and Legg, S

Morris, M. R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., and Legg, S. Position: Levels of agi for operationalizing progress on the path to agi. In ICML, 2024

work page 2024
[63]

K., Strouse, D., Sandholm, T., Salakhutdinov, R., Dragan, A., and McAleer, S

Moskovitz, T., Singh, A. K., Strouse, D., Sandholm, T., Salakhutdinov, R., Dragan, A., and McAleer, S. M. Confronting reward model overoptimization with constrained RLHF . In ICLR, 2024

work page 2024
[64]

and Clune, J

Mouret, J.-B. and Clune, J. Illuminating search spaces by mapping elites. arXiv, 2015

work page 2015
[65]

W., Teodorescu, L., Hayes, C

Nisioti, E., Glanois, C., Najarro, E., Dai, A., Meyerson, E., Pedersen, J. W., Teodorescu, L., Hayes, C. F., Sudhakaran, S., and Risi, S. From text to life: On the reciprocal relationship between artificial life and large language models. In Artificial Life Conference Proceedings 36, volume 2024, pp.\ 39. MIT Press, 2024

work page 2024
[66]

A., Channon, A., Ikegami, T., Rasmussen, S., Stanley, K

Packard, N., Bedau, M. A., Channon, A., Ikegami, T., Rasmussen, S., Stanley, K. O., and Taylor, T. An overview of open-ended evolution: Editorial introduction to the open-ended evolution ii special issue. Artificial life, 25 0 (2): 0 93--103, 2019

work page 2019
[67]

Carbon emissions and large neural network training

Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., and Dean, J. Carbon emissions and large neural network training. arXiv, 2021

work page 2021
[68]

Wired - the true cost of generative ai: Data centers and energy consumption

Pfeiffer, E. Wired - the true cost of generative ai: Data centers and energy consumption. [LINK] https://www.wired.com/story/true-cost-generative-ai-data-centers-energy/, 2023

work page 2023
[69]

W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. Robust speech recognition via large-scale weak supervision. In ICML, 2023

work page 2023
[70]

Zero-shot text-to-image generation

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. Zero-shot text-to-image generation. In ICML, 2021

work page 2021
[71]

and Everitt, T

Richens, J. and Everitt, T. Robust agents learn causal world models. ICLR, 2024

work page 2024
[72]

G., Lyons, O., Summitt, A., Fatima, A., Pak, J., Shao, W., Chalmers, R., Englander, A., Staley, E

Rivera, C. G., Lyons, O., Summitt, A., Fatima, A., Pak, J., Shao, W., Chalmers, R., Englander, A., Staley, E. W., Wang, I., et al. Tanksworld: a multi-agent environment for ai safety research. arXiv, 2020

work page 2020
[73]

High-resolution image synthesis with latent diffusion models

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022

work page 2022
[74]

C., Lupu, A., Hambro, E., Markosyan, A

Samvelyan, M., Raparthy, S. C., Lupu, A., Hambro, E., Markosyan, A. H., Bhatt, M., Mao, Y., Jiang, M., Parker-Holder, J., Foerster, J. N., Rockt \"a schel, T., and Raileanu, R. Rainbow teaming: Open-ended generation of diverse adversarial prompts. In NeurIPS, 2024

[75]

Secretan, J., Beato, N., D'Ambrosio, D. B., Rodriguez, A., Campbell, A., and Stanley, K. O. Picbreeder: evolving pictures collaboratively online. In the SIGCHI conference on human factors in computing systems, 2008

[76]

Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J., and Kenton, Z. Goal misgeneralization: Why correct specifications aren't enough for correct goals. arXiv, 2022

[77]

Sigaud, O., Baldassarre, G., Colas, C., Doncieux, S., Duro, R., Oudeyer, P.-Y., Perrin-Gilbert, N., and Santucci, V. G. A definition of open-ended learning problems for goal-conditioned agents. arXiv, 2023

[78]

Soros, L. and Stanley, K. Identifying necessary conditions for open-ended evolution through the artificial life world of Chromaria. In Artificial Life Conference Proceedings, pp. 793–800. MIT Press, 2014

[79]

Soros, L. B., Lehman, J., and Stanley, K. O. Open-endedness: The last grand challenge you’ve never heard of, 2017

[80]

Stanley, K. O. Why open-endedness matters. Artificial Life, 25(3):232–235, 2019
