AI Risk Management Should Incorporate Both Safety and Security

Arvind Narayanan; Bo Li; Boyi Wei; Chaowei Xiao; Danqi Chen; Dawn Song; Edoardo Debenedetti; Jeffrey Ding; Jiaqi Ma; Jonas Geiping

arxiv: 2405.19524 · v1 · pith:P5GXBBZInew · submitted 2024-05-29 · 💻 cs.CR · cs.AI

Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

keywords security, risk, safety, management, interplay, across, communities, disciplines

The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Designing Incident Reporting Systems for Harms from General-Purpose AI
cs.CY 2025-11 conditional novelty 4.0

A framework with seven dimensions for AI incident reporting systems is developed from literature and case studies in safety-critical industries to guide institutional design choices.