pith. machine review for the scientific record. sign in

arxiv: 2502.05206 · v6 · submitted 2025-02-02 · 💻 cs.CR · cs.AI· cs.CL· cs.CV

Recognition: unknown

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

Authors on Pith no claims yet
classification 💻 cs.CR cs.AIcs.CLcs.CV
keywords modelssafetyattackslargecomprehensivedatadefensemodel
0
0 comments X
read the original abstract

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attacks if available and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

    cs.CR 2026-04 unverdicted novelty 7.0

    ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisone...

  2. Safety, Security, and Cognitive Risks in World Models

    cs.CR 2026-04 unverdicted novelty 6.0

    World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and D...

  3. SoK: Robustness in Large Language Models against Jailbreak Attacks

    cs.CR 2026-05 accept novelty 5.0

    The paper taxonomizes jailbreak attacks and defenses for LLMs, introduces the Security Cube multi-dimensional evaluation framework, benchmarks 13 attacks and 5 defenses, and identifies open challenges in LLM robustness.