A survey on model extraction attacks and defenses for large language models

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

fields

cs.CR 2

years

2026 2

roles

background 1

polarities

background 1

representative citing papers

Safety, Security, and Cognitive Risks in World Models

cs.CR · 2026-04-01 · unverdicted · novelty 6.0

World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
