arxiv: 2504.12335 · v1 · pith:U6AX37ZCnew · submitted 2025-04-14 · 💻 cs.CL · cs.AI

You've Changed: Detecting Modification of Black-Box Large Language Models

keywords models, approach, features, language, text, changed, changes, detect

Large Language Models (LLMs) are often provided as a service via an API, making it challenging for developers to detect changes in their behavior. We present an approach to monitor LLMs for changes by comparing the distributions of linguistic and psycholinguistic features of generated text. Our method uses a statistical test to determine whether the distributions of features from two samples of text are equivalent, allowing developers to identify when an LLM has changed. We demonstrate the effectiveness of our approach using five OpenAI completion models and Meta's Llama 3 70B chat model. Our results show that simple text features coupled with a statistical test can distinguish between language models. We also explore the use of our approach to detect prompt injection attacks. Our work enables frequent LLM change monitoring and avoids computationally expensive benchmark evaluations.
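The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific features or the statistical test, so everything here is an assumption made for illustration: three surface features (word count, mean word length, type-token ratio) stand in for the paper's linguistic and psycholinguistic features, a two-sample permutation test on the difference in means stands in for the paper's test, and a Bonferroni correction guards against testing several features at once. The function names `text_features`, `permutation_p_value`, and `detect_change` are hypothetical.

```python
import random
import statistics


def text_features(text):
    """Compute a few simple surface features for one generated text.

    These features are illustrative stand-ins, not the paper's feature set.
    """
    words = text.split()
    n = len(words)
    return {
        "word_count": float(n),
        "mean_word_length": sum(len(w) for w in words) / n if n else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n if n else 0.0,
    }


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Assumed test for this sketch; the paper may use a different one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def detect_change(reference_texts, current_texts, alpha=0.05):
    """Flag a change if any feature differs after Bonferroni correction."""
    ref = [text_features(t) for t in reference_texts]
    cur = [text_features(t) for t in current_texts]
    names = list(ref[0])
    p_values = {
        name: permutation_p_value([f[name] for f in ref],
                                  [f[name] for f in cur])
        for name in names
    }
    changed = any(p < alpha / len(names) for p in p_values.values())
    return changed, p_values
```

In use, a developer would collect a reference sample of completions for a fixed set of prompts, periodically collect a fresh sample from the same API with the same prompts, and call `detect_change(reference, fresh)`. Because only text features are computed, such a check is cheap compared with rerunning a benchmark suite, which is the trade-off the abstract highlights.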

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Referential Security as a New Paradigm for AI Evaluations

    cs.CR · 2026-05 · unverdicted · novelty 5.0

    Proposes referential security as a paradigm for AI evaluations that reframes model identity as verifiable to support reproducible audits and regulatory decisions despite system changes.