Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, and Jonathan Cohen
1 Pith paper cites this work.
Citing papers by field: cs.CL (1); by year: 2026 (1); by verdict: unverdicted (1).
Representative citing papers:
Auditing Proprietary Alignment in Large Language Models: A Comparative Framework Without a Ground-Truth Standard
A comparative statistical framework is proposed to audit proprietary alignment in black-box LLMs by quantifying behavioral divergences from reference models rather than absolute correctness.