Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on auditing tasks.
Helm, and Carey E
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (2)
polarities: background (2)
representative citing papers
DKPS-based methods predict new model benchmark scores using cached responses, matching baseline mean absolute error with substantially fewer queries and an offline query selection approach.
Behavioral geometry of model populations enables high-accuracy jailbreak susceptibility prediction and defense transfer with 98% fewer evaluations.
Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against adversaries.
citing papers explorer
- Black-box model classification under the discriminative factorization
  Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on auditing tasks.
- Query-efficient model evaluation using cached responses
  DKPS-based methods predict new model benchmark scores using cached responses, matching baseline mean absolute error with substantially fewer queries and an offline query selection approach.
- Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models
  Behavioral geometry of model populations enables high-accuracy jailbreak susceptibility prediction and defense transfer with 98% fewer evaluations.
- Control Charts for Multi-agent Systems
  Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against adversaries.