This happens with probability $(1 - \beta) \cdot (q^{d}_{m,c} - q^{a}_{m,c})$.

The current problem is not backdoored, Blue chooses ⟨Defer⟩, and the game continues with $c$ audit coins.
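The branch probability can be checked numerically with a minimal sketch. It assumes that β is the probability that the current problem is backdoored, that $q^{a}_{m,c}$ and $q^{d}_{m,c}$ are Blue's values at state $(m, c)$ with $q^{a}_{m,c} \le q^{d}_{m,c}$, and that their difference is the probability Blue chooses ⟨Defer⟩. The function name `defer_without_backdoor_prob` and its parameters are hypothetical, not taken from the source.

```python
def defer_without_backdoor_prob(beta: float, q_audit: float, q_defer: float) -> float:
    """Probability that the current problem is not backdoored and Blue defers.

    Hypothetical helper. Assumes beta is the probability the problem is
    backdoored, and q_defer - q_audit (both evaluated at state (m, c)) is
    the probability that Blue chooses <Defer>.
    """
    # Every input must be a valid probability.
    for name, p in (("beta", beta), ("q_audit", q_audit), ("q_defer", q_defer)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    # The difference is only a probability if q_audit <= q_defer.
    if q_audit > q_defer:
        raise ValueError("expected q_audit <= q_defer")
    # (1 - beta): problem is clean; (q_defer - q_audit): Blue defers.
    return (1.0 - beta) * (q_defer - q_audit)


if __name__ == "__main__":
    # Example: beta = 0.5, q^a = 0.25, q^d = 0.75 gives 0.5 * 0.5.
    print(defer_without_backdoor_prob(0.5, 0.25, 0.75))
```

With β = 0.5, $q^{a}_{m,c} = 0.25$ and $q^{d}_{m,c} = 0.75$, the branch has probability 0.25. After this branch the audit budget is unchanged, so $c$ carries over to the next state.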

1 Pith paper cites this work.


representative citing papers

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols

cs.AI · 2024-09-12 · unverdicted · novelty 7.0 · ref 25

Introduces AI-Control Games as a formal decision model for AI deployment red-teaming, with reductions to zero-sum POSGs for synthesizing Pareto-optimal trusted monitoring protocols and analysis of modeling assumptions.

