WMDP is a public benchmark measuring hazardous LLM knowledge across biosecurity, cybersecurity, and chemical security, paired with RMU, an unlearning method that reduces WMDP performance without degrading general capabilities.
Does this advance safety along with, or as a consequence of, advancing other capabilities or the study of AI?
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning