RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.
arXiv [preprint]
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
years: 2026 (6)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.
Agentic biological AI systems like Biomni and K-Dense assist with dual-use tasks blocked by safeguards and gain performance uplift on WMDP proxies; BioVeil MATRIX is introduced as a 10-category taxonomy with 22 techniques to categorize and red-team AI-enabled biosecurity risks.
Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.
A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.
Meta's safety report states that Muse Spark meets acceptable risk thresholds for release after mitigations reduced elevated pre-mitigation risks in chemical and biological domains.
citing papers explorer
- Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches
  A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.
- BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists
  Agentic biological AI systems like Biomni and K-Dense assist with dual-use tasks blocked by safeguards and gain performance uplift on WMDP proxies; BioVeil MATRIX is introduced as a 10-category taxonomy with 22 techniques to categorize and red-team AI-enabled biosecurity risks.
- Risk Reporting for Developers' Internal AI Model Use
  A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.
- Muse Spark Safety & Preparedness Report
  Meta's safety report states that Muse Spark meets acceptable risk thresholds for release after mitigations reduced elevated pre-mitigation risks in chemical and biological domains.