arXiv [preprint]

Jasper Götting, Pedro Medeiros, Jon G Sanders, Nathaniel Li, Long Phan, Karam Elabd, Lennart Justen, Dan Hendrycks, Seth Donoughe · 2025 · arXiv 2504.16137

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset (1)

citation-polarity summary: use dataset (1)

representative citing papers

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

cs.SE · 2026-05-20 · conditional · novelty 8.0

RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

cs.AI · 2026-05-31 · unverdicted · novelty 6.0 · 2 refs

A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.

BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists

q-bio.OT · 2026-04-30 · unverdicted · novelty 6.0

Agentic biological AI systems like Biomni and K-Dense assist with dual-use tasks blocked by safeguards and gain performance uplift on WMDP proxies; BioVeil MATRIX is introduced as a 10-category taxonomy with 22 techniques to categorize and red-team AI-enabled biosecurity risks.

An Independent Safety Evaluation of Kimi K2.5

cs.CR · 2026-04-03 · conditional · novelty 6.0

Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.

Risk Reporting for Developers' Internal AI Model Use

cs.CY · 2026-04-27 · unverdicted · novelty 4.0

A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.

Muse Spark Safety & Preparedness Report

cs.CY · 2026-05-14 · unverdicted · novelty 2.0

Meta's safety report states that Muse Spark meets acceptable risk thresholds for release after mitigations reduced elevated pre-mitigation risks in chemical and biological domains.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

cs.AI · 2026-05-31 · unverdicted · none · ref 97 · 2 links

A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.

BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists

q-bio.OT · 2026-04-30 · unverdicted · none · ref 31

Agentic biological AI systems like Biomni and K-Dense assist with dual-use tasks blocked by safeguards and gain performance uplift on WMDP proxies; BioVeil MATRIX is introduced as a 10-category taxonomy with 22 techniques to categorize and red-team AI-enabled biosecurity risks.

Risk Reporting for Developers' Internal AI Model Use

cs.CY · 2026-04-27 · unverdicted · none · ref 13

A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.

Muse Spark Safety & Preparedness Report

cs.CY · 2026-05-14 · unverdicted · none · ref 13

Meta's safety report states that Muse Spark meets acceptable risk thresholds for release after mitigations reduced elevated pre-mitigation risks in chemical and biological domains.
