Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark

URL https://arxiv · 2025 · arXiv 2504.16137

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: dataset (1)

citation-polarity summary: use dataset (1)

representative citing papers

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

cs.SE · 2026-05-20 · conditional · novelty 8.0

RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.

BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists

q-bio.OT · 2026-04-30 · unverdicted · novelty 6.0

Agentic biological AI systems such as Biomni and K-Dense assist with dual-use tasks that safeguards would otherwise block and show performance uplift on WMDP proxies; BioVeil MATRIX is introduced as a 10-category taxonomy with 22 techniques for categorizing and red-teaming AI-enabled biosecurity risks.

An Independent Safety Evaluation of Kimi K2.5

cs.CR · 2026-04-03 · conditional · novelty 6.0

Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.

Risk Reporting for Developers' Internal AI Model Use

cs.CY · 2026-04-27 · unverdicted · novelty 4.0

A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.

citing papers explorer

Showing 4 of 4 citing papers.

RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts · cs.SE · 2026-05-20 · conditional · none · ref 20
BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists · q-bio.OT · 2026-04-30 · unverdicted · none · ref 31
An Independent Safety Evaluation of Kimi K2.5 · cs.CR · 2026-04-03 · conditional · none · ref 20
Risk Reporting for Developers' Internal AI Model Use · cs.CY · 2026-04-27 · unverdicted · none · ref 13