Zifan Wang

2 works
2 Pith-reviewed
100.0% Recognition coverage
0 queued

works

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · Pith 2024 · cs.LG · verdict UNVERDICTED · 118 Pith citing
Universal and Transferable Adversarial Attacks on Aligned Language Models · Pith 2023 · cs.CL · verdict ACCEPT · 329 Pith citing