Zifan Wang
- 2 works
- 2 Pith-reviewed
- 100.0% Recognition coverage
- 0 queued
works
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · Pith 2024 · cs.LG · verdict UNVERDICTED · 118 Pith citing
- Universal and Transferable Adversarial Attacks on Aligned Language Models · Pith 2023 · cs.CL · verdict ACCEPT · 329 Pith citing