Long Phan

2 works
2 Pith-reviewed
100.0% Recognition coverage
0 queued

works

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Pith 2024 · cs.LG · verdict UNVERDICTED · 108 Pith citing
Representation Engineering: A Top-Down Approach to AI Transparency Pith 2023 · cs.LG · verdict UNVERDICTED · 149 Pith citing