Alexandra Souly
Identifiers
- name variant Alexandra Souly 0.60 · backfill
Papers (4)
- Evaluating whether AI models would sabotage AI safety research · cs.AI · 2026 · author #2
- Seven simple steps for log analysis in AI systems · cs.AI · 2026 · author #5
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents · cs.LG · 2024 · author #2
- A StrongREJECT for Empty Jailbreaks · cs.LG · 2024 · author #1
Mentions
- 2402.10260 #1 · arxiv_oai · confidence 0.70 Alexandra Souly
Frequent Coauthors
- Jerome Wynne 2 shared papers
- Xander Davies 2 shared papers
- Abby D'Cruz 1 shared paper
- Andy Zou 1 shared paper
- Charles Teague 1 shared paper
- Cozmin Ududec 1 shared paper
- Dan Hendrycks 1 shared paper
- Derek Duenas 1 shared paper
- Dillon Bowen 1 shared paper
- Ekin Zorer 1 shared paper
- Elvis Hsieh 1 shared paper
- Eric Patey 1 shared paper
- Eric Winsor 1 shared paper
- Harry Coppock 1 shared paper
- JJ Allaire 1 shared paper
- Joe Skinner 1 shared paper
- Jose Hernandez-Orallo 1 shared paper
- Justin Svegliato 1 shared paper
- Justin Wang 1 shared paper
- Kai Fronsdal 1 shared paper