Mantas Mazeika
Identifiers
- name variant Mantas Mazeika 0.60 · backfill
Papers (10)
- Humanity's Last Exam · cs.LG · 2025 · author #19
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning · cs.LG · 2024 · author #29
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · cs.LG · 2024 · author #1
- Representation Engineering: A Top-Down Approach to AI Transparency · cs.LG · 2023 · author #9
- An Overview of Catastrophic AI Risks · cs.CY · 2023 · author #2
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models · cs.CL · 2022 · author #246
- Measuring Coding Challenge Competence With APPS · cs.SE · 2021 · author #4
- Measuring Massive Multitask Language Understanding · cs.CY · 2020 · author #5
- Deep Anomaly Detection with Outlier Exposure · cs.LG · 2018 · author #2
- Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise · cs.LG · 2018 · author #2
Mentions
- 2306.12001 #2 · arxiv_oai · confidence 0.70 · Mantas Mazeika
- 2403.03218 #29 · arxiv_oai · confidence 0.70 · Mantas Mazeika
Frequent Coauthors
- Dan Hendrycks · 10 shared papers
- Andy Zou · 6 shared papers
- Steven Basart · 5 shared papers
- Dawn Song · 4 shared papers
- Long Phan · 4 shared papers
- Nathaniel Li · 4 shared papers
- Oliver Zhang · 3 shared papers
- Zifan Wang · 3 shared papers
- Adam Khoja · 2 shared papers
- Alexander Pan · 2 shared papers
- Alexandr Wang · 2 shared papers
- Alice Gatti · 2 shared papers
- Ann-Kathrin Dombrowski · 2 shared papers
- Collin Burns · 2 shared papers
- Damien Sileo · 2 shared papers
- Eric Chu · 2 shared papers
- Evgenii Zheltonozhskii · 2 shared papers
- Genta Indra Winata · 2 shared papers
- Jacob Steinhardt · 2 shared papers
- James Koppel · 2 shared papers