Mantas Mazeika

Identifiers

  • name variant Mantas Mazeika 0.60 · backfill

Papers (10)

  1. Humanity's Last Exam cs.LG · 2025 · author #19
  2. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning cs.LG · 2024 · author #29
  3. HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal cs.LG · 2024 · author #1
  4. Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023 · author #9
  5. An Overview of Catastrophic AI Risks cs.CY · 2023 · author #2
  6. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models cs.CL · 2022 · author #246
  7. Measuring Coding Challenge Competence With APPS cs.SE · 2021 · author #4
  8. Measuring Massive Multitask Language Understanding cs.CY · 2020 · author #5
  9. Deep Anomaly Detection with Outlier Exposure cs.LG · 2018 · author #2
  10. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise cs.LG · 2018 · author #2

Mentions

  • 2306.12001 #2 · arxiv_oai · confidence 0.70 Mantas Mazeika
  • 2403.03218 #29 · arxiv_oai · confidence 0.70 Mantas Mazeika

Frequent Coauthors