Jacob Steinhardt

Identifiers

  • name variant Jacob Steinhardt 0.60 · backfill

Papers (33)

  1. Log analysis is necessary for credible evaluation of AI agents cs.AI · 2026 · author #10
  2. ADAG: Automatically Describing Attribution Graphs cs.CL · 2026 · author #3
  3. Jailbroken: How Does LLM Safety Training Fail? cs.LG · 2023 · author #3
  4. Eliciting Latent Predictions from Transformers with the Tuned Lens cs.LG · 2023 · author #8
  5. Progress measures for grokking via mechanistic interpretability cs.LG · 2023 · author #5
  6. Discovering Latent Knowledge in Language Models Without Supervision cs.CL · 2022 · author #4
  7. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022 · author #5
  8. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models cs.LG · 2022 · author #3
  9. Unsolved Problems in ML Safety cs.LG · 2021 · author #4
  10. Measuring Coding Challenge Competence With APPS cs.SE · 2021 · author #11
  11. Measuring Mathematical Problem Solving With the MATH Dataset cs.LG · 2021 · author #8
  12. Measuring Massive Multitask Language Understanding cs.CY · 2020 · author #7
  13. Aligning AI With Shared Human Values cs.CY · 2020 · author #7
  14. Transfer of Adversarial Robustness Between Perturbation Types cs.LG · 2019 · author #5
  15. FrAngel: Component-Based Synthesis with Control Structures cs.PL · 2018 · author #2
  16. Semidefinite relaxations for certifying robustness to adversarial examples cs.LG · 2018 · author #2
  17. Troubling Trends in Machine Learning Scholarship stat.ML · 2018 · author #2
  18. Sever: A Robust Meta-Algorithm for Stochastic Optimization cs.LG · 2018 · author #5
  19. Better Agnostic Clustering Via Relaxed Tensor Norms cs.LG · 2017 · author #2
  20. Certified Defenses for Data Poisoning Attacks cs.LG · 2017 · author #1
  21. Does robustness imply tractability? A lower bound for planted clique in the semi-random model cs.CC · 2017 · author #1
  22. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers cs.LG · 2017 · author #1
  23. Learning from Untrusted Data cs.LG · 2016 · author #2
  24. Concrete Problems in AI Safety cs.AI · 2016 · author #3
  25. Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction cs.HC · 2016 · author #1
  26. Unsupervised Risk Estimation Using Only Conditional Independence Structure cs.LG · 2016 · author #1
  27. Learning Fast-Mixing Models for Structured Prediction cs.LG · 2015 · author #1
  28. Reified Context Models cs.LG · 2015 · author #1
  29. The Statistics of Streaming Sparse Regression math.ST · 2014 · author #1
  30. Permutations with Ascending and Descending Blocks math.CO · 2009 · author #1
  31. Derangements with Ascending and Descending Blocks math.CO · 2009 · author #1
  32. On Coloring the Odd-Distance Graph math.CO · 2009 · author #1
  33. Cayley graphs formed by conjugate generating sets of S_n math.CO · 2007 · author #1

Mentions

  • 2008.02275 #7 · arxiv_oai · confidence 0.70 Jacob Steinhardt
  • 2201.03544 #3 · arxiv_oai · confidence 0.70 Jacob Steinhardt
  • 2109.13916 #4 · arxiv_oai · confidence 0.70 Jacob Steinhardt
  • 0908.4347 #1 · backfill · confidence 0.70 Jacob Steinhardt
  • 0908.3330 #1 · backfill · confidence 0.70 Jacob Steinhardt
  • 0908.1452 #1 · backfill · confidence 0.70 Jacob Steinhardt
  • 2212.03827 #4 · arxiv_oai · confidence 0.70 Jacob Steinhardt
  • 0711.3057 #1 · backfill · confidence 0.70 Jacob Steinhardt

Frequent Coauthors