Paul Christiano

Identifiers

  • name variant Paul Christiano 0.60 · backfill

Papers (19)

  1. Estimating the expected output of wide random MLPs more efficiently than sampling · cs.LG · 2026 · author #6
  2. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · cs.CR · 2024 · author #31
  3. Training language models to follow instructions with human feedback · cs.CL · 2022 · author #18
  4. Recursively Summarizing Books with Human Feedback · cs.CL · 2021 · author #7
  5. Learning to summarize from human feedback · cs.CL · 2020 · author #9
  6. Fine-Tuning Language Models from Human Preferences · cs.CL · 2019 · author #7
  7. Supervising strong learners by amplifying weak experts · cs.LG · 2018 · author #1
  8. Unrestricted Adversarial Examples · stat.ML · 2018 · author #5
  9. AI safety via debate · stat.ML · 2018 · author #2
  10. Deep reinforcement learning from human preferences · stat.ML · 2017 · author #1
  11. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models · cs.LG · 2016 · author #2
  12. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model · cs.RO · 2016 · author #1
  13. Concrete Problems in AI Safety · cs.AI · 2016 · author #4
  14. Theano: A Python framework for fast computation of mathematical expressions · cs.SC · 2016 · author #24
  15. Collaborative prediction with expert advice · cs.LG · 2016 · author #1
  16. Provably Manipulation-Resistant Reputation Systems · cs.GT · 2014 · author #1
  17. Online Local Learning via Semidefinite Programming · cs.LG · 2014 · author #1
  18. Quantum Money from Hidden Subspaces · quant-ph · 2012 · author #2
  19. Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs · cs.DS · 2010 · author #1

Mentions

  • 1203.4740 #2 · backfill · confidence 0.70 · Paul Christiano
  • 2605.05179 #6 · arxiv_oai · confidence 0.70 · Paul Christiano
  • 1010.2921 #1 · backfill · confidence 0.70 · Paul Christiano
  • 2009.01325 #9 · arxiv_oai · confidence 0.70 · Paul Christiano
  • 2109.10862 #7 · arxiv_oai · confidence 0.70 · Paul Christiano
  • 1706.03741 #1 · arxiv_oai · confidence 0.70 · Paul Christiano

Frequent Coauthors