Joe Benton
Identifiers
- name variant Joe Benton 0.60 · backfill
Papers (8)
- Diffuse AI Control on Fuzzy Tasks cs.LG · 2026 · author #4
- Faithfulness as Information Flow: Evaluating and Training Faithful Chain-of-Thought Reasoning cs.LG · 2026 · author #2
- SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors cs.CR · 2026 · author #5
- Efficiently Aligning Language Models with Online Natural Language Feedback cs.LG · 2026 · author #2
- Removing Sandbagging in LLMs by Training with Weak Supervision cs.LG · 2026 · author #4
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety cs.AI · 2025 · author #5
- Reasoning Models Don't Always Say What They Think cs.CL · 2025 · author #2
- Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming cs.CL · 2025 · author #13
Mentions
- 2606.08892 #4 · arxiv_oai · confidence 0.70 Joe Benton
- 2605.04356 #2 · arxiv_oai · confidence 0.70 Joe Benton
- 2605.24286 #2 · arxiv_oai · confidence 0.70 Joe Benton
- 2507.11473 #5 · arxiv_oai · confidence 0.70 Joe Benton
- 2605.16626 #5 · arxiv_oai · confidence 0.70 Joe Benton
- 2501.18837 #13 · arxiv_oai · confidence 0.70 Joe Benton
Frequent Coauthors
- Ethan Perez 3 shared papers
- Fabien Roger 3 shared papers
- Jan Leike 2 shared papers
- Jared Kaplan 2 shared papers
- Samuel R. Bowman 2 shared papers
- Vivek Hebbar 2 shared papers
- Vlad Mikulik 2 shared papers
- Alan Cooney 1 shared paper
- Aleksander Mądry 1 shared paper
- Alex Silverstein 1 shared paper
- Allan Dafoe 1 shared paper
- Alwin Peng 1 shared paper
- Amanda Askell 1 shared paper
- Anca Dragan 1 shared paper
- Andy Dau 1 shared paper
- Anjali Gopal 1 shared paper
- Ansh Radhakrishnan 1 shared paper
- Arushi Somani 1 shared paper
- Bowen Baker 1 shared paper
- Buck Shlegeris 1 shared paper