hub Mixed citations

Categorizing Variants of Goodhart's Law

Mixed citation behavior. Most common role is background (60%).

17 Pith papers citing it
Background 60% of classified citations
abstract

There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing it is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.

citation-role summary

background 4 · method 1

representative citing papers

Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions

cs.CY · 2026-05-21 · conditional · novelty 6.0

Healthcare LLM benchmarks overlook implicit assumptions about user behavior that split into task assumptions testable from conversation data and outcome assumptions requiring behavioral studies, shown by reanalyzing an RCT where both gaps are roughly equal.

SARC: A Governance-by-Architecture Framework for Agentic AI Systems

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

SARC compiles constraint specifications into Pre-Action Gate, Action-Time Monitor, Post-Action Auditor, and Escalation Router components, achieving zero hard violations and 89.5% fewer soft overages than policy-as-code baselines in synthetic procurement evaluations.

Automated alignment is harder than you think

cs.AI · 2026-05-07 · conditional · novelty 6.0

AI agents automating alignment research are prone to systematic undetected errors in fuzzy tasks, leading to overconfident but flawed safety assessments even without deliberate sabotage.

citing papers explorer

Showing 17 of 17 citing papers.