Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

Cyrus Cousins; Hoda Heidari; Jana Schaich Borg; Vijay Keswani; Vincent Conitzer; Walter Sinnott-Armstrong

read the original abstract

Recent AI trends seek to align AI models to learned human-centric objectives, such as personal preferences, utility, or societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, to which AI models are aligned. However, standard elicitation methods often fail to capture the cognitive processes behind human decision making, such as heuristics or simplifying structured thought patterns. To address this failure, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the literature analyzing cognitive processes that shape human decision-making, we derive a model class in which features are first processed with learned rules, then aggregated via a fixed rule, such as the Bradley-Terry rule, to produce a decision. This structured processing of information ensures that such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach by learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.

Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

discussion (0)