Scaling Laws for Moral Machine Judgment in Large Language Models
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship: the distance from human preferences ($D$) decreases with model size ($S$) as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$). Mixed-effects models confirm that this relationship persists after controlling for model family and reasoning capabilities. Extended-reasoning models show significantly better alignment, an effect that is more pronounced in smaller models (size$\times$reasoning interaction: $p = 0.024$). The relationship holds across diverse architectures, and variance decreases at larger scales, indicating the systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling-law research to value-based judgments and provide empirical foundations for artificial intelligence governance.
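A minimal sketch of the kind of power-law fit reported above, assuming the exponent is estimated by ordinary least squares in log-log space; the data values, variable names, and model sizes below are placeholders for illustration, not the study's measurements.

```python
# Hypothetical sketch (not the authors' code): estimating the exponent k
# in D ∝ S^k by ordinary least squares on log-transformed variables.
import numpy as np

# Placeholder data: model sizes in billions of parameters and distances
# from human preferences. The real study spans 75 configurations
# (0.27B--1000B); these numbers are invented for the demo.
sizes = np.array([0.27, 1.0, 7.0, 70.0, 1000.0])      # S, billions of parameters
distances = np.array([0.95, 0.85, 0.70, 0.55, 0.45])  # D, arbitrary units

# log D = log c + k * log S, so the slope of the log-log fit is the exponent k.
log_s, log_d = np.log(sizes), np.log(distances)
k, log_c = np.polyfit(log_s, log_d, deg=1)

# Coefficient of determination of the log-log fit.
pred = log_c + k * log_s
ss_res = np.sum((log_d - pred) ** 2)
ss_tot = np.sum((log_d - log_d.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"exponent k = {k:.3f}, R^2 = {r2:.3f}")
# The paper reports k = -0.10 ± 0.01 with R^2 = 0.50 on its data.
```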
fields: cs.AI
years: 2026
verdicts: UNVERDICTED
representative citing papers: 1
citing papers
- Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control
LLMs for robotic health attendant control violate safety rules in 54.4% of harmful scenarios on average, with proprietary models at 23.7% median violation versus 72.8% for open-weight models, indicating they are not yet safe for clinical use.