
Evaluating the paperclip maximizer: Are RL-based language models more likely to pursue instrumental goals?

3 Pith papers cite this work. Polarity classification of these citations is still being indexed.


Years: 2026 (2), 2025 (1)

Verdicts: unverdicted (3)


representative citing papers

Understanding Large Language Models

cs.CL · 2026-07-01 · unverdicted · novelty 2.0

The paper reviews the Transformer architecture, emergent LLM capabilities that resemble cognition, and explainable AI methods, and argues against both anthropomorphism and overly reductive views of LLM behavior as mere memorization.
