pith. sign in

hub

The Alignment Problem from a Deep Learning Perspective , May 2025

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

hub tools

citation-role summary

background 3 other 1

citation-polarity summary

polarities

background 3 unclear 1

clear filters

representative citing papers

Understanding Goal Generalisation in Sequential Reinforcement Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds salient features drive generalization and early goals persist, with latent policy gradients simulating latent variable evolution to predict OOD behavior from training history.

Safety, Security, and Cognitive Risks in World Models

cs.CR · 2026-04-01 · unverdicted · novelty 6.0

World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.

Scaling Laws for Reward Model Overoptimization

cs.LG · 2022-10-19 · unverdicted · novelty 6.0

Synthetic measurements show that gold-standard performance degrades according to distinct functional forms when optimizing proxy reward models via RL or best-of-n, with coefficients scaling smoothly by reward model parameter count.

The Agentic Web Requires New Normative Infrastructure

cs.CY · 2026-06-09 · unverdicted · novelty 3.0

The agentic web requires new normative infrastructure of laws, norms, and practices to allow user-delegated AI agents to access online properties without being blocked as malicious bots.

citing papers explorer

Showing 2 of 2 citing papers after filters.