pith. machine review for the scientific record.


Artificial Intelligence, Values and Alignment

Canonical reference. 86% of citing Pith papers cite this work as background.

16 Pith papers citing it
572 external citations · Crossref
Background 86% of classified citations

hub tools

citation-role summary

background 7

citation-polarity summary

background 6 · support 1

representative citing papers

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

How Value Induction Reshapes LLM Behaviour

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

Inducing targeted values in LLMs through fine-tuning causes spillover to related or opposing values, boosts safety metrics, and increases anthropomorphic and sycophantic language across all tested values.

Open Problems in Frontier AI Risk Management

cs.LG · 2026-04-28 · unverdicted · novelty 3.0

The paper maps unresolved challenges in frontier AI risk management, classifies them into lack of consensus, framework misalignment, or implementation shortfalls, and identifies actors best positioned to address each.
