Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society , pages =

Goel, Naman, Yaghini, Mohammad, Faltings, Boi , title = · 2018 · arXiv 8721.327872

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Introduces a fairness layer for deep learning models that guarantees output parity and an online primal-dual algorithm for aggregate fairness guarantees in streaming predictions with small batch sizes.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Ethical and social risks of harm from Language Models

cs.CL · 2021-12-08 · accept · novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.

On the Opportunities and Risks of Foundation Models

cs.LG · 2021-08-16 · accept · novelty 6.0

Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.

IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.

How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

cs.HC · 2025-11-10 · unverdicted · novelty 5.0

Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.

citing papers explorer

Showing 8 of 8 citing papers.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024-06-14 · conditional · none · ref 136
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning cs.LG · 2026-05-16 · unverdicted · none · ref 71
Introduces a fairness layer for deep learning models that guarantees output parity and an online primal-dual algorithm for aggregate fairness guarantees in streaming predictions with small batch sizes.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model cs.CL · 2022-11-09 · unverdicted · none · ref 6
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
PaLM: Scaling Language Modeling with Pathways cs.CL · 2022-04-05 · accept · none · ref 37
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
Ethical and social risks of harm from Language Models cs.CL · 2021-12-08 · accept · none · ref 67
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
On the Opportunities and Risks of Foundation Models cs.LG · 2021-08-16 · accept · none · ref 7
Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.
IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language cs.CL · 2026-04-17 · unverdicted · none · ref 26
Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.
How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape cs.HC · 2025-11-10 · unverdicted · none · ref 27
Generative AI boosts attackers' ability to create harmful content at scale while also enabling defenders to detect threats, support users, and improve moderation processes.

Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society , pages =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer