Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan · 2023 · DOI 10.18653/v1/2023.findings-emnlp.88

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · conditional · novelty 6.0

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.

Mitigating LLM biases toward spurious social contexts using direct preference optimization

cs.AI · 2026-04-02 · unverdicted · novelty 6.0

Debiasing-DPO reduces bias to spurious social contexts by 84% and improves predictive accuracy by 52% on average for LLMs evaluating U.S. classroom transcripts.

Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?

cs.CL · 2026-05-17 · unverdicted · novelty 5.0

LLMs assigned high or low status personas in multi-turn dialogues exhibit socio-cognitive effects including language coordination, pronoun patterns, persuasion success, and compliance with unsafe requests.

Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · novelty 4.0

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge

cs.CL · 2026-05-21 · unverdicted · novelty 3.0

Activation steering with FLORES-derived language vectors produces modest, layer-sensitive and language-dependent gains on cultural awareness tasks, with some settings degrading performance and strong interaction with prompt design.

citing papers explorer

Showing 7 of 7 citing papers.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment · cs.CL · 2026-05-08 · unverdicted · none · ref 109
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement · cs.CL · 2026-05-11 · conditional · none · ref 10
Mitigating LLM biases toward spurious social contexts using direct preference optimization · cs.AI · 2026-04-02 · unverdicted · none · ref 6
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations? · cs.CL · 2026-05-17 · unverdicted · none · ref 92
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations · cs.AI · 2026-05-12 · unverdicted · none · ref 4
AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions · cs.AI · 2024-08-23 · unverdicted · none · ref 174
DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge · cs.CL · 2026-05-21 · unverdicted · none · ref 15
