Angelina Wang, Erin Beeghly, Sanmi Koyejo, and Daniel E

Exploring safety-utility trade-offs in personalized language models · arXiv 2406.11107

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Persona Non Grata: LLM Persona-Driven Generations in MCQA are Unstable in Distinct Dimensions

cs.CL · 2026-07-01 · unverdicted · novelty 6.0

Persona-driven generations by LLMs in MCQA tasks exhibit instability that differs systematically by model family, size, domain, and prompt format.

Mitigating LLM biases toward spurious social contexts using direct preference optimization

cs.AI · 2026-04-02 · unverdicted · novelty 6.0

Debiasing-DPO reduces bias to spurious social contexts by 84% and improves predictive accuracy by 52% on average for LLMs evaluating U.S. classroom transcripts.

Discriminatory Compliance: How LLMs Answer Queries from Protected Groups

cs.CY · 2026-06-19 · unverdicted · novelty 4.0

State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.

citing papers explorer

Persona Non Grata: LLM Persona-Driven Generations in MCQA are Unstable in Distinct Dimensions

cs.CL · 2026-07-01 · unverdicted · none · ref 21

Persona-driven generations by LLMs in MCQA tasks exhibit instability that differs systematically by model family, size, domain, and prompt format.

Mitigating LLM biases toward spurious social contexts using direct preference optimization

cs.AI · 2026-04-02 · unverdicted · none · ref 29

Debiasing-DPO reduces bias to spurious social contexts by 84% and improves predictive accuracy by 52% on average for LLMs evaluating U.S. classroom transcripts.

Discriminatory Compliance: How LLMs Answer Queries from Protected Groups

cs.CY · 2026-06-19 · unverdicted · none · ref 28

State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.
