and Wallach, Hanna and Cotterell, Ryan
4 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Roles: background (1)
Polarities: background (1)
citing papers explorer
- Social Bias in LLM-Generated Code: Benchmark and Mitigation
  LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
- Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles
  H-SAL erases latent concepts from text profiles using self-descriptions as implicit debiasing signals and shows competitive performance on a new multi-domain Stack Exchange helpfulness benchmark.
- UnBias-Plus: Detect, Explain, and Rewrite Bias
  UnBias-Plus is an open-source toolkit unifying segment-level multi-class bias classification, biased span localization, neutral text rewriting, and decision reasoning.
- Trustworthy AI Suffers from Invariance Conflicts and Causality is The Solution
  Causality resolves trade-offs in trustworthy AI by treating them as invariance conflicts under different data-generating process changes.