pith. machine review for the scientific record.

arXiv: 2407.10853 · v6 · submitted 2024-07-15 · cs.CL · cs.AI

Recognition: unknown

Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs

Authors on Pith: no claims yet
classification: cs.CL, cs.AI
keywords: fairness, bias, evaluation, LLMs, metrics, prompts, risks, across

abstract

Bias and fairness risks in Large Language Models (LLMs) vary substantially across deployment contexts, yet existing approaches lack systematic guidance for selecting appropriate evaluation metrics. We present a decision framework that maps LLM use cases, characterized by a model and population of prompts, to relevant bias and fairness metrics based on task type, whether prompts contain protected attribute mentions, and stakeholder priorities. Our framework addresses toxicity, stereotyping, counterfactual unfairness, and allocational harms, and introduces novel metrics based on stereotype classifiers and counterfactual adaptations of text similarity measures. We release an open-source Python library, langfair, for practical adoption. Extensive experiments on use cases across five LLMs and five prompt populations demonstrate that fairness risks cannot be reliably assessed from benchmark performance alone: results on one prompt dataset can overstate or understate risks for another, underscoring that fairness evaluation must be grounded in the specific deployment context.
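
To make the counterfactual metrics concrete, here is a minimal sketch of one: swap protected-attribute terms in each prompt, query the model on both the original and the counterfactual variant, and score each response pair with a text similarity measure. Everything below is illustrative, not the langfair API; the substitution map, function names, and bag-of-words cosine similarity are assumptions standing in for the paper's curated term lists and adapted similarity measures.

    from collections import Counter
    import math

    # Illustrative substitution map; an assumption standing in for the
    # paper's curated protected-attribute term lists.
    GENDER_SWAPS = {
        "he": "she", "she": "he", "his": "her", "her": "his",
        "him": "her", "man": "woman", "woman": "man",
    }

    def counterfactual_prompt(prompt: str, swaps: dict = GENDER_SWAPS) -> str:
        """Build the counterfactual prompt by swapping attribute terms.
        Naive whitespace tokenization; tokens with attached punctuation
        are left unchanged."""
        return " ".join(swaps.get(tok.lower(), tok) for tok in prompt.split())

    def cosine_similarity(a: str, b: str) -> float:
        """Bag-of-words cosine similarity: one simple stand-in for the
        text similarity measures the paper adapts to this setting."""
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[t] * vb[t] for t in va)
        norm = math.sqrt(sum(v * v for v in va.values()))
        norm *= math.sqrt(sum(v * v for v in vb.values()))
        return dot / norm if norm else 0.0

    def counterfactual_similarity(llm, prompts) -> float:
        """Mean similarity between responses to each prompt and its
        counterfactual. `llm` is any callable mapping a prompt string
        to a response string."""
        scores = [
            cosine_similarity(llm(p), llm(counterfactual_prompt(p)))
            for p in prompts
        ]
        return sum(scores) / len(scores)

    if __name__ == "__main__":
        # Stub model so the sketch runs without an API; swap in a real LLM call.
        stub = lambda p: f"A short answer about: {p}"
        print(counterfactual_similarity(stub, ["Explain why he was promoted."]))

A mean score near 1.0 suggests the model's responses are largely invariant to the attribute swap; per the paper's central claim, that number is only meaningful for the specific prompt population on which it was computed.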

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

    cs.CY · 2026-05 · unverdicted · novelty 7.0

    StereoTales shows that LLMs produce harmful, culturally adapted stereotypes in open-ended multilingual stories, with patterns consistent across providers and aligned human-LLM harm judgments.

  2. StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

    cs.CY · 2026-05 · accept · novelty 7.0

    StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.