arXiv: 2605.11365 · v2 · pith:AYL5TBQ6new · submitted 2026-05-12 · 💻 cs.AI · cs.LG · stat.ML

Causal Bias Detection in Generative Artificial Intelligence

Pith reviewed 2026-05-20 23:02 UTC · model grok-4.3

keywords: causal fairness, generative AI, large language models, bias detection, causal decomposition, fairness quantification, mechanism replacement

The pith

Causal fairness for generative AI and for standard ML can be placed in one framework, in which bias decomposes along causal pathways and across mechanism replacements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to formalize causal fairness for generative models that sample from their own learned conditionals over multiple variables rather than fitting one predictor. This leads to new decomposition results that break fairness violations into separate contributions from distinct causal pathways and from the gap between real-world mechanisms and those implied by the model. A sympathetic reader would care because high-stakes generative systems such as large language models can embed and amplify disparities in ways that single-predictor fairness tools miss, and granular measurement supplies clearer targets for correction. Identification conditions are stated so that the target quantities can be recovered from data or direct model queries, and estimators are derived to make the quantities computable in practice.

Core claim

We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.

What carries the argument

Causal decomposition results that separate fairness impacts into effects transmitted along specific causal pathways and effects that arise when the generative model substitutes its own mechanisms for those in the real world.
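
As a rough illustration of these two axes of decomposition, the sketch below works through a toy binary model with paths X → Y and X → W → Y and no confounding. It is not the paper's estimator: the graph, the probability tables, and the helper names (`nested_mean`, `direct_effect`, `indirect_effect`, `total_effect`) are all assumptions made for this example. It first splits the total effect into direct and indirect parts under real-world mechanisms, then swaps in a hypothetical model's mechanisms one at a time and tracks how the direct effect moves.

```python
# Toy binary SCM: X -> W -> Y and X -> Y, no confounders (illustrative assumption).
# f_w[x]      = P(W = 1 | X = x)
# f_y[(x, w)] = P(Y = 1 | X = x, W = w)

def nested_mean(f_w, f_y, x_for_y, x_for_w):
    """E[Y] when Y's mechanism sees x_for_y and W is drawn as if X = x_for_w."""
    p_w1 = f_w[x_for_w]
    return (1 - p_w1) * f_y[(x_for_y, 0)] + p_w1 * f_y[(x_for_y, 1)]

def direct_effect(f_w, f_y):
    """Change X from 0 to 1 along X -> Y only; W keeps its X = 0 distribution."""
    return nested_mean(f_w, f_y, 1, 0) - nested_mean(f_w, f_y, 0, 0)

def indirect_effect(f_w, f_y):
    """Change X from 1 to 0 along X -> W -> Y only; Y's mechanism keeps X = 1."""
    return nested_mean(f_w, f_y, 1, 0) - nested_mean(f_w, f_y, 1, 1)

def total_effect(f_w, f_y):
    """Change X from 0 to 1 along every path at once."""
    return nested_mean(f_w, f_y, 1, 1) - nested_mean(f_w, f_y, 0, 0)

# Hypothetical real-world (rw) and model (m) mechanisms; the numbers are made up.
f_w_rw = {0: 0.3, 1: 0.6}
f_y_rw = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.6}
f_w_m = {0: 0.2, 1: 0.8}
f_y_m = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}

# (a) Pathway decomposition under real-world mechanisms: TE = DE - IE.
de_rw = direct_effect(f_w_rw, f_y_rw)
ie_rw = indirect_effect(f_w_rw, f_y_rw)
te_rw = total_effect(f_w_rw, f_y_rw)

# (b) Mechanism replacement: swap f_Y first, then f_W, and record each shift.
de_y_swapped = direct_effect(f_w_rw, f_y_m)
de_model = direct_effect(f_w_m, f_y_m)
shift_from_f_y = de_y_swapped - de_rw
shift_from_f_w = de_model - de_y_swapped

print(f"real world: TE={te_rw:.2f}  DE={de_rw:.2f}  IE={ie_rw:.2f}")
print(f"DE shift from replacing f_Y: {shift_from_f_y:+.2f}")
print(f"DE shift from replacing f_W: {shift_from_f_w:+.2f}")
print(f"fully replaced DE: {de_model:.2f}")
```

The two shifts telescope: the real-world direct effect plus both replacement shifts equals the direct effect computed entirely under the model's mechanisms. The order of replacement (here f_Y before f_W) is a choice made for the example, and a different order can attribute the shifts differently.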

If this is right

  • Audits of generative models can now attribute measured disparities to either transmission along existing data paths or to novel bias introduced by the model's own conditional distributions.
  • Fairness interventions can be targeted at specific mechanisms inside the generative process rather than applied uniformly.
  • The same decomposition framework covers both the classic single-predictor setting and the more general generative setting, allowing direct comparison of bias sources across model types.
  • Practical estimation becomes feasible for race and gender bias analysis in large language models without requiring full knowledge of every causal mechanism.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support regulatory audits that require generative systems to report both path-specific and mechanism-replacement contributions to observed disparities.
  • Similar decompositions might be tested on image or video generators to see whether the same separation of pathway and mechanism effects appears in non-text domains.
  • If the framework is adopted, training objectives could be augmented with penalties on large mechanism-replacement gaps for protected attributes.

Load-bearing premise

The identification conditions for the causal quantities hold, so that the estimators recover the target fairness measures from observable data or model queries.

What would settle it

A simulation in which a generative model is built with a fully known causal graph and injected bias sources, yet the derived estimators return values that deviate from the ground-truth pathway and mechanism contributions.
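
A minimal version of such a check might look like the sketch below. It assumes a toy binary graph (X → W → Y plus X → Y, no confounding) with made-up probability tables, samples from it, and compares a simple plug-in estimate of the direct effect against the analytic value. The plug-in estimator is a stand-in chosen for brevity, not the efficient estimator the paper derives; every name and number here is hypothetical.

```python
import random

# Known ground-truth mechanisms of a toy SCM (made-up numbers).
F_W = {0: 0.3, 1: 0.6}                                       # P(W=1 | X=x)
F_Y = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.6}   # P(Y=1 | X=x, W=w)

def true_direct_effect():
    """Analytic DE of X: 0 -> 1 on Y, holding W at its X = 0 distribution."""
    p_w1 = F_W[0]
    return sum(
        p_w * (F_Y[(1, w)] - F_Y[(0, w)])
        for w, p_w in ((0, 1 - p_w1), (1, p_w1))
    )

def sample(n, rng):
    """Draw n units (x, w, y) from the SCM."""
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        w = int(rng.random() < F_W[x])
        y = int(rng.random() < F_Y[(x, w)])
        rows.append((x, w, y))
    return rows

def plugin_direct_effect(rows):
    """Plug-in DE: empirical P(w | X=0) times empirical outcome contrasts."""
    n_x0 = sum(1 for x, _, _ in rows if x == 0)
    estimate = 0.0
    for w in (0, 1):
        p_w = sum(1 for x, ww, _ in rows if x == 0 and ww == w) / n_x0
        means = {}
        for x in (0, 1):
            cell = [y for xx, ww, y in rows if xx == x and ww == w]
            means[x] = sum(cell) / len(cell)
        estimate += p_w * (means[1] - means[0])
    return estimate

rng = random.Random(0)
rows = sample(200_000, rng)
truth = true_direct_effect()
estimate = plugin_direct_effect(rows)
print(f"truth={truth:.3f}  estimate={estimate:.3f}  gap={estimate - truth:+.3f}")
```

In a setup like this the identification conditions hold by construction, so a gap that persists well beyond sampling noise would count against the estimator rather than against the assumptions.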

Figures

Figures reproduced from arXiv: 2605.11365 by Drago Plecko.

Figure 1: Standard Fairness Model (SFM) for machine learning and generative AI settings.
Figure 2: Graphical models for (a) x-specific direct effect; (b) generic potential outcome in S-SFM. … mechanism $f^{rw}_Y$, or according to the ML model $f^{m}_Y$. In this way, the S-node can be used to denote from which generative environment (real world or model) the data is sampled. In the context of generative AI, as mentioned earlier, the S-node must point to all covariates X, Z, W, and Y, since generative models are able to …
Figure 3: Quantifying differences using S-SFM potential outcomes. For both potential outcomes, S = s0 for each mechanism, meaning that all the mechanisms are from the real world. The difference between the two potential outcomes lies in the value of X along the direct path X → Y, and thus captures the direct effect of an x0 → x1 transition in the real world. We contrast this with the difference between (c) and (d) of …
Figure 4: Standard Fairness Models for the three datasets.
Figure 5: Hierarchical clustering of model bias signatures (L1 distance, Ward linkage). … developer families and parameter scales. Further, while the Llama 3 siblings sit at L1 = 0.39, the Qwen 3.5 pair is at 0.62 and the Gemma 3 pair at 1.22, both farther apart than many cross-family pairs. These mixed groupings motivate a formal test of whether family membership reliably predicts bias similarity. Using a permutatio…
Figure 6: Counterfactual graph for proof of Prop. 3.
Figure 7: Similarity of model bias signatures: (a) full pairwise …
Figure 8: TV decomposition into ∆x-DE, ∆x-IE, and ∆x-SE for Gemma 3 27B on NSDUH. … be low-earners. Indirect (0.3% ± 1.3%) and spurious (0.5% ± 2.7%) effects are both small and not significant, indicating that the disparity does not flow through observed mediators or confounders.
Figure 9: TV decomposition into ∆x-DE, ∆x-IE, and ∆x-SE for Qwen 3.5 27B on BRFSS. … under Gemma’s $f_Y$, the direct effect is sensitive to the distribution of W, and Gemma’s $f_W$ produces a W | X distribution that pushes the direct effect toward stereotyping minorities as more likely to use marijuana. Finally, the $f_{X,Z}$ replacement shifts the direct effect by −5.3% ± 8.8%, and the fully-replaced $\mathrm{DE}_{s_1}$ = −1.0% ± 8.3% no lon…
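
The text accompanying Figure 5 mentions a permutation test of whether developer family predicts bias similarity, but it is cut off before the test is specified. The sketch below shows one common way such a test could be set up, assuming the statistic is the mean L1 distance between models of the same family. The statistic, the function names, and the toy signatures are assumptions for illustration, not the paper's procedure or data.

```python
import random
from itertools import combinations

def l1(a, b):
    """L1 distance between two bias-signature vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))

def mean_within_family_distance(signatures, families):
    """Average L1 distance over all pairs of models that share a family label."""
    dists = [
        l1(signatures[i], signatures[j])
        for i, j in combinations(range(len(signatures)), 2)
        if families[i] == families[j]
    ]
    return sum(dists) / len(dists)

def permutation_p_value(signatures, families, n_perm=2000, seed=0):
    """One-sided p-value: how often do shuffled labels give within-family
    distances at least as small as the observed ones?"""
    rng = random.Random(seed)
    observed = mean_within_family_distance(signatures, families)
    labels = list(families)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_within_family_distance(signatures, labels) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy bias signatures (hypothetical numbers), two models per family.
signatures = [
    (0.10, 0.02, 0.01), (0.12, 0.03, 0.00),   # family "A"
    (0.40, 0.10, 0.05), (0.05, 0.30, 0.20),   # family "B"
    (0.20, 0.20, 0.10), (0.35, 0.01, 0.15),   # family "C"
]
families = ["A", "A", "B", "B", "C", "C"]

observed = mean_within_family_distance(signatures, families)
p = permutation_p_value(signatures, families)
print(f"mean within-family L1 = {observed:.3f}, p = {p:.3f}")
```

A small p-value would suggest that same-family models are closer than chance; a large one would be in line with the mixed groupings described for Figure 5.
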
Original abstract

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper formalizes causal fairness for generative AI models, which construct their own mechanisms over variables rather than inheriting real-world mechanisms as in standard predictive ML. It unifies the two settings under a common causal framework, derives decompositions quantifying fairness effects along specific pathways and via replacement of real-world mechanisms by the generative model, establishes identification conditions, introduces estimators, and applies the approach to measure race and gender bias in LLMs on multiple datasets.

Significance. If the identification and decomposition results hold, the work fills an important gap by extending causal fairness tools to the more complex generative setting, where models implicitly define causal mechanisms. The pathway and mechanism-replacement decompositions enable finer-grained auditing than total-effect measures alone, and the empirical analysis on LLMs demonstrates applicability to current high-impact systems.

major comments (1)
  1. [§4.2] Identification Result 2: The identification of the mechanism-replacement effect relies on the assumption that queries to the generative model can isolate the replacement of specific real-world mechanisms without residual dependence on training-data confounders; this is load-bearing for the estimators in §5 but receives only a brief justification rather than a formal proof or sensitivity analysis.
minor comments (3)
  1. [§3.1] Eq. (7): The notation for the do-operator applied to generative sampling is introduced without an explicit definition of the intervention semantics for a black-box model; a short clarifying paragraph would improve readability.
  2. [Table 1] The reported standard errors for the LLM bias estimates are not accompanied by the number of Monte Carlo samples or query budget used, making it difficult to assess precision.
  3. [§6] The discussion of limitations mentions computational cost but does not address how the method scales when the generative model is a large autoregressive LLM with thousands of tokens.
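
To make the second minor comment concrete: when a bias estimate is a difference of two proportions obtained by repeatedly querying a model, its standard error shrinks with the square root of the number of queries, so an error bar is hard to interpret without the budget. The sketch below uses the textbook binomial formula; the proportions and budgets are hypothetical and unrelated to the paper's Table 1.

```python
import math

def se_difference_of_proportions(p1, p0, n1, n0):
    """Standard error of (p1_hat - p0_hat) for independent binomial samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

# Hypothetical outcome rates under two values of the protected attribute.
p1, p0 = 0.30, 0.25
for n in (100, 1_000, 10_000):
    se = se_difference_of_proportions(p1, p0, n, n)
    print(f"queries per group = {n:>6}: SE = {se:.4f}")
```

Under this formula a hundredfold increase in queries per group cuts the standard error by a factor of ten, which is why the same reported ± value can reflect very different amounts of evidence.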

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review, positive summary of the contribution, and recommendation for minor revision. We address the major comment below.

Point-by-point responses
  1. Referee: [§4.2] Identification Result 2: The identification of the mechanism-replacement effect relies on the assumption that queries to the generative model can isolate the replacement of specific real-world mechanisms without residual dependence on training-data confounders; this is load-bearing for the estimators in §5 but receives only a brief justification rather than a formal proof or sensitivity analysis.

    Authors: We thank the referee for this observation. The identification result in §4.2 indeed rests on the assumption that targeted queries (e.g., carefully designed prompts) to the generative model can replace a specific real-world mechanism while limiting residual dependence on training-data confounders. The manuscript provides a brief justification based on the controllable conditioning properties of modern generative models such as LLMs. We agree that expanding this into a formal proof sketch and adding a sensitivity analysis would strengthen the presentation and better support the estimators in §5. In the revised version we will augment §4.2 with a more detailed derivation of the identification under the stated assumption and include a sensitivity analysis in §5 that quantifies robustness to potential residual confounding.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation builds on standard causal identification

full rationale

The paper formalizes causal fairness for generative models by extending existing causal inference concepts to a new setting, deriving pathway decompositions and mechanism-replacement effects, and stating identification conditions that recover quantities from observables or model queries. No equations or steps reduce by construction to fitted parameters, self-definitions, or self-citation chains; the central results follow directly from applying standard identification logic without renaming known patterns or smuggling ansatzes via prior author work. The approach remains self-contained against external benchmarks in causal fairness literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard causal assumptions plus newly derived identification conditions for the generative setting; no free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: the causal graph and identification conditions for the variables and mechanisms in the generative model hold.
    Invoked to enable the new decomposition results and estimators as stated in the abstract.

