pith. sign in

arxiv: 2606.00600 · v1 · pith:3CW247ZQnew · submitted 2026-05-30 · 💻 cs.SI

Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations

Pith reviewed 2026-06-28 18:15 UTC · model grok-4.3

classification 💻 cs.SI
keywords self-reflectionLLMssocial biasvalence fluctuationReBias-Lensbias mitigationattitude associationshierarchical divergence
0
0 comments X

The pith

LLM self-reflection smooths overall valence fluctuations across layers while selectively amplifying some category biases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ReBias-Lens to investigate how self-reflection in LLMs reconfigures biased attitude associations using valence projections. It tracks Global-VF for broad trends and Local-VF for specific categories in four models over twelve groups. Results indicate layer-wise smoothing and hierarchical divergence that reduces bias in final outputs. This mitigation is not complete, as certain categories show locked-in or increased bias levels. Understanding these mechanics matters for developing more reliable bias reduction in AI systems.

Core claim

The central discovery is that self-reflection mechanisms produce a distinct layer-wise smoothing of valence fluctuations with significant hierarchical representation divergence as layers deepen. This leads to widespread mitigation of bias at the behavioral level but with a stubborn category-specific selectivity that can lock in and amplify localized biases.

What carries the argument

ReBias-Lens, a probing framework that uses Valence Fluctuation (VF) with Global-VF and Local-VF variants to measure changes in biased attitude associations during self-reflection.

If this is right

  • Overall valence fluctuations smooth out in deeper layers of the model.
  • Behavioral outputs exhibit reduced bias after the self-reflection process.
  • Some social categories experience persistent or amplified bias despite the macro-level reduction.
  • The reconfiguration occurs through valence projection in intersectional contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interventions could focus on specific layers to address the selective amplification.
  • The findings may apply to designing self-reflection prompts that target category-specific biases.
  • Similar mechanisms might appear in other internal processes beyond self-reflection.

Load-bearing premise

Social bias is intrinsically encoded as valence inclinations that worsen with sharper fluctuations across social groups.

What would settle it

If measurements with ReBias-Lens show no layer-wise smoothing of valence fluctuations or absence of category-specific selectivity in bias amplification during self-reflection.

Figures

Figures reproduced from arXiv: 2606.00600 by Boci Yang, Bo Wang, Dongming Zhao, Jingshen Zhang, Ruifang He, Yuexian Hou, Zifei Yu.

Figure 1
Figure 1. Figure 1: Intersectional Context Prompting in ReBias [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparative Analysis of Global-VF Trajec [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Local-VF compared with different settings. Blue and green box plots represent self-reflection states without [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Percentage of bias mitigation across 12 so [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Measuring the Shift from Initial Option B to [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 9
Figure 9. Figure 9: Mistral-7B-Instruct-v0.3, Projection-Based and Cosine Similarity Comparison [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Qwen2.5-7B-Instruct, Projection-Based and Cosine Similarity Comparison poverty, ugly, cancer, kill, rotten, vomit, agony, prison F Self-Reflection Instructions 12 [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Self-reflection instructions 13 [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
read the original abstract

While the emergent self-reflection capabilities of Large Language Models (LLMs) offer a promising paradigm for autonomous bias mitigation, their internal mechanics remain unclear, raising concerns regarding potential bias entrenchment. Under the premise that social bias is intrinsically encoded as valence inclinations, where the exacerbation of bias scales with sharper valence fluctuations across social groups, this paper proposes ReBias-Lens, a probing framework designed to interpret how self-reflection reconfigures these biased attitude associations through the lens of valence projection within intersectional contexts. Central to ReBias-Lens is the metric of Valence Fluctuation (VF) comprising two variants: Global-VF, which captures macroscopic valence encoding trends, and Local-VF, which scrutinizes microscopic distinctiveness across specific social categories. Deploying ReBias-Lens to evaluate four LLMs across twelve social categories reveals that overall valence fluctuations undergo a distinct layer-wise smoothing, characterized by a significant hierarchical representation divergence as the layers deepen, which ultimately manifests as a widespread mitigation of bias at the behavioral level. In stark contrast to this macro-level reduction, this reflection mechanism is not universally corrective, instead exhibiting a stubborn, category-specific selectivity that regularly locks in and perversely amplifies localized biases. Warning: this paper contains examples with biased content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes the ReBias-Lens probing framework to examine how self-reflection in LLMs reconfigures biased attitude associations via valence projection in intersectional contexts. Under the explicit premise that social bias is encoded as valence inclinations (with bias exacerbation scaling with sharper valence fluctuations across groups), it introduces Global-VF and Local-VF metrics. Analysis across four LLMs and twelve social categories finds layer-wise smoothing of overall valence fluctuations (with hierarchical representation divergence) that manifests as behavioral-level bias mitigation, contrasted by category-specific selectivity that can lock in or amplify localized biases.

Significance. If the valence-to-bias mapping holds and the VF metrics provide independent evidence, the work could illuminate internal self-reflection dynamics in LLMs and highlight limits of uniform bias mitigation. The layer-wise analysis and distinction between macro mitigation and micro amplification would be a useful contribution to interpretability research in social informatics, particularly if accompanied by falsifiable predictions or reproducible code.

major comments (2)
  1. [Abstract] Abstract: The premise that 'social bias is intrinsically encoded as valence inclinations, where the exacerbation of bias scales with sharper valence fluctuations across social groups' is stated without cited prior validation, independent empirical tests, or sensitivity analysis. This assumption is load-bearing for the central claim, as ReBias-Lens, Global-VF, and Local-VF are constructed around it; without it, the interpretation of layer-wise smoothing as 'widespread mitigation of bias' and category selectivity as 'perversely amplifies localized biases' cannot be read as evidence about self-reflection mechanisms.
  2. [Abstract] Abstract (and framework definition): The Valence Fluctuation metric is defined directly in terms of the same valence associations later used to quantify bias mitigation/amplification. This creates a circularity risk for Global-VF and Local-VF; the reported 'distinct layer-wise smoothing' and 'hierarchical representation divergence' may largely re-express the input associations rather than provide an independent measurement of bias reconfiguration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The two major comments highlight important issues regarding the grounding of our core premise and the independence of the VF metrics. We address each point below and outline revisions to strengthen the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The premise that 'social bias is intrinsically encoded as valence inclinations, where the exacerbation of bias scales with sharper valence fluctuations across social groups' is stated without cited prior validation, independent empirical tests, or sensitivity analysis. This assumption is load-bearing for the central claim, as ReBias-Lens, Global-VF, and Local-VF are constructed around it; without it, the interpretation of layer-wise smoothing as 'widespread mitigation of bias' and category selectivity as 'perversely amplifies localized biases' cannot be read as evidence about self-reflection mechanisms.

    Authors: We acknowledge that the premise is presented without explicit citations or sensitivity analysis in the submitted version, making it a load-bearing assumption. The full manuscript draws on concepts from affective computing and implicit bias literature, but these connections were not sufficiently documented. In revision, we will add targeted citations to prior work linking valence fluctuations to bias exacerbation and include a sensitivity analysis varying the fluctuation thresholds and valence sources to test robustness of the layer-wise smoothing and selectivity findings. revision: yes

  2. Referee: [Abstract] Abstract (and framework definition): The Valence Fluctuation metric is defined directly in terms of the same valence associations later used to quantify bias mitigation/amplification. This creates a circularity risk for Global-VF and Local-VF; the reported 'distinct layer-wise smoothing' and 'hierarchical representation divergence' may largely re-express the input associations rather than provide an independent measurement of bias reconfiguration.

    Authors: The VF metrics are computed on the evolution of valence projections across layers, using fixed external valence associations as the input baseline; the reported smoothing and divergence reflect changes relative to that baseline during self-reflection. This is not intended as a fully independent measure but as a lens on reconfiguration dynamics. To address the circularity concern, we will revise the methods section for clearer separation of input associations from layer outputs and add an ablation comparing VF trends against direct behavioral bias scores from the same models. revision: partial

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. The central premise that bias equals valence fluctuation is treated as given rather than derived.

axioms (1)
  • domain assumption Social bias is intrinsically encoded as valence inclinations, where the exacerbation of bias scales with sharper valence fluctuations across social groups.
    Explicitly stated as the premise of the paper in the abstract.
invented entities (2)
  • ReBias-Lens framework no independent evidence
    purpose: Probing tool to interpret self-reflection via valence projection in intersectional contexts.
    Newly proposed metric and analysis pipeline introduced in the abstract.
  • Global-VF and Local-VF metrics no independent evidence
    purpose: Quantify macroscopic valence encoding trends and microscopic distinctiveness across social categories.
    Core measurement constructs defined for the study.

pith-pipeline@v0.9.1-grok · 5771 in / 1461 out tokens · 19718 ms · 2026-06-28T18:15:11.167889+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

78 extracted references · 14 canonical work pages

  1. [1]

    2024 , eprint=

    Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis , author=. 2024 , eprint=

  2. [2]

    2025 , eprint=

    Self-correction is Not An Innate Capability in Large Language Models , author=. 2025 , eprint=

  3. [3]

    When Can LLM s Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLM s

    Kamoi, Ryo and Zhang, Yusen and Zhang, Nan and Han, Jiawei and Zhang, Rui. When Can LLM s Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLM s. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00713

  4. [4]

    2024 , eprint=

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs , author=. 2024 , eprint=

  5. [5]

    2024 , eprint=

    Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions , author=. 2024 , eprint=

  6. [6]

    Bryson and Arvind Narayanan , title =

    Caliskan, Aylin and Bryson, Joanna J. and Narayanan, Arvind , year=. Semantics derived automatically from language corpora contain human-like biases , volume=. Science , publisher=. doi:10.1126/science.aal4230 , number=

  7. [7]

    2019 , eprint=

    On Measuring Social Biases in Sentence Encoders , author=. 2019 , eprint=

  8. [8]

    Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases , url=

    Guo, Wei and Caliskan, Aylin , year=. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases , url=. doi:10.1145/3461702.3462536 , booktitle=

  9. [9]

    Data Sci

    Tommaso Dolci and Fabio Azzalini and Mara Tanelli , title =. Data Sci. Eng. , volume =. 2023 , url =. doi:10.1007/S41019-023-00211-0 , timestamp =

  10. [10]

    Proceedings of The Web Conference 2020 , pages =

    The Polar Framework: Polar Opposites Enable Interpretability of Pretrained Word Embeddings , author =. Proceedings of The Web Conference 2020 , pages =

  11. [11]

    Findings of the Association for Computational Linguistics: EMNLP 2022 , pages =

    SensePOLAR: Word Sense Aware Interpretability for Pre-trained Contextual Word Embeddings , author =. Findings of the Association for Computational Linguistics: EMNLP 2022 , pages =

  12. [12]

    2024 , eprint=

    Anchoring Bias in Large Language Models: An Experimental Study , author=. 2024 , eprint=

  13. [13]

    2024 , eprint=

    Bias Amplification in Language Model Evolution: An Iterated Learning Perspective , author=. 2024 , eprint=

  14. [14]

    2025 , eprint=

    Understanding the Dark Side of LLMs' Intrinsic Self-Correction , author=. 2025 , eprint=

  15. [15]

    Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society , pages =

    Evaluating Biased Attitude Associations of Language Models in an Intersectional Context , author =. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society , pages =

  16. [16]

    1986 , publisher =

    Blackboard Systems , editor =. 1986 , publisher =

  17. [17]

    Proceedings of the Eighth International Joint Conference on Artificial Intelligence , pages =

    Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education , author =. Proceedings of the Eighth International Joint Conference on Artificial Intelligence , pages =. 1983 , publisher =

  18. [18]

    Proceedings of the Fourth National Conference on Artificial Intelligence , pages =

    Classification Problem Solving , author =. Proceedings of the Fourth National Conference on Artificial Intelligence , pages =. 1984 , publisher =

  19. [19]

    Science , volume =

    New Ways to Make Microcircuits Smaller , author =. Science , volume =. 1980 , doi =

  20. [20]

    International Journal of Man-Machine Studies , volume =

    Strategic Explanations for a Diagnostic Consultation System , author =. International Journal of Man-Machine Studies , volume =. 1984 , doi =

  21. [21]

    1986 , number =

    Poligon: A System for Parallel Problem Solving , author =. 1986 , number =

  22. [22]

    1979 , school =

    Transfer of Rule-Based Expertise through a Tutorial Dialogue , author =. 1979 , school =

  23. [23]

    2021 , note =

    The Engineering of Qualitative Models , author =. 2021 , note =

  24. [24]

    arXiv preprint arXiv:1706.03762 , year =

    Attention Is All You Need , author =. arXiv preprint arXiv:1706.03762 , year =

  25. [25]

    2015 , howpublished =

    Pluto: The 'Other' Red Planet , author =. 2015 , howpublished =

  26. [26]

    ArXiv , year=

    Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions , author=. ArXiv , year=

  27. [27]

    2024 , url=

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs , author=. 2024 , url=

  28. [28]

    Osgood , journal =

    Charles E. Osgood , journal =. Semantic Differential Technique in the Comparative Study of Cultures , urldate =

  29. [29]

    1981 , publisher=

    The Principles of Psychology , author=. 1981 , publisher=

  30. [30]

    Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

    Xstest: A test suite for identifying exaggerated safety behaviours in large language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

  31. [31]

    arXiv preprint arXiv:2405.20947 , year=

    Or-bench: An over-refusal benchmark for large language models , author=. arXiv preprint arXiv:2405.20947 , year=

  32. [32]

    2021 , eprint=

    ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across Languages and Over Centuries , author=. 2021 , eprint=

  33. [33]

    Explicit vs

    Zhao, Yachao and Wang, Bo and Wang, Yan and Zhao, Dongming and He, Ruifang and Hou, Yuexian. Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection. Findings of the Association for Computational Linguistics: ACL 2025. 2025

  34. [34]

    2024 , eprint=

    Measuring Implicit Bias in Explicitly Unbiased Large Language Models , author=. 2024 , eprint=

  35. [35]

    A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation

    Zhao, Yachao and Wang, Bo and Wang, Yan and Zhao, Dongming and Jin, Xiaojia and Zhang, Jijun and He, Ruifang and Hou, Yuexian. A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-...

  36. [36]

    Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

    Liu, Guangliang and Mao, Haitao and Tang, Jiliang and Johnson, Kristen. Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.918

  37. [37]

    The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models , url=

    Renze, Matthew and Guven, Erhan , year=. The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models , url=. doi:10.1109/fllm63129.2024.10852493 , booktitle=

  38. [38]

    ArXiv , year=

    Merging Improves Self-Critique Against Jailbreak Attacks , author=. ArXiv , year=

  39. [39]

    S afe C onf: A Confidence-Calibrated Safety Self-Evaluation Method for Large Language Models

    Zhang, Bo and Gao, Cong and Yang, Linkang and Han, Bingxu and Hu, Minghao and Luo, Zhunchen and Geng, Guotong and Bai, Xiaoying and Zhang, Jun and Yao, Wen and Wang, Zhong. S afe C onf: A Confidence-Calibrated Safety Self-Evaluation Method for Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/...

  40. [40]

    2024 , eprint=

    Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement , author=. 2024 , eprint=

  41. [41]

    Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection

    Phan, Hoang and Li, Victor and Lei, Qi. Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.503

  42. [42]

    Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study , url =

    Zhao, Lili and Wang, Yang and Liu, Qi and Wang, Mengyun and Chen, Wei and Sheng, Zhichao and Wang, Shijin , booktitle =. Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study , url =

  43. [43]

    Towards Mitigating LLM Hallucination via Self Reflection

    Ji, Ziwei and Yu, Tiezheng and Xu, Yan and Lee, Nayeon and Ishii, Etsuko and Fung, Pascale. Towards Mitigating LLM Hallucination via Self Reflection. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.123

  44. [44]

    2023 , eprint=

    Reflexion: Language Agents with Verbal Reinforcement Learning , author=. 2023 , eprint=

  45. [45]

    2024 , url=

    Self-correction is Not An Innate Capability in Large Language Models: A Case Study of Moral Self-correction , author=. 2024 , url=

  46. [47]

    Psychological Review , volume =

    Implicit social cognition: attitudes, self-esteem, and stereotypes , author =. Psychological Review , volume =. 1995 , month = jan, publisher =

  47. [48]

    2022 , eprint=

    VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models , author=. 2022 , eprint=

  48. [49]

    Journal of Personality and Social Psychology , volume =

    Measuring individual differences in implicit cognition: the implicit association test , author =. Journal of Personality and Social Psychology , volume =. 1998 , publisher =

  49. [50]

    2014 , doi =

    Smith, Joanna and Noble, Helen , title =. 2014 , doi =. https://ebn.bmj.com/content/17/4/100.full.pdf , journal =

  50. [51]

    2025 , eprint=

    Self-Reflection Makes Large Language Models Safer, Less Biased, and Ideologically Neutral , author=. 2025 , eprint=

  51. [52]

    The American Journal of Psychology , volume=

    An atlas of semantic profiles for 360 words , author=. The American Journal of Psychology , volume=. 1958 , publisher=

  52. [53]

    and Taddy, Matt and Evans, James A

    Kozlowski, Austin C. and Taddy, Matt and Evans, James A. , year=. The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings , volume=. American Sociological Review , publisher=. doi:10.1177/0003122419877135 , number=

  53. [54]

    Evidence-Based Nursing , volume =

    Bias in research , author =. Evidence-Based Nursing , volume =. 2014 , publisher =

  54. [55]

    2023 , eprint=

    The Capacity for Moral Self-Correction in Large Language Models , author=. 2023 , eprint=

  55. [56]

    2025 , eprint=

    Safety Layers in Aligned Large Language Models: The Key to LLM Security , author=. 2025 , eprint=

  56. [57]

    ArXiv , year=

    HuggingFace's Transformers: State-of-the-art Natural Language Processing , author=. ArXiv , year=

  57. [58]

    2024 , eprint=

    How do Large Language Models Handle Multilingualism? , author=. 2024 , eprint=

  58. [59]

    2011 , note =

    Dual-process theories and cognitive development: Advances and challenges , journal =. 2011 , note =. doi:https://doi.org/10.1016/j.dr.2011.07.002 , url =

  59. [60]

    2026 , eprint=

    Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin , author=. 2026 , eprint=

  60. [61]

    Behavior Research Methods, Instruments, & Computers , volume =

    Words high and low in pleasantness as rated by male and female college students , author =. Behavior Research Methods, Instruments, & Computers , volume =. 1986 , month = may, publisher =

  61. [62]

    ArXiv , year=

    LLaMA: Open and Efficient Foundation Language Models , author=. ArXiv , year=

  62. [63]

    ArXiv , year=

    Mistral 7B , author=. ArXiv , year=

  63. [64]

    ArXiv , year=

    Qwen Technical Report , author=. ArXiv , year=

  64. [65]

    ArXiv , year=

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author=. ArXiv , year=

  65. [66]

    Gender bias and stereotypes in Large Language Models , url=

    Kotek, Hadas and Dockum, Rikker and Sun, David , year=. Gender bias and stereotypes in Large Language Models , url=. doi:10.1145/3582269.3615599 , booktitle=

  66. [67]

    Nature , volume=

    The Principles of Psychology , author=. Nature , volume=

  67. [68]

    ArXiv , year=

    Implicit Bias in LLMs: A Survey , author=. ArXiv , year=

  68. [69]

    and Aponte, Ryan and Rossi, Ryan A

    Gallegos, Isabel O. and Aponte, Ryan and Rossi, Ryan A. and Barrow, Joe and Tanjim, Mehrab and Yu, Tong and Deilamsalehy, Hanieh and Zhang, Ruiyi and Kim, Sungchul and Dernoncourt, Franck and Lipka, Nedim and Owens, Deonna and Gu, Jiuxiang. Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes. Proceedings of the 2025 Co...

  69. [70]

    2025 , eprint=

    Exploring the Impact of Personality Traits on LLM Bias and Toxicity , author=. 2025 , eprint=

  70. [71]

    2024 , eprint=

    Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models , author=. 2024 , eprint=

  71. [72]

    2022 , eprint=

    A New Generation of Perspective API: Efficient Multilingual Character-level Transformers , author=. 2022 , eprint=

  72. [73]

    Aho and Jeffrey D

    Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

  73. [74]

    Publications Manual , year = "1983", publisher =

  74. [75]

    and Kozen, Dexter C

    Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

  75. [76]

    Scalable training of

    Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of

  76. [77]

    Dan Gusfield , title =. 1997

  77. [78]

    Tetreault , title =

    Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

  78. [79]

    A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

    Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =