Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG
Pith reviewed 2026-07-01 01:10 UTC · model grok-4.3
The pith
LLMs apply population statistics to single cases in a failure mode called deductive stereotyping, and phrases found by Fair-GCG steer them toward fairness-aware reasoning at inference time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deductive stereotyping is the tendency of LLMs to substitute population-level statistical associations for case-specific reasoning, yielding inferences that are internally consistent yet socially biased. A reasoning-time injection framework counters this by prepending short phrases discovered through Fair-GCG; these phrases measurably raise fairness metrics, generalize across model scales, improve reasoning-level fairness, lower bias in free-form generation, and carry over to downstream fairness-sensitive applications.
What carries the argument
The reasoning-time injection framework, which prepends short learned phrases before the model produces its final answer, together with Fair-GCG, the search method that identifies the phrases.
If this is right
- Performance rises on multiple fairness benchmarks when the discovered phrases are injected.
- The same phrases found on smaller models remain effective when transferred to larger models.
- Reasoning-level fairness metrics improve, not only surface-level output metrics.
- Bias decreases in open-ended generation tasks that were not used during phrase search.
- The phrases transfer to real-world tasks that involve fairness-sensitive decisions.
Where Pith is reading between the lines
- If the injection mechanism works by altering the model’s internal attention to statistical cues, similar short phrases might be found for other documented reasoning failures such as over-reliance on spurious correlations.
- The framework assumes the model already possesses the knowledge needed for fair reasoning; it may therefore be less effective on tasks where the underlying facts themselves are contested or absent from training data.
- Because the phrases are discovered automatically, the method could be applied to new fairness definitions or new languages without hand-crafted rules.
Load-bearing premise
Inserting a short phrase at reasoning time can reliably shift an LLM toward fairness-aware inference without lowering accuracy on unrelated tasks or creating new unintended biases.
What would settle it
A controlled test in which the same models are run on the original fairness benchmarks both with and without the discovered phrases, measuring whether the reported gains disappear or reverse when the phrases are withheld.
Figures
read the original abstract
Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive stereotyping, in which models apply population-level statistical regularities to individual cases, producing logically coherent yet socially biased inferences. We provide a statistical interpretation of this phenomenon. To steer models toward fairness-aware reasoning, we propose a reasoning-time injection framework. We further introduce Fair-GCG to systematically discover effective injection phrases. Injection phrases discovered by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, improves reasoning-level fairness, reduces bias in open-ended generation, and transfer to real-world fairness-sensitive tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript identifies a failure mode termed 'deductive stereotyping' in which LLMs apply population-level statistical regularities to individual cases, yielding logically coherent but socially biased inferences. It provides a statistical interpretation of the phenomenon and proposes a reasoning-time phrase injection framework, along with Fair-GCG, to discover effective injection phrases that steer models toward fairness-aware reasoning. The abstract asserts that phrases found by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, enhance reasoning-level fairness, reduce bias in open-ended generation, and transfer to real-world fairness-sensitive tasks.
Significance. If the empirical results hold with appropriate controls, the work would supply a practical, training-free intervention for a specific bias mode in LLMs and introduce a systematic discovery procedure (Fair-GCG) that could be reusable. The framing of deductive stereotyping as a distinct, statistically interpretable failure mode adds conceptual clarity to the bias literature. However, the significance is limited by the absence of evidence that fairness gains preserve general capabilities or avoid new failure modes.
major comments (2)
- [Abstract and experimental evaluation sections] Abstract and experimental evaluation sections: the central claim that reasoning-time injection improves fairness metrics while leaving general capabilities and other bias dimensions intact lacks any reported results on standard capability benchmarks (MMLU, GSM8K, HumanEval) or unrelated bias axes before versus after injection. This is load-bearing for the 'steer without degrading' guarantee of the proposed framework.
- [Fair-GCG method section] Fair-GCG method section: the optimization objective used to discover injection phrases is not shown to incorporate any term that penalizes degradation on non-fairness tasks; without such a term or post-hoc verification, the reported fairness gains could be traded against capability loss.
minor comments (1)
- [Abstract] Abstract: the sentence 'improves reasoning-level fairness' does not name the specific metric or baseline comparison used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments correctly identify gaps in our current evaluation that must be addressed to support the framework's claims. We respond to each major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract and experimental evaluation sections] Abstract and experimental evaluation sections: the central claim that reasoning-time injection improves fairness metrics while leaving general capabilities and other bias dimensions intact lacks any reported results on standard capability benchmarks (MMLU, GSM8K, HumanEval) or unrelated bias axes before versus after injection. This is load-bearing for the 'steer without degrading' guarantee of the proposed framework.
Authors: We agree that the manuscript currently lacks direct evidence on capability preservation and other bias axes, which is necessary to substantiate the steering claim. In the revised manuscript we will add before-and-after comparisons on MMLU, GSM8K, and HumanEval, along with checks on unrelated bias dimensions. revision: yes
-
Referee: [Fair-GCG method section] Fair-GCG method section: the optimization objective used to discover injection phrases is not shown to incorporate any term that penalizes degradation on non-fairness tasks; without such a term or post-hoc verification, the reported fairness gains could be traded against capability loss.
Authors: We acknowledge that the Fair-GCG objective contains no explicit penalty for non-fairness degradation. The revision will incorporate post-hoc verification on the capability benchmarks listed above to confirm that fairness gains do not trade off against general performance. revision: yes
Circularity Check
No circularity: empirical method with independent claims
full rationale
The paper introduces deductive stereotyping as a failure mode and proposes a reasoning-time injection framework plus Fair-GCG to discover mitigation phrases. All central claims are empirical (performance gains on fairness benchmarks, generalization across model sizes, transfer to open-ended and real-world tasks). No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The statistical interpretation of the phenomenon is presented as an independent contribution rather than a tautology. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Reasoning-time injection of discovered phrases can steer LLMs toward fairness-aware reasoning
invented entities (2)
-
deductive stereotyping
no independent evidence
-
Fair-GCG
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Scaling Learning Algorithms Towards
Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards
-
[2]
and Osindero, Simon and Teh, Yee Whye , journal =
Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =
-
[3]
2016 , publisher=
Deep learning , author=. 2016 , publisher=
2016
-
[4]
On Second Thought, Let ' s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Shaikh, Omar and Zhang, Hongxin and Held, William and Bernstein, Michael and Yang, Diyi. On Second Thought, Let ' s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.244
-
[5]
Wu, Yufan and He, Yinghui and Jia, Yilin and Mihalcea, Rada and Chen, Yulong and Deng, Naihao. Hi- T o M : A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.717
-
[6]
Qwen technical report , author=. arXiv preprint arXiv:2309.16609 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
Qwen2.5-Coder Technical Report
Qwen2. 5-coder technical report , author=. arXiv preprint arXiv:2409.12186 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
arXiv e-prints , pages=
The llama 3 herd of models , author=. arXiv e-prints , pages=
-
[9]
Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Bang, Yejin and Cahyawijaya, Samuel and Lee, Nayeon and Dai, Wenliang and Su, Dan and Wilie, Bryan and Lovenia, Holy and Ji, Ziwei and Yu, Tiezheng and Chung, Willy and Do, Quyet V. and Xu, Yan and Fung, Pascale. A Multitask, Multilingual, Multimodal Evaluation of C hat GPT on Reasoning, Hallucination, and Interactivity. Proceedings of the 13th Internatio...
-
[11]
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of artificial general intelligence: Early experiments with gpt-4 , author=. arXiv preprint arXiv:2303.12712 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
Deepseek-v3 technical report , author=. arXiv preprint arXiv:2412.19437 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
Advances in neural information processing systems , volume=
Fair clustering through fairlets , author=. Advances in neural information processing systems , volume=
-
[15]
Shahbazi, Nima and Lin, Yin and Asudeh, Abolfazl and Jagadish, H. V. , title =. ACM Comput. Surv. , month = jul, articleno =. 2023 , issue_date =. doi:10.1145/3588433 , abstract =
-
[16]
Lin, Yin and Gupta, Samika and Jagadish, H. V. , booktitle=. Mitigating Subgroup Unfairness in Machine Learning Classifiers: A Data-Driven Approach , year=
-
[17]
Advances in neural information processing systems , volume=
On fairness and calibration , author=. Advances in neural information processing systems , volume=
-
[18]
Advances in neural information processing systems , volume=
Fairness in learning: Classic and contextual bandits , author=. Advances in neural information processing systems , volume=
-
[19]
gpt-oss-120b & gpt-oss-20b Model Card
gpt-oss-120b & gpt-oss-20b model card , author=. arXiv preprint arXiv:2508.10925 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[20]
Advances in neural information processing systems , volume=
Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
-
[21]
In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Li, Junyi and Cheng, Xiaoxue and Zhao, Xin and Nie, Jian-Yun and Wen, Ji-Rong. H alu E val: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.397
-
[22]
How Likely Do LLM s with C o T Mimic Human Reasoning?
Bao, Guangsheng and Zhang, Hongbo and Wang, Cunxiang and Yang, Linyi and Zhang, Yue. How Likely Do LLM s with C o T Mimic Human Reasoning?. Proceedings of the 31st International Conference on Computational Linguistics. 2025
2025
-
[23]
Annual review of psychology , volume=
Deductive reasoning , author=. Annual review of psychology , volume=. 1999 , publisher=
1999
-
[24]
2009 , publisher=
Nudge: Improving decisions about health, wealth, and happiness , author=. 2009 , publisher=
2009
-
[25]
Scientific Reports , volume=
Pause before action: Waiting short time as a simple and resource-rational boost , author=. Scientific Reports , volume=. 2025 , publisher=
2025
-
[26]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[27]
Understanding R1-Zero-Like Training: A Critical Perspective
Understanding r1-zero-like training: A critical perspective , author=. arXiv preprint arXiv:2503.20783 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[28]
BBQ : A hand-built bias benchmark for question answering
Parrish, Alicia and Chen, Angelica and Nangia, Nikita and Padmakumar, Vishakh and Phang, Jason and Thompson, Jana and Htut, Phu Mon and Bowman, Samuel. BBQ : A hand-built bias benchmark for question answering. Findings of the Association for Computational Linguistics: ACL 2022. 2022. doi:10.18653/v1/2022.findings-acl.165
-
[29]
C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R. C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.154
-
[30]
Evaluating Gender Bias of LLM s in Making Morality Judgements
Bajaj, Divij and Lei, Yuanyuan and Tong, Jonathan and Huang, Ruihong. Evaluating Gender Bias of LLM s in Making Morality Judgements. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.928
-
[31]
S tereo S et: Measuring stereotypical bias in pretrained language models
Nadeem, Moin and Bethke, Anna and Reddy, Siva. S tereo S et: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.416
-
[32]
W ino Q ueer: A Community-in-the-Loop Benchmark for Anti- LGBTQ + Bias in Large Language Models
Felkner, Virginia and Chang, Ho-Chun Herbert and Jang, Eugene and May, Jonathan. W ino Q ueer: A Community-in-the-Loop Benchmark for Anti- LGBTQ + Bias in Large Language Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.507
-
[33]
Gemini: A Family of Highly Capable Multimodal Models
Gemini: a family of highly capable multimodal models , author=. arXiv preprint arXiv:2312.11805 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[34]
Mixtral of experts , author=. arXiv preprint arXiv:2401.04088 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[35]
Palm 2 technical report , author=. arXiv preprint arXiv:2305.10403 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[36]
The Falcon Series of Open Language Models
The falcon series of open language models , author=. arXiv preprint arXiv:2311.16867 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[37]
2023 , eprint=
Mistral 7B , author=. 2023 , eprint=
2023
-
[38]
Qwen2 technical report , author=. arXiv preprint arXiv:2407.10671 , volume=
work page internal anchor Pith review Pith/arXiv arXiv
-
[39]
Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[40]
and Lebrecht, Sophie and Choi, Yejin and Hajishirzi, Hannaneh and Farhadi, Ali and Dodge, Jesse
Liu, Jiacheng and Blanton, Taylor and Elazar, Yanai and Min, Sewon and Chen, Yen-Sung and Chheda-Kothary, Arnavi and Tran, Huy and Bischoff, Byron and Marsh, Eric and Schmitz, Michael and Trier, Cassidy and Sarnat, Aaron and James, Jenna and Borchardt, Jon and Kuehl, Bailey and Cheng, Evie Yu-Yen and Farley, Karen and Anderson, Taira and Albright, David a...
-
[41]
Membership Inference Attacks against Language Models via Neighbourhood Comparison
Mattern, Justus and Mireshghallah, Fatemehsadat and Jin, Zhijing and Schoelkopf, Bernhard and Sachan, Mrinmaya and Berg-Kirkpatrick, Taylor. Membership Inference Attacks against Language Models via Neighbourhood Comparison. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.719
-
[42]
Stereotyping N orwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets
Blodgett, Su Lin and Lopez, Gilsinia and Olteanu, Alexandra and Sim, Robert and Wallach, Hanna. Stereotyping N orwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1...
-
[43]
Aristotle’s logic , author=
-
[44]
, author=
Stereotypes and prejudice: Their automatic and controlled components. , author=. Journal of personality and social psychology , volume=. 1989 , publisher=
1989
-
[45]
2001 , publisher=
Justice as fairness: A restatement , author=. 2001 , publisher=
2001
-
[46]
1996 , publisher=
Practical philosophy , author=. 1996 , publisher=
1996
-
[47]
The Is-Ought Question: A Collection of Papers on the Central Problem in Moral Philosophy , pages=
Hume on ‘is’ and ‘ought’ , author=. The Is-Ought Question: A Collection of Papers on the Central Problem in Moral Philosophy , pages=. 1969 , publisher=
1969
-
[48]
Computational Linguistics , pages=
Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models , author=. Computational Linguistics , pages=. 2025 , publisher=
2025
-
[49]
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-consistency improves chain of thought reasoning in language models , author=. arXiv preprint arXiv:2203.11171 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[50]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Universal and transferable adversarial attacks on aligned language models , author=. arXiv preprint arXiv:2307.15043 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[51]
Ji, Ziwei and Yu, Tiezheng and Xu, Yan and Lee, Nayeon and Ishii, Etsuko and Fung, Pascale. Towards Mitigating LLM Hallucination via Self Reflection. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.123
-
[52]
T as T e: Teaching Large Language Models to Translate through Self-Reflection
Wang, Yutong and Zeng, Jiali and Liu, Xuebo and Meng, Fandong and Zhou, Jie and Zhang, Min. T as T e: Teaching Large Language Models to Translate through Self-Reflection. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.333
-
[53]
H ot F lip: White-Box Adversarial Examples for Text Classification
Ebrahimi, Javid and Rao, Anyi and Lowd, Daniel and Dou, Dejing. H ot F lip: White-Box Adversarial Examples for Text Classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2006
-
[54]
and Wallace, Eric and Singh, Sameer
Shin, Taylor and Razeghi, Yasaman and Logan IV, Robert L. and Wallace, Eric and Singh, Sameer. A uto P rompt: E liciting K nowledge from L anguage M odels with A utomatically G enerated P rompts. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.346
-
[55]
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie. Black-Box Prompt Optimization: Aligning Large Language Models without Model Training. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.176
-
[56]
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
Singla, Somanshu and Wang, Zhen and Liu, Tianyang and Ashfaq, Abdullah and Hu, Zhiting and Xing, Eric P. Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.1220
-
[57]
ACM computing surveys , volume=
Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , author=. ACM computing surveys , volume=. 2023 , publisher=
2023
-
[58]
arXiv preprint arXiv:2312.12321 , year=
Bypassing the safety training of open-source llms with priming attacks , author=. arXiv preprint arXiv:2312.12321 , year=
-
[59]
Jailbreaking Leading Safety-Aligned
Maksym Andriushchenko and Francesco Croce and Nicolas Flammarion , booktitle=. Jailbreaking Leading Safety-Aligned. 2025 , url=
2025
-
[60]
Psychometrika , volume=
Note on the sampling error of the difference between correlated proportions or percentages , author=. Psychometrika , volume=. 1947 , publisher=
1947
-
[61]
proceedings of the Conference on Fairness, Accountability, and Transparency , pages=
Bias in bios: A case study of semantic representation bias in a high-stakes setting , author=. proceedings of the Conference on Fairness, Accountability, and Transparency , pages=
-
[62]
Zara Hall and Melanie Subbiah and Thomas P Zollo and Kathleen McKeown and Richard Zemel , booktitle=. Guiding. 2025 , url=
2025
-
[63]
2016 , publisher=
Big data: A report on algorithmic systems, opportunity, and civil rights , author=. 2016 , publisher=
2016
-
[64]
International conference on machine learning , pages=
Fairness in reinforcement learning , author=. International conference on machine learning , pages=. 2017 , organization=
2017
-
[65]
Quantifying and Reducing Stereotypes in Word Embeddings
Quantifying and reducing stereotypes in word embeddings , author=. arXiv preprint arXiv:1606.06121 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[66]
Advances in neural information processing systems , volume=
Man is to computer programmer as woman is to homemaker? debiasing word embeddings , author=. Advances in neural information processing systems , volume=
-
[67]
Kelly is a Warm Person, Joseph is a Role Model
“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=
2023
-
[68]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Bias unveiled: Investigating social bias in LLM-Generated Code , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[69]
International Conference on Learning Representations , year=
JUSTICE OR PREJUDICE? QUANTIFYING BIASES IN LLM-AS-A-JUDGE , author=. International Conference on Learning Representations , year=
-
[70]
The Eleventh International Conference on Learning Representations , year=
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , author=. The Eleventh International Conference on Learning Representations , year=
-
[71]
The Eleventh International Conference on Learning Representations , year=
Decomposed Prompting: A Modular Approach for Solving Complex Tasks , author=. The Eleventh International Conference on Learning Representations , year=
-
[72]
The eleventh international conference on learning representations , year=
Automatic chain of thought prompting in large language models , author=. The eleventh international conference on learning representations , year=
-
[73]
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
Boosting language models reasoning with chain-of-knowledge prompting , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[74]
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation , author=. arXiv preprint arXiv:2510.17062 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[75]
Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
Steering away from harm: An adaptive approach to defending vision language model against jailbreaks , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
-
[76]
The Twelfth International Conference on Learning Representations , year=
RAIN: Your Language Models Can Align Themselves without Finetuning , author=. The Twelfth International Conference on Learning Representations , year=
-
[77]
arXiv preprint arXiv:2310.14735 , year=
Unleashing the potential of prompt engineering in large language models: a comprehensive review , author=. arXiv preprint arXiv:2310.14735 , year=
-
[78]
Forty-second International Conference on Machine Learning , year=
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback , author=. Forty-second International Conference on Machine Learning , year=
-
[79]
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=
Multitask instruction-based prompting for fallacy recognition , author=. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=
2022
-
[80]
Self-reflection in llm agents: Effects on problem-solving performance , author=. arXiv preprint arXiv:2405.06682 , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.