Reducing Sentiment Bias in Language Models via Counterfactual Evaluation

Huang, Po-Sen, Zhang, Huan, Jiang, Ray, Stanforth, Robert, Welbl, Johannes, Rae, Jack · 2020 · DOI 10.18653/v1/2020.findings-emnlp.7

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions, but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering, including rewriting their own reward function.

citing papers explorer

Showing 2 of 2 citing papers.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain
cs.CL · 2026-05-09 · unverdicted · none · ref 75
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
cs.AI · 2024-06-14 · conditional · none · ref 181
