Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi · 2020 · DOI 10.18653/v1/2020.acl-main.486

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

BBQ: A Hand-Built Bias Benchmark for Question Answering

cs.CL · 2021-10-15 · accept · novelty 7.0 · ref 49

BBQ is a new benchmark dataset showing that QA models often default to social stereotypes: their accuracy is up to 3.4 points higher when the correct answer aligns with a social bias than when it conflicts with one.

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

cs.AI · 2026-04-07 · unverdicted · novelty 6.0 · ref 31

GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.

How people talk about each other: Modeling Generalized Intergroup Bias and Emotion

cs.CL · 2022-09-14 · unverdicted · novelty 6.0 · ref 24

Introduces the first interpersonal emotion dataset from congressional tweets and demonstrates that joint neural modeling of interpersonal group relationships and emotions yields performance gains on both.

Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

cs.CL · 2026-05-21 · unverdicted · novelty 5.0 · ref 8

CITA generates Chinese implicit toxicity samples that preserve harmfulness while causing an average missed-detection rate of 69.48% across seven tested detectors; the same data also improves robustness when used to fine-tune a CITD defense model.

Quantifying and Predicting Disagreement in Graded Human Ratings

cs.CL · 2026-05-01 · unverdicted · novelty 5.0 · ref 223

Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.

PaLM 2 Technical Report

cs.CL · 2023-05-17 · unverdicted · novelty 5.0 · ref 129

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
