emnlp-main.307/
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
  A participatory red-teaming project in the Global South created the PLACES dataset of 26k T2I failure examples that reveal unique cultural and linguistic harms missed by existing safety frameworks.
- Discovering Implicit Large Language Model Alignment Objectives
  Obj-Disco decomposes LLM alignment reward signals into sparse weighted combinations of interpretable natural language objectives via iterative analysis of behavioral changes across checkpoints, capturing over 90% of observed reward behavior.