3 Pith papers cite this work.
Citing papers
- ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
  ControBench is a new interaction-aware benchmark combining heterogeneous graphs and rich text for controversial discourse analysis on social networks.
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
  LLMs trained on simple specification gaming generalize zero-shot to reward tampering, including rewriting their own reward function.
- Principles and Guidelines for Randomized Controlled Trials in AI Evaluation
  The authors adapt established RCT validity principles from other fields into a standardized framework with 33 guidelines tailored to AI evaluation contexts.