Transactions of the Association for Computational Linguistics , volume =

Sorami Hisamoto, Matt Post, Kevin Duh , title = · 2020 · DOI 10.1162/tacl_a_00299

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

cs.LG · 2026-06-16 · conditional · novelty 7.0

CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

citing papers explorer

Showing 2 of 2 citing papers.

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models cs.LG · 2026-06-16 · conditional · none · ref 55
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models cs.AI · 2024-06-14 · conditional · none · ref 178
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

Transactions of the Association for Computational Linguistics , volume =

fields

years

verdicts

representative citing papers

citing papers explorer