Activation Addition steers language models at inference time, without training, by adding activation vectors computed from contrasting prompt pairs, controlling high-level properties such as sentiment and toxicity.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Benchmarking Knowledge Editing using Logical Rules
Introduces a benchmark using logical rules from knowledge graphs to generate multi-hop questions that evaluate whether knowledge edits in LLMs propagate to entailed facts, finding up to 24% performance gaps for methods like ROME and FT.