Concept-level explainability for auditing and steering LLM responses

Kenza Amara, Rita Sevastjanova, Mennatallah El-Assady · arXiv 2505.07610

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Investigating Linguistic Steering: An Analysis of Adjectival Effects Across Large Language Model Architectures

cs.CL · 2026-04-28 · unverdicted · novelty 6.0

Shapley value analysis identifies powerful adjectives that steer MMLU performance in model-family-specific patterns, with non-additive interactions emerging in larger models.
