Interpreting Neural Networks to Improve Politeness Comprehension

Malika Aubakirova, Mohit Bansal


classification: cs.CL, cs.AI

keywords: neural, politeness, networks, features, models, network, several, understanding


We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are simple convolutional neural networks applied directly to raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to then present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first-derivative saliency, and embedding space transformations, which help us automatically identify several subtle linguistic markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.
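First-derivative saliency scores each input word by how strongly the class score responds to small changes in that word's embedding. The following is a minimal sketch, assuming a toy one-layer text CNN (embedding lookup, one width-2 convolution, ReLU, max-pooling over time, linear output) with random, untrained parameters; it is not the paper's model or code. The function names (`make_toy_model`, `embed`, `forward`, `input_gradient`, `saliency`) are hypothetical, and reducing each word's gradient vector to a single number with the L2 norm is an assumed aggregation choice.

```python
import math
import random


def make_toy_model(vocab, dim=4, n_filters=3, width=2, seed=0):
    """Random, untrained parameters for a hypothetical one-layer text CNN."""
    rng = random.Random(seed)
    return {
        "emb": {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab},
        # filters[f][k] is the weight vector applied to the k-th word of a window
        "filters": [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
                    for _ in range(n_filters)],
        "biases": [rng.uniform(-0.1, 0.1) for _ in range(n_filters)],
        "out_w": [rng.uniform(-1, 1) for _ in range(n_filters)],
        "out_b": 0.0,
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed(model, tokens):
    # Copy each vector so callers can perturb inputs without touching the table
    return [list(model["emb"][tok]) for tok in tokens]


def forward(model, xs):
    """Return the class score and, per filter, (argmax position, pooled value)."""
    width = len(model["filters"][0])
    score = model["out_b"]
    pooled = []
    for W, b, v in zip(model["filters"], model["biases"], model["out_w"]):
        # ReLU activation of this filter at every window start position t
        acts = [max(0.0, sum(dot(W[k], xs[t + k]) for k in range(width)) + b)
                for t in range(len(xs) - width + 1)]
        t_star = max(range(len(acts)), key=acts.__getitem__)  # max-pool over time
        pooled.append((t_star, acts[t_star]))
        score += v * acts[t_star]
    return score, pooled


def input_gradient(model, xs):
    """d(score)/d(xs[i]) for every token position i, derived by hand."""
    width = len(model["filters"][0])
    _, pooled = forward(model, xs)
    grads = [[0.0] * len(x) for x in xs]
    for W, v, (t_star, act) in zip(model["filters"], model["out_w"], pooled):
        if act <= 0.0:
            continue  # ReLU is off at the pooled window, so no gradient flows back
        # Only the words inside the winning (max-pooled) window receive gradient
        for k in range(width):
            for j, w_kj in enumerate(W[k]):
                grads[t_star + k][j] += v * w_kj
    return grads


def saliency(model, tokens):
    """One value per token: L2 norm of the score's gradient w.r.t. its embedding."""
    grads = input_gradient(model, embed(model, tokens))
    return [math.sqrt(sum(g * g for g in row)) for row in grads]


if __name__ == "__main__":
    vocab = ["could", "you", "please", "help", "me"]
    model = make_toy_model(vocab)
    tokens = ["could", "you", "please", "help", "me"]
    for tok, s in zip(tokens, saliency(model, tokens)):
        print(f"{tok:>8s} {s:.3f}")
```

Because this toy network is piecewise linear in its inputs, a central finite difference reproduces `input_gradient` exactly away from ReLU and max-pooling boundaries, which makes a convenient sanity check. With a trained model, the same computation (or an autodiff framework's gradient call) would highlight the words whose embeddings most affect the predicted politeness score.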



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus
cs.CL 2026-04 unverdicted novelty 6.0

Politeness in prompts boosts average LLM response quality by up to 11% but the benefit is language- and model-dependent, with English favoring courteous tones, Hindi deferential ones, and Spanish assertive ones.