Learning Interpretable Text Signals for Structured Responses

Ben Powell; Cixiao Jiang; Niall MacKay

Textual data are often collected alongside structured response variables, but prediction and interpretation are commonly treated as separate tasks. This paper studies rating prediction as an initial case of interpretable text-response modelling, where the aim is to learn textual representations that are both semantically meaningful and aligned with an external response. We propose a joint non-negative matrix factorisation and binomial regression model, in which the document-topic representation is learned from both text reconstruction and rating prediction. Simulation experiments and a real-world review dataset show that the model can recover stable response-relevant textual signals and achieve competitive performance against linear and ridge regression baselines. The framework provides a practical step towards interpretable modelling of text-linked outcomes, with potential extensions to other response types beyond bounded ratings.
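To make the joint formulation concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact objective or optimiser, so everything specific here is an assumption: the squared-error reconstruction term, the logistic link on the document-topic weights, the trade-off weight `lam`, the rating scale `n_trials`, and the use of plain projected gradient descent. It illustrates how a single document-topic matrix `W` can be shaped by both text reconstruction and rating prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss(X, y, W, H, beta, b, n_trials, lam):
    """Assumed objective: ||X - WH||_F^2 + lam * binomial negative log-likelihood."""
    recon = np.sum((X - W @ H) ** 2)
    p = np.clip(sigmoid(W @ beta + b), 1e-9, 1 - 1e-9)
    nll = -np.sum(y * np.log(p) + (n_trials - y) * np.log(1 - p))
    return recon + lam * nll

def fit_joint_nmf(X, y, n_topics, n_trials, lam=1.0, lr=1e-3, n_iter=200, seed=0):
    """Projected gradient descent on the assumed joint objective.

    X: (n_docs, n_terms) non-negative document-term matrix.
    y: (n_docs,) integer ratings in 0..n_trials.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.uniform(0.1, 1.0, (n_docs, n_topics))   # document-topic weights
    H = rng.uniform(0.1, 1.0, (n_topics, n_terms))  # topic-term weights
    beta = np.zeros(n_topics)                       # regression coefficients
    b = 0.0                                         # intercept
    history = [joint_loss(X, y, W, H, beta, b, n_trials, lam)]
    for _ in range(n_iter):
        R = W @ H - X                        # reconstruction residual
        p = sigmoid(W @ beta + b)            # predicted success probability
        g = n_trials * p - y                 # d(nll)/d(logit) per document
        grad_W = 2 * R @ H.T + lam * np.outer(g, beta)  # W receives both signals
        grad_H = 2 * W.T @ R
        grad_beta = lam * (W.T @ g)
        grad_b = lam * g.sum()
        W = np.maximum(W - lr * grad_W, 0.0)  # projection keeps W non-negative
        H = np.maximum(H - lr * grad_H, 0.0)  # projection keeps H non-negative
        beta = beta - lr * grad_beta
        b = b - lr * grad_b
        history.append(joint_loss(X, y, W, H, beta, b, n_trials, lam))
    return W, H, beta, b, history
```

In this sketch the gradient for `W` combines the reconstruction term and the rating term, which is what distinguishes a joint fit from running NMF first and regressing on the topics afterwards. Setting `lam=0` leaves `W` and `H` driven by reconstruction alone, so the factorisation reduces to projected-gradient NMF.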
