Words that make SENSE: Sensorimotor Norms in Learned Lexical Token Representations
Pith reviewed 2026-05-16 09:24 UTC · model grok-4.3
The pith
A learned projection called SENSE maps word embeddings to sensorimotor norms and aligns with human judgments on made-up words.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SENSE learns a direct projection from lexical embeddings onto the Lancaster sensorimotor norms. A behavioral experiment with 281 participants who chose nonce words for specific sensorimotor associations produced statistically significant correlations with the model's ratings in six of the eleven modalities. Sublexical inspection of the selected nonce words further identified systematic phonosthemic patterns for the interoceptive norm.
What carries the argument
The SENSE projection model, which maps word embeddings to the eleven Lancaster sensorimotor norm scores.
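As a rough illustration of what such a projection could look like, the sketch below fits an affine map from embeddings to norm scores by gradient descent on mean squared error. The linear form and MSE objective follow the description given in the simulated rebuttal further down, not a confirmed specification of the paper's model. The toy data, the dimensions, the learning rate, and the names `predict`, `mse`, and `fit_projection` are all assumptions made for illustration.

```python
import random

def predict(W, b, x):
    """Apply the affine map: one score per norm dimension."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]

def mse(W, b, X, Y):
    """Mean squared error over all words and all norm dimensions."""
    total = 0.0
    for x, y in zip(X, Y):
        total += sum((p - t) ** 2 for p, t in zip(predict(W, b, x), y))
    return total / (len(X) * len(Y[0]))

def fit_projection(X, Y, lr=0.5, epochs=300):
    """Fit W (k x d) and b (k) by full-batch gradient descent on MSE."""
    n, d, k = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d for _ in range(k)]
    b = [0.0] * k
    scale = 2.0 / (n * k)  # constant factor from differentiating the MSE
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(k)]
        grad_b = [0.0] * k
        for x, y in zip(X, Y):
            pred = predict(W, b, x)
            for i in range(k):
                err = pred[i] - y[i]
                grad_b[i] += err
                for j in range(d):
                    grad_W[i][j] += err * x[j]
        for i in range(k):
            b[i] -= lr * scale * grad_b[i]
            for j in range(d):
                W[i][j] -= lr * scale * grad_W[i][j]
    return W, b

# Toy stand-in data (hypothetical): 4-dimensional "embeddings" and
# 11 target scores per word, generated from a hidden linear map.
rng = random.Random(0)
D, K, N = 4, 11, 60
true_W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
X = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
Y = [predict(true_W, [0.0] * K, x) for x in X]

W, b = fit_projection(X, Y)

# Score an unseen "nonce word" embedding with the fitted projection.
nonce_embedding = [rng.gauss(0, 1) for _ in range(D)]
scores = predict(W, b, nonce_embedding)  # 11 predicted norm scores
```

Real embeddings have hundreds of dimensions, so a practical version would use a numerical library and probably some regularization; the plain-list version here only shows the shape of the computation.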
If this is right
- Embeddings can approximate sensorimotor ratings for arbitrary new words without additional human data collection.
- Nonce-word selection tasks provide an independent behavioral check on whether learned representations carry grounded information.
- Text corpora alone can surface candidate phonosthemes tied to particular sensory or motor dimensions.
Where Pith is reading between the lines
- The same projection technique could be used to rank or generate words that are intended to evoke chosen sensory qualities.
- Applying the model to contextual rather than static embeddings might show how surrounding text alters perceived sensorimotor strength.
- Similar mappings could link embeddings to other grounded properties such as emotional or spatial dimensions.
Load-bearing premise
Participant choices of nonce words reflect genuine sensorimotor associations rather than biases introduced by how the words were generated or how responses were collected.
What would settle it
A replication of the participant study with a fresh group that finds no correlation between their nonce-word selections and the SENSE ratings would falsify the validation.
Original abstract
While word embeddings derive meaning from co-occurrence patterns, human language understanding is grounded in sensory and motor experience. We present SENSE (Sensorimotor Embedding Norm Scoring Engine), a learned projection model that predicts Lancaster sensorimotor norms from word lexical embeddings. We also conducted a behavioral study in which 281 participants selected which among candidate nonce words evoked specific sensorimotor associations, finding statistically significant correlations between human selection rates and SENSE ratings across 6 of the 11 modalities. Sublexical analysis of these nonce-word selection rates revealed systematic phonosthemic patterns for the interoceptive norm, suggesting a path towards computationally proposing candidate phonosthemes from text data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SENSE, a learned projection model that maps static word embeddings to the 11 Lancaster sensorimotor norms. It trains the projection on real-word embeddings, applies it to nonce-word embeddings, and reports statistically significant correlations with human selection rates from a behavioral study (N=281) in 6 of the 11 modalities; it further identifies systematic phonosthemic patterns in the interoceptive norm via sublexical analysis of the nonce items.
Significance. If the empirical results hold, the work offers a practical computational link between distributional embeddings and grounded sensorimotor experience, allowing prediction of sensorimotor ratings for novel or nonce forms and opening a route to data-driven phonostheme discovery. The independent behavioral validation on nonce words supplies external grounding that goes beyond training-data circularity.
major comments (2)
- [Abstract] Abstract: the headline claim of statistically significant correlations across 6 modalities is presented without any accompanying information on model architecture, training objective, statistical tests, effect sizes, or correction for multiple comparisons; these omissions are load-bearing for evaluating whether the reported correlations support the generalization claim.
- [Behavioral validation] Behavioral validation section: the paper does not report baseline comparisons (e.g., raw embedding cosine similarity, random projections, or majority-class predictors) against which the SENSE projection's incremental predictive value can be assessed; without such controls it is difficult to determine whether the learned mapping adds explanatory power beyond the input embeddings themselves.
minor comments (3)
- Ensure that every reported correlation includes the exact test statistic, degrees of freedom, p-value, and effect size (e.g., r or R²) in both the text and any summary table.
- Clarify how nonce-word embeddings are obtained (subword tokenization? averaging?) and whether the projection parameters are frozen before the behavioral evaluation.
- The sublexical phonostheme analysis would benefit from an explicit description of the feature extraction method and a control for word length or frequency confounds.
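One way the feature extraction requested in the last comment could be made explicit is with boundary-marked character n-grams, sketched below. This is a common sublexical representation, not the method the paper is known to use. The nonce words, the selection rates, and the names `char_ngrams` and `mean_rate_by_ngram` are hypothetical.

```python
from collections import defaultdict

def char_ngrams(word, n_min=2, n_max=3):
    """Return the set of boundary-marked character n-grams of a word."""
    marked = f"<{word.lower()}>"  # '<' and '>' mark word onset and offset
    return {marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)}

def mean_rate_by_ngram(selection_rates, min_words=2):
    """Average selection rate over the words that contain each n-gram.

    N-grams seen in fewer than `min_words` words are dropped, since a
    single word cannot show a systematic pattern.
    """
    buckets = defaultdict(list)
    for word, rate in selection_rates.items():
        for gram in char_ngrams(word):
            buckets[gram].append(rate)
    return {gram: sum(rates) / len(rates)
            for gram, rates in buckets.items()
            if len(rates) >= min_words}

# Hypothetical nonce words and selection rates for one norm (not from the paper).
rates = {"glorp": 0.62, "glim": 0.58, "tavik": 0.21, "tavo": 0.19}
by_ngram = mean_rate_by_ngram(rates)

# Candidate sublexical units, highest mean selection rate first.
ranked = sorted(by_ngram.items(), key=lambda kv: kv[1], reverse=True)
```

The sketch does nothing about the length or frequency confounds the referee raises; those would need a separate control, for example comparing each n-gram against length-matched words that lack it.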
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive recommendation for minor revision. We address each major comment below and will incorporate the suggested changes in the revised manuscript to improve clarity and rigor.
Point-by-point responses
- Referee: [Abstract] Abstract: the headline claim of statistically significant correlations across 6 modalities is presented without any accompanying information on model architecture, training objective, statistical tests, effect sizes, or correction for multiple comparisons; these omissions are load-bearing for evaluating whether the reported correlations support the generalization claim.
  Authors: We agree that the abstract would benefit from additional methodological details to support the headline claim. In the revised manuscript, we will update the abstract to include a brief description of the SENSE model (a learned linear projection trained with mean squared error loss on real-word embeddings), the statistical approach (Pearson correlations, with significance determined after Bonferroni correction for 11 comparisons), and a mention of effect sizes. This will help readers evaluate the strength of the generalization claim without exceeding abstract length constraints.
  Revision: yes
- Referee: [Behavioral validation] Behavioral validation section: the paper does not report baseline comparisons (e.g., raw embedding cosine similarity, random projections, or majority-class predictors) against which the SENSE projection's incremental predictive value can be assessed; without such controls it is difficult to determine whether the learned mapping adds explanatory power beyond the input embeddings themselves.
  Authors: We concur that baseline comparisons are important for demonstrating the added value of the SENSE projection. In the revised manuscript, we will include these in the Behavioral validation section. Specifically, we will report results for: raw cosine similarity between nonce embeddings and norm-associated word averages, random projection baselines, and a simple majority-class predictor from the behavioral data. These controls will allow assessment of whether the learned mapping provides incremental predictive power.
  Revision: yes
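The statistical approach named in the first response, a Pearson correlation per modality with a Bonferroni correction for 11 comparisons, might be sketched as follows. The p-value here comes from a permutation test, a stand-in chosen to keep the example dependency-free rather than the test the authors describe. The data, the 0.05 family-wise level, and the names `pearson_r` and `permutation_p` are likewise assumptions.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p(xs, ys, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Shuffling ys breaks any pairing with xs, so the shuffled |r| values
    approximate the null distribution of no association.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

N_MODALITIES = 11             # one test per Lancaster norm dimension
ALPHA = 0.05                  # assumed family-wise error level
threshold = ALPHA / N_MODALITIES  # Bonferroni-adjusted per-test level

# Hypothetical data for a single modality: model ratings and human
# selection rates for twelve nonce words (not taken from the paper).
model_ratings = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90,
                 0.20, 0.70, 0.50, 0.30, 0.75, 0.15]
selection_rates = [0.12, 0.30, 0.40, 0.70, 0.55, 0.85,
                   0.25, 0.60, 0.45, 0.35, 0.80, 0.10]

r = pearson_r(model_ratings, selection_rates)
p = permutation_p(model_ratings, selection_rates)
significant = p <= threshold
```

Running the same routine on the output of a random projection or on raw cosine similarities would give the baseline comparison the second response promises, since each baseline just supplies a different `model_ratings` list.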
Circularity Check
No significant circularity; derivation relies on external norms and independent human validation
Full rationale
The SENSE model is trained as a learned projection from existing word embeddings onto the external Lancaster sensorimotor norms for real words. Predicted ratings for nonce words are generated by applying this trained projection to their embeddings and are then compared to selection rates from a separate behavioral experiment with 281 independent human participants. Because the training targets (Lancaster norms) and the validation data (human nonce selections) originate outside the model's fitted parameters and are not redefined or derived from the same inputs, no step reduces by construction to self-definition, fitted-input renaming, or load-bearing self-citation. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- SENSE projection parameters
axioms (1)
- domain assumption: Lancaster sensorimotor norms are valid and reliable measures of human sensory and motor experience