Words that make SENSE: Sensorimotor Norms in Learned Lexical Token Representations

Abhinav Gupta; Jesse Thomason; Toben H. Mintz

arxiv: 2602.00469 · v2 · submitted 2026-01-31 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-16 09:24 UTC · model grok-4.3

keywords: sensorimotor norms · word embeddings · nonce words · phonosthemes · lexical grounding · projection model · behavioral validation

The pith

A learned projection called SENSE maps word embeddings to sensorimotor norms and aligns with human judgments on made-up words.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SENSE, a model trained to predict scores on the eleven Lancaster sensorimotor dimensions straight from standard word embeddings. Researchers tested the model by asking participants to pick which nonce words felt linked to particular sensations or movements, and found reliable agreement in six modalities. The same nonce-word data also revealed recurring sound patterns associated with internal bodily states. This work shows how purely textual statistics can recover measurable traces of embodied meaning.

Core claim

SENSE learns a direct projection from lexical embeddings onto the Lancaster sensorimotor norms. A behavioral experiment with 281 participants who chose nonce words for specific sensorimotor associations produced statistically significant correlations with the model's ratings in six of the eleven modalities. Sublexical inspection of the selected nonce words further identified systematic phonosthemic patterns for the interoceptive norm.

What carries the argument

The SENSE projection model that maps word embeddings to the eleven Lancaster sensorimotor norm scores.
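As a rough illustration of what such a projection could look like, the sketch below fits a ridge-regularized linear map from embeddings to eleven norm scores and then applies it to unseen vectors. The linear form, the ridge penalty `l2`, the embedding size of 32, and all data are assumptions made for this example; the summary above does not specify the actual SENSE architecture.

```python
import numpy as np

N_NORMS = 11  # the eleven Lancaster sensorimotor dimensions


def add_bias(embeddings):
    """Append a constant column so the fitted map includes an intercept."""
    return np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])


def fit_projection(embeddings, norms, l2=1e-3):
    """Least-squares fit of norms from embeddings (hypothetical helper).

    Solves (X^T X + l2 * I) W = X^T Y; the small ridge term also
    shrinks the intercept row, which is acceptable for a sketch.
    """
    X = add_bias(embeddings)
    gram = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ norms)  # shape: (dim + 1, N_NORMS)


def predict_norms(embeddings, weights):
    """Apply the fitted projection to any embedding matrix."""
    return add_bias(embeddings) @ weights


# Synthetic stand-ins: 500 "real words" with 32-dimensional embeddings.
rng = np.random.default_rng(0)
real_word_vecs = rng.normal(size=(500, 32))
true_map = rng.normal(size=(32, N_NORMS))
real_word_norms = real_word_vecs @ true_map + 0.01 * rng.normal(size=(500, N_NORMS))

weights = fit_projection(real_word_vecs, real_word_norms)

# Five "nonce word" embeddings never seen during fitting.
nonce_vecs = rng.normal(size=(5, 32))
nonce_ratings = predict_norms(nonce_vecs, weights)  # one row of 11 scores per nonce word
```

The parameters are fitted once on real words and then held fixed, so ratings for new forms need no further human data.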

If this is right

Embeddings can approximate sensorimotor ratings for arbitrary new words without additional human data collection.
Nonce-word selection tasks provide an independent behavioral check on whether learned representations carry grounded information.
Text corpora alone can surface candidate phonosthemes tied to particular sensory or motor dimensions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same projection technique could be used to rank or generate words that are intended to evoke chosen sensory qualities.
Applying the model to contextual rather than static embeddings might show how surrounding text alters perceived sensorimotor strength.
Similar mappings could link embeddings to other grounded properties such as emotional or spatial dimensions.

Load-bearing premise

Participant choices of nonce words reflect genuine sensorimotor associations rather than biases introduced by how the words were generated or how responses were collected.

What would settle it

A replication of the participant study with a fresh group that finds no correlation between their nonce-word selections and the SENSE ratings would falsify the validation.
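The replication check described here comes down to a per-modality correlation between human selection rates and model ratings. A minimal sketch follows, assuming Pearson's r as the statistic; the numbers are invented for illustration and are not taken from the paper.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical data for one modality: the share of participants who picked
# each nonce word, and the model rating assigned to the same word.
selection_rates = [0.10, 0.25, 0.40, 0.55, 0.70]
sense_ratings = [1.2, 1.9, 2.4, 3.1, 3.3]

r = pearson_r(selection_rates, sense_ratings)
```

A fresh sample whose `r` is indistinguishable from zero in the modalities previously reported as significant would count against the validation.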

Original abstract

While word embeddings derive meaning from co-occurrence patterns, human language understanding is grounded in sensory and motor experience. We present SENSE (Sensorimotor Embedding Norm Scoring Engine), a learned projection model that predicts Lancaster sensorimotor norms from word lexical embeddings. We also conducted a behavioral study where 281 participants selected which among candidate nonce words evoked specific sensorimotor associations, finding statistically significant correlations between human selection rates and SENSE ratings across 6 of the 11 modalities. Sublexical analysis of these nonce words selection rates revealed systematic phonosthemic patterns for the interoceptive norm, suggesting a path towards computationally proposing candidate phonosthemes from text data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SENSE is a learned projection from embeddings to Lancaster sensorimotor norms, backed by a nonce-word behavioral study that hits significance in six modalities, but the abstract leaves the model and stats too thin to judge strength.

The core of this paper is a projection model called SENSE that takes standard word embeddings and maps them onto the eleven Lancaster sensorimotor dimensions. They train it on real words, then test it on nonce words by having 281 participants pick which made-up items feel like they match a given modality. The abstract reports significant correlations in six of those modalities and adds a phonosthemic look at the interoceptive results.

Referee Report

2 major / 3 minor

Summary. The paper introduces SENSE, a learned projection model that maps static word embeddings to the 11 Lancaster sensorimotor norms. It trains the projection on real-word embeddings, applies it to nonce-word embeddings, and reports statistically significant correlations with human selection rates from a behavioral study (N=281) in 6 of the 11 modalities; it further identifies systematic phonosthemic patterns in the interoceptive norm via sublexical analysis of the nonce items.

Significance. If the empirical results hold, the work offers a practical computational link between distributional embeddings and grounded sensorimotor experience, allowing prediction of sensorimotor ratings for novel or nonce forms and opening a route to data-driven phonostheme discovery. The independent behavioral validation on nonce words supplies external grounding that goes beyond training-data circularity.

major comments (2)

[Abstract] Abstract: the headline claim of statistically significant correlations across 6 modalities is presented without any accompanying information on model architecture, training objective, statistical tests, effect sizes, or correction for multiple comparisons; these omissions are load-bearing for evaluating whether the reported correlations support the generalization claim.
[Behavioral validation] Behavioral validation section: the paper does not report baseline comparisons (e.g., raw embedding cosine similarity, random projections, or majority-class predictors) against which the SENSE projection's incremental predictive value can be assessed; without such controls it is difficult to determine whether the learned mapping adds explanatory power beyond the input embeddings themselves.

minor comments (3)

Ensure that every reported correlation includes the exact test statistic, degrees of freedom, p-value, and effect size (e.g., r or R^{2}) in both the text and any summary table.
Clarify how nonce-word embeddings are obtained (subword tokenization? averaging?) and whether the projection parameters are frozen before the behavioral evaluation.
The sublexical phonostheme analysis would benefit from an explicit description of the feature extraction method and a control for word length or frequency confounds.
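One way the feature extraction requested in the last comment might be made explicit is sketched below: each nonce word is split into character n-grams, and the selection rates of all words containing a given n-gram are averaged. This is an assumption about how such an analysis could be set up, not the paper's method. It uses spelling as a stand-in for sound, and the nonce words and rates are made up.

```python
from collections import defaultdict


def char_ngrams(word, n=2):
    """Character n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def mean_rate_by_ngram(selection_rates, n=2, min_count=2):
    """Average selection rate over the words that contain each n-gram.

    N-grams seen in fewer than `min_count` words are dropped, since a
    single word cannot indicate a recurring pattern.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, rate in selection_rates.items():
        for gram in set(char_ngrams(word, n)):  # count each n-gram once per word
            totals[gram] += rate
            counts[gram] += 1
    return {g: totals[g] / counts[g] for g in totals if counts[g] >= min_count}


# Hypothetical selection rates for one modality.
rates = {"glim": 0.8, "glop": 0.6, "trop": 0.2, "trim": 0.4}
by_bigram = mean_rate_by_ngram(rates)
```

In this toy data the onset `gl` averages higher than `tr`. A length or frequency control, as the referee asks, would still be needed before reading such a gap as a phonostheme.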

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive recommendation for minor revision. We address each major comment below and will incorporate the suggested changes in the revised manuscript to improve clarity and rigor.

Referee: [Abstract] Abstract: the headline claim of statistically significant correlations across 6 modalities is presented without any accompanying information on model architecture, training objective, statistical tests, effect sizes, or correction for multiple comparisons; these omissions are load-bearing for evaluating whether the reported correlations support the generalization claim.

Authors: We agree that the abstract would benefit from additional methodological details to support the headline claim. In the revised manuscript, we will update the abstract to include a brief description of the SENSE model (a learned linear projection trained with mean squared error loss on real-word embeddings), the statistical approach (Pearson's correlations with significance determined after Bonferroni correction for 11 comparisons), and mention of effect sizes. This will help readers evaluate the strength of the generalization claim without exceeding abstract length constraints.
revision: yes
Referee: [Behavioral validation] Behavioral validation section: the paper does not report baseline comparisons (e.g., raw embedding cosine similarity, random projections, or majority-class predictors) against which the SENSE projection's incremental predictive value can be assessed; without such controls it is difficult to determine whether the learned mapping adds explanatory power beyond the input embeddings themselves.

Authors: We concur that baseline comparisons are important for demonstrating the added value of the SENSE projection. In the revised manuscript, we will include these in the Behavioral validation section. Specifically, we will report results for: raw cosine similarity between nonce embeddings and norm-associated word averages, random projection baselines, and a simple majority-class predictor from the behavioral data. These controls will allow assessment of whether the learned mapping provides incremental predictive power.
revision: yes
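The random-projection baseline named in the second response could be set up along the following lines. This is a sketch on synthetic data with a single norm dimension and 16-dimensional vectors, not the authors' protocol: a least-squares projection is compared with the average absolute correlation reached by randomly drawn projection vectors.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


rng = np.random.default_rng(1)
dim = 16
train_vecs = rng.normal(size=(400, dim))  # stand-in for real-word embeddings
test_vecs = rng.normal(size=(60, dim))    # stand-in for nonce-word embeddings
true_w = rng.normal(size=dim)

# Synthetic targets: one norm dimension for training and a stand-in
# for human selection rates on the held-out items.
train_scores = train_vecs @ true_w + 0.1 * rng.normal(size=400)
human_rates = test_vecs @ true_w + 0.1 * rng.normal(size=60)

# Learned projection: ordinary least squares on the training items.
learned_w, *_ = np.linalg.lstsq(train_vecs, train_scores, rcond=None)
learned_r = pearson_r(test_vecs @ learned_w, human_rates)

# Baseline: mean absolute correlation over 200 random projection vectors.
random_rs = [
    abs(pearson_r(test_vecs @ rng.normal(size=dim), human_rates))
    for _ in range(200)
]
baseline_r = float(np.mean(random_rs))
```

The quantity of interest is the gap between `learned_r` and `baseline_r`; with real data the gap would be reported per modality.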

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external norms and independent human validation

The SENSE model is trained as a learned projection from existing word embeddings onto the external Lancaster sensorimotor norms for real words. Predicted ratings for nonce words are generated by applying this trained projection to their embeddings and then compared to selection rates from a separate behavioral experiment with 281 independent human participants. Because the training targets (Lancaster norms) and the validation data (human nonce selections) originate outside the model's fitted parameters and are not redefined or derived from the same inputs, no step reduces by construction to self-definition, fitted-input renaming, or self-citation load-bearing. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on a learnable mapping from co-occurrence embeddings to sensorimotor norms plus the validity of the human judgment task as an external benchmark.

free parameters (1)

SENSE projection parameters
Weights of the learned projection model fitted to map embeddings onto the 11 Lancaster norm dimensions.

axioms (1)

domain assumption: Lancaster sensorimotor norms are valid and reliable measures of human sensory and motor experience
These norms serve as the training targets and evaluation ground truth.

pith-pipeline@v0.9.0 · 5446 in / 1259 out tokens · 29388 ms · 2026-05-16T09:24:40.298998+00:00 · methodology
