Investigating the structure of emotions by analyzing similarity and association of emotion words

Fumitaka Iwaki; Tatsuji Takahashi

arxiv: 2602.06430 · v2 · submitted 2026-02-06 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-16 07:21 UTC · model grok-4.3

keywords emotion structure · semantic networks · Plutchik wheel · community detection · similarity ratings · association ratings · emotion words · sentiment analysis

The pith

Networks from emotion word similarity and association ratings largely match Plutchik's wheel but show local differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds semantic networks from human ratings of how similar or associated pairs of emotion words are, then applies community detection to reveal their structure. It compares this structure to Plutchik's wheel, which arranges emotions in a circle with primary emotions and blends. The networks turn out mostly similar to the wheel in overall layout, yet they differ in specific local groupings of words. A reader would care because the wheel is widely used in natural language processing for sentiment and emotion analysis, so empirical checks on its fit matter for model accuracy. The work tests whether language-based data supports the circular model or requires adjustments.

Core claim

The authors collected similarity and association ratings for ordered pairs of emotion words, built two corresponding networks, and ran community detection on each. Both networks exhibited a structure that, for the most part, aligned with Plutchik's wheel of emotion, yet displayed local differences in how certain emotions clustered or connected.

What carries the argument

Semantic networks of emotion words derived from pairwise similarity and association ratings, with community detection used to extract and compare their structure against Plutchik's circular model.
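
As a rough illustration of the first step, the sketch below averages per-participant ratings into one weight per ordered word pair. Everything in it is an assumption for illustration: the `ratings` rows, the 1-to-7 scale, and the helper name `build_network` are hypothetical and are not taken from the paper, which may aggregate its ratings differently.

```python
from collections import defaultdict

# Hypothetical toy data: (participant, source word, target word, rating).
# The 1-to-7 scale and all values are made up for illustration.
ratings = [
    ("p1", "joy", "trust", 6),
    ("p2", "joy", "trust", 5),
    ("p1", "joy", "sadness", 1),
    ("p2", "joy", "sadness", 2),
    ("p1", "fear", "surprise", 5),
    ("p2", "fear", "surprise", 4),
]


def build_network(rows):
    """Average the ratings of each ordered pair into a directed edge weight."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _participant, source, target, score in rows:
        totals[(source, target)] += score
        counts[(source, target)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


edges = build_network(ratings)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight:.2f}")
```

Keeping the pair ordered at this stage leaves open whether later analysis treats the network as directed or collapses it to an undirected one.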

If this is right

Plutchik's wheel offers a reasonable global approximation for organizing emotion words in language data.
Local differences in the networks point to specific emotion pairs or clusters that may need refinement in NLP applications.
Similarity and association data can serve as an empirical basis for validating or adjusting psychological emotion models.
Community detection methods can extract interpretable structure from emotion word networks for further analysis.
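
To make "interpretable structure" concrete, one common yardstick for a candidate grouping is weighted modularity, which compares the edge weight inside groups with what would be expected if edges were placed at random. The sketch below scores a partition on a toy undirected network. The edge weights, the grouping, and the `resolution` argument (the usual scaling factor of generalized modularity) are assumptions for illustration; this is not claimed to be the objective or the resolution parameter used in the paper.

```python
from collections import defaultdict


def modularity(edges, community_of, resolution=1.0):
    """Weighted modularity of a partition of an undirected network.

    edges: dict mapping (u, v) to a weight, one entry per undirected edge.
    community_of: dict mapping each node to its community label.
    """
    total_weight = sum(edges.values())
    strength = defaultdict(float)   # summed edge weight per node
    inside = defaultdict(float)     # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community_of[u] == community_of[v]:
            inside[community_of[u]] += w
    group_strength = defaultdict(float)
    for node, s in strength.items():
        group_strength[community_of[node]] += s
    return sum(
        inside[c] / total_weight
        - resolution * (group_strength[c] / (2 * total_weight)) ** 2
        for c in group_strength
    )


# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = {
    ("joy", "trust"): 1.0,
    ("trust", "anticipation"): 1.0,
    ("joy", "anticipation"): 1.0,
    ("fear", "surprise"): 1.0,
    ("surprise", "sadness"): 1.0,
    ("fear", "sadness"): 1.0,
    ("anticipation", "fear"): 1.0,
}
split = {"joy": "A", "trust": "A", "anticipation": "A",
         "fear": "B", "surprise": "B", "sadness": "B"}
single = {word: "all" for word in split}

q_split = modularity(toy_edges, split)
q_single = modularity(toy_edges, single)
print(round(q_split, 4), round(q_single, 4))
```

A community detection algorithm searches over partitions for a high score of this kind; the sketch only evaluates partitions that are handed to it.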

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Sentiment analysis systems could replace rigid wheel categories with network-derived clusters to capture observed local variations.
The local differences might stem from linguistic or cultural factors and could be tested by repeating the experiment across languages.
If the networks predict human judgments better than the wheel alone, they could guide improved emotion lexicons for text processing.
Combining these networks with other models, such as dimensional affect spaces, might resolve the local mismatches.

Load-bearing premise

Ratings of similarity and association between emotion words accurately reflect the underlying structure of human emotions, and community detection on the resulting networks provides a valid test of Plutchik's wheel.

What would settle it

A replication study in which new participants provide ratings that produce networks whose detected communities show no overall structural resemblance to the wheel's primary emotion categories and blends.
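
One way to put a number on "structural resemblance" between detected communities and the wheel's categories is normalized mutual information (NMI), the kind of score that the paper passage quoted near the end of this page reports. The sketch below is a minimal version that normalizes by the mean of the two entropies. That normalization, the function name `nmi`, and the toy labelings are assumptions; the paper may use a different variant.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0 and entropy_b == 0:
        return 1.0  # both labelings put everything in one group
    return 2 * mutual / (entropy_a + entropy_b)


# Hypothetical labelings of four words: a reference grouping and two detected ones.
reference = ["A", "A", "B", "B"]
same_up_to_names = ["x", "x", "y", "y"]
unrelated = ["x", "y", "x", "y"]

print(nmi(reference, same_up_to_names), nmi(reference, unrelated))
```

A score near 1 means the two groupings agree up to relabeling, and a score near 0 means knowing one grouping says nothing about the other.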

Original abstract

In the field of natural language processing, some studies have attempted sentiment analysis on text by handling emotions as explanatory or response variables. One of the most popular emotion models used in this context is the wheel of emotion proposed by Plutchik. This model schematizes human emotions in a circular structure, and represents them in two or three dimensions. However, the validity of Plutchik's wheel of emotion has not been sufficiently examined. This study investigated the validity of the wheel by creating and analyzing semantic networks of emotion words. Through our experiments, we collected data of similarity and association of ordered pairs of emotion words, and constructed networks using these data. We then analyzed the structure of the networks through community detection, and compared it with that of the wheel of emotion. The results showed that each network's structure was, for the most part, similar to that of the wheel of emotion, but locally different.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New ratings of emotion word similarities and associations get turned into networks whose communities mostly match Plutchik's wheel but leave the circular order and opposite pairs untested.

This paper collects fresh human ratings on similarity and association for pairs of emotion words, builds networks from them, runs community detection, and reports that the resulting structures are mostly like Plutchik's wheel but differ locally. That is the core piece of work: new data plus a straightforward network comparison instead of another theoretical restatement of the model. The data collection itself is useful because Plutchik's wheel is widely used in NLP sentiment work yet has not been checked against word-level judgments in this way before. The authors deserve credit for generating the ratings rather than recycling old ones. The comparison is honest on its own terms and shows the expected broad alignment with basic emotion clusters.

The soft spots are real but not fatal. The abstract gives no sample size, rating scale, or statistical details on how the communities were matched to the wheel, so the strength of the "mostly similar" claim is hard to judge from what is shown. More importantly, community detection recovers groups of related terms but does not automatically recover the wheel's circular sequence or its designated opposite pairs. No quantitative check of those geometric features appears in the description, which means the validation is narrower than the headline suggests.

The paper is aimed at researchers who already use Plutchik's model in text analysis and want an empirical anchor for it. A reader working on emotion lexicons or affective computing would get value from the new ratings even if the structural test is incomplete. It is not a major theoretical step, but the data are new and the method is transparent enough that referees could usefully tighten the analysis and clarify exactly what was compared. I would send it to peer review rather than desk reject it.

Referee Report

2 major / 1 minor

Summary. The paper collects human ratings of similarity and association for pairs of emotion words, constructs semantic networks from these data, applies community detection to recover clusters, and compares the resulting structures to Plutchik's wheel of emotions. The central claim is that the networks are mostly similar to the wheel but exhibit local differences.

Significance. If the empirical comparison holds under quantitative scrutiny, the work supplies a data-driven check on a popular emotion model used in NLP sentiment analysis. The network-construction approach is a reasonable empirical probe and could help refine emotion lexicons, but its value depends on whether the method actually tests the wheel's distinctive geometric features rather than generic clustering.

major comments (2)

[Abstract] Abstract: the headline result that network structure is 'for the most part similar' to the wheel yet 'locally different' rests on community detection alone. This procedure recovers clusters of related terms but supplies no quantitative test of the wheel's circular ordering, intensity gradients, or the four designated opposite pairs of primary emotions; without such a metric the similarity judgment is unanchored.
[Methods] Methods (data collection and analysis sections): the abstract and reported results omit sample size, rating-scale details, inter-rater reliability statistics, and any controls for response bias or order effects. These omissions make the reported 'local differences' impossible to evaluate for robustness.

minor comments (1)

[Abstract] Abstract: the phrase 'ordered pairs of emotion words' is ambiguous; clarify whether directionality is preserved in the networks or whether ratings are symmetrized.
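
The two options named in this comment can be sketched as follows: either keep one weight per direction, or average the two directions into a single undirected weight. The `directed` values and the helper names `symmetrize` and `asymmetry` are hypothetical, and averaging is only one possible symmetrization; the paper's actual treatment of direction is not stated in the abstract.

```python
def symmetrize(directed):
    """Average w(a -> b) and w(b -> a) into one undirected weight per pair."""
    undirected = {}
    for (source, target), weight in directed.items():
        key = tuple(sorted((source, target)))
        reverse = directed.get((target, source))
        # If only one direction was rated, keep that rating as it is.
        undirected[key] = weight if reverse is None else (weight + reverse) / 2
    return undirected


def asymmetry(directed):
    """Difference w(a -> b) - w(b -> a) for pairs rated in both directions."""
    gaps = {}
    for (source, target), weight in directed.items():
        if source < target and (target, source) in directed:
            gaps[(source, target)] = weight - directed[(target, source)]
    return gaps


# Hypothetical directed mean ratings.
directed = {
    ("fear", "surprise"): 5.0,
    ("surprise", "fear"): 3.0,
    ("joy", "trust"): 6.0,
}

print(symmetrize(directed))
print(asymmetry(directed))
```

Large gaps in the second output would indicate that symmetrizing discards information, which is why the comment asks the authors to state which form the networks take.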

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to incorporate additional quantitative tests and methodological details.

Referee: [Abstract] Abstract: the headline result that network structure is 'for the most part similar' to the wheel yet 'locally different' rests on community detection alone. This procedure recovers clusters of related terms but supplies no quantitative test of the wheel's circular ordering, intensity gradients, or the four designated opposite pairs of primary emotions; without such a metric the similarity judgment is unanchored.

Authors: We agree that community detection primarily identifies clusters and does not directly quantify the wheel's circular ordering, intensity gradients, or opposite-pair structure. In the revised manuscript we have added two quantitative metrics: (1) Spearman correlation between network shortest-path distances and angular distances derived from Plutchik's wheel, and (2) a direct comparison of edge weights between the four designated opposite pairs of primary emotions. These additions anchor the similarity claim beyond clustering alone while preserving the original community-detection results.
revision: yes

Referee: [Methods] Methods (data collection and analysis sections): the abstract and reported results omit sample size, rating-scale details, inter-rater reliability statistics, and any controls for response bias or order effects. These omissions make the reported 'local differences' impossible to evaluate for robustness.

Authors: We acknowledge the omissions. The revised manuscript now reports the participant sample size, the precise 7-point Likert scales used for similarity and association ratings, intraclass correlation coefficients for inter-rater reliability, and the randomization procedure applied to word-pair presentation order to reduce sequence and response biases. These details allow readers to assess the robustness of the reported local differences.
revision: yes
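
The first metric named in the simulated rebuttal, a Spearman correlation between network distances and angular distances on the wheel, could look roughly like the sketch below. The ordering in `WHEEL` is the standard arrangement of Plutchik's eight primary emotions. The `network_distance` values are made-up numbers rather than results from the paper, and the helper names are hypothetical.

```python
import math

# Standard circular order of Plutchik's eight primary emotions.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]


def angular_steps(a, b):
    """Number of wheel positions between two primary emotions (0 to 4)."""
    gap = abs(WHEEL.index(a) - WHEEL.index(b))
    return min(gap, len(WHEEL) - gap)


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            result[k] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical network distances from "joy" to four other primary emotions.
network_distance = {
    ("joy", "trust"): 0.8,
    ("joy", "fear"): 2.1,
    ("joy", "surprise"): 2.0,
    ("joy", "sadness"): 3.5,
}
pairs = list(network_distance)
rho = spearman([network_distance[p] for p in pairs],
               [angular_steps(a, b) for a, b in pairs])
print(round(rho, 3))
```

A correlation near 1 would indicate that words farther apart on the wheel are also farther apart in the network, which is the circular-ordering property that clustering alone does not test.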

Circularity Check

0 steps flagged

No circularity: the empirical ratings and the community detection are independent of Plutchik's wheel.

The paper collects fresh pairwise similarity and association ratings for emotion words, builds semantic networks from these data, applies community detection, and performs a direct structural comparison to Plutchik's wheel. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the derivation chain; the central claim rests on newly gathered empirical observations rather than any reduction to prior inputs or author-specific assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that word similarity and association ratings capture the true structure of emotions; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Similarity and association ratings of emotion words reflect the underlying structure of human emotions.
The paper treats collected ratings as direct evidence for or against the wheel model without independent validation of this mapping.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (8-tick period, Tick ≃ LogicNat) · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: "When we set the parameter of resolution α as .001, both networks were decomposed to 8 communities... NMI ... to similarity network was 0.81... association network it was 0.72."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (D=3 circular topology) · refines

refines: relation between the paper passage and the cited Recognition theorem.

Paper passage: "the wheel of emotion... allocates eight primary emotions on it... A pair of emotions next to each other are similar, and an emotion and another across from it are opposite emotions."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.