pith. machine review for the scientific record.

arxiv: 2604.16138 · v1 · submitted 2026-04-17 · 💻 cs.CL · cs.LG

Recognition: unknown

Sentiment Analysis of German Sign Language Fairy Tales

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 08:14 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords sentiment analysis · German sign language · DGS · MediaPipe · XGBoost · body motion · facial features · fairy tales

The pith

Sentiment in German sign language fairy tales can be detected from video motion features at 0.631 balanced accuracy, with body movements contributing as much as facial ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first applies large language models to German fairy tale texts and uses majority voting to assign each segment a negative, neutral, or positive valence label. These labels are paired with corresponding video recordings of the tales performed in German Sign Language. Motion features are extracted from the face and body in each video clip using MediaPipe, after which an XGBoost classifier is trained to predict the three sentiment classes directly from the features. The resulting model reaches an average balanced accuracy of 0.631, and inspection of the most predictive features shows that movements of the hips, elbows, and shoulders rank alongside eyebrow and mouth motions. Readers interested in visual languages would care because the work indicates that emotional information travels through the whole body rather than residing only in the face.
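
A minimal sketch of that first, text-side stage, assuming one label per model per segment; the query_llm helper, the model names, and the strict-majority tie rule are placeholders, not the paper's actual prompts, models, or aggregation details.

```python
from collections import Counter

# Stand-in for whatever client each LLM is served through; the paper's
# prompts and its four concrete models are not reproduced here.
def query_llm(model_name: str, segment: str) -> str:
    # A real implementation would prompt the model to answer with exactly
    # one of "negative", "neutral", or "positive".
    return "neutral"

MODELS = ["llm-a", "llm-b", "llm-c", "llm-d"]  # four voters, names hypothetical

def majority_vote_label(segment: str) -> str | None:
    """Majority valence label across models; None when no strict majority exists."""
    votes = Counter(query_llm(m, segment) for m in MODELS)
    label, count = votes.most_common(1)[0]
    return label if count > len(MODELS) // 2 else None

# Invented example segments standing in for the fairy-tale text units.
segments = ["Es war einmal ein armer Holzhacker ...", "Da erschrak das Kind ..."]
labels = [majority_vote_label(s) for s in segments]
```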

Core claim

The authors establish a pipeline that transfers valence labels from fairy-tale text to aligned German Sign Language videos, extracts face and body motion features via MediaPipe, and trains an explainable XGBoost model that classifies negative, neutral, or positive sentiment at 0.631 balanced accuracy. Feature-importance analysis reveals that hip, elbow, and shoulder motions contribute substantially to discrimination, supporting the claim that face and body play equal roles in conveying sentiment in sign language.

What carries the argument

XGBoost classifier trained on MediaPipe-extracted motion features from face and body landmarks, followed by feature-importance ranking
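
A compressed sketch of that machinery under assumptions the review does not pin down: the feature set below (mean and standard deviation of each MediaPipe Holistic landmark's frame-to-frame displacement) is a stand-in for the paper's actual hand-crafted motion features, and the function names are illustrative.

```python
import cv2
import numpy as np
import mediapipe as mp
from xgboost import XGBClassifier

def clip_motion_features(video_path: str) -> np.ndarray:
    """One vector per clip: mean and std of frame-to-frame landmark displacement."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    per_frame = []
    ok, frame = cap.read()
    while ok:
        res = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if res.pose_landmarks and res.face_landmarks:
            pose = [(p.x, p.y) for p in res.pose_landmarks.landmark]  # 33 body points
            face = [(p.x, p.y) for p in res.face_landmarks.landmark]  # 468 face points
            per_frame.append(np.array(pose + face).ravel())
        ok, frame = cap.read()
    cap.release()
    holistic.close()
    coords = np.stack(per_frame)              # (n_frames, n_coordinates)
    motion = np.abs(np.diff(coords, axis=0))  # frame-to-frame displacement
    return np.concatenate([motion.mean(axis=0), motion.std(axis=0)])

def train_sentiment_classifier(clip_paths, labels):
    """Fit XGBoost on clip features; labels are 0/1/2 for negative/neutral/positive."""
    X = np.stack([clip_motion_features(p) for p in clip_paths])
    clf = XGBClassifier(objective="multi:softprob")
    clf.fit(X, np.asarray(labels))
    top30 = np.argsort(clf.feature_importances_)[::-1][:30]  # post-hoc ranking
    return clf, top30
```

Whether the authors rank features by XGBoost's built-in importance scores or by a method such as SHAP is not stated in the material above; the built-in ranking here is only a stand-in for the explainability step.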

If this is right

  • Sentiment recognition systems for sign language must incorporate full-body tracking rather than face-only analysis.
  • Text-based labeling via majority vote of language models can serve as a scalable proxy when direct video annotation is expensive.
  • The same feature-extraction and classification approach can be applied to additional stories or genres in German Sign Language.
  • Linguistic descriptions of sign language should treat torso and limb dynamics as primary channels for emotional expression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pipeline could be ported to other sign languages once comparable video-text alignments are available.
  • Real-time applications such as sentiment-aware captioning or translation tools would benefit from full-body rather than head-only capture.
  • If body landmarks remain important across larger and more varied datasets, curriculum design for sign-language interpreters may need to emphasize whole-body expressivity.

Load-bearing premise

The valence labels produced by large-language-model analysis of the written fairy-tale texts match the sentiments actually expressed in the corresponding sign-language video performances.

What would settle it

Independent ratings collected from fluent signers who watch only the videos and assign negative, neutral, or positive labels; if agreement with the text-derived labels falls well below the model's 0.631 accuracy, the training signal is shown to be misaligned.
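
A minimal sketch of that settling test, assuming per-segment ratings from fluent signers who watched only the videos were available; the krippendorff package matches the agreement measure the abstract reports for the text side, and the labels below are invented for illustration.

```python
import numpy as np
import krippendorff
from sklearn.metrics import balanced_accuracy_score

LABELS = {"negative": 0, "neutral": 1, "positive": 2}

# Invented example data: text-derived labels versus ratings from two signers
# who watched only the videos.
text_labels = ["negative", "neutral", "neutral", "positive", "negative"]
signer_1    = ["negative", "neutral", "positive", "positive", "neutral"]
signer_2    = ["negative", "negative", "positive", "positive", "neutral"]

rows = np.array([[LABELS[l] for l in rater]
                 for rater in (text_labels, signer_1, signer_2)])

# Agreement across all three "raters" (text pipeline plus two signers).
alpha = krippendorff.alpha(reliability_data=rows, level_of_measurement="nominal")

# How well the text labels match what one signer saw, on the same scale as
# the model's reported 0.631 balanced accuracy.
acc = balanced_accuracy_score([LABELS[l] for l in signer_1],
                              [LABELS[l] for l in text_labels])
print(f"alpha={alpha:.3f}  text-vs-signer balanced accuracy={acc:.3f}")
```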

Figures

Figures reproduced from arXiv: 2604.16138 by Fabrizio Nunnari, Patrick Gebhard, Siddhant Jain.

Figure 1. A frame from the first sentence of the tale.
Figure 2. Overview of our dataset preparation pipeline.
Figure 3. Distribution of sentiment among the 574 sentences of our dataset.
Figure 4. For each fairy tale, the sentiment is plotted against the evolution of the plot.
Figure 5. Test metrics per fold.
Figure 6. Confusion matrix for test fold 1.
Figure 7. Top 30 features for the prediction of the sentiment labels.
Original abstract

We present a dataset and a model for sentiment analysis of German sign language (DGS) fairy tales. First, we perform sentiment analysis for three levels of valence (negative, neutral, positive) on German fairy tales text segments using four large language models (LLMs) and majority voting, reaching an inter-annotator agreement of 0.781 Krippendorff's alpha. Second, we extract face and body motion features from each corresponding DGS video segment using MediaPipe. Finally, we train an explainable model (based on XGBoost) to predict negative, neutral or positive sentiment from video features. Results show an average balanced accuracy of 0.631. A thorough analysis of the most important features reveal that, in addition to eyebrows and mouth motion on the face, also the motion of hips, elbows, and shoulders considerably contribute in the discrimination of the conveyed sentiment, indicating an equal importance of face and body for sentiment communication in sign language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a dataset of German fairy tale texts paired with DGS video performances and proposes a pipeline for sentiment analysis. Text segments are annotated for negative/neutral/positive valence using four LLMs with majority voting, yielding a Krippendorff's alpha of 0.781. Face and body features are extracted from videos via MediaPipe, and an XGBoost classifier is trained to predict the text-derived labels from these features, achieving 0.631 balanced accuracy. Feature analysis concludes that body motions (hips, elbows, shoulders) contribute comparably to facial motions (eyebrows, mouth) in conveying sentiment.

Significance. Should the text labels prove to be reliable indicators of the sentiments performed in the videos, the work would offer a novel dataset and baseline for DGS sentiment analysis, an area with limited prior research. The emphasis on explainable AI to identify key body and face features is a positive aspect, potentially informing future sign language understanding systems. The empirical approach using standard tools provides a reproducible starting point, though the unvalidated proxy assumption caps its current significance.

major comments (3)
  1. Sentiment Labeling section: The sentiment labels are generated exclusively from text using LLMs without any reported validation against the actual sentiments expressed in the corresponding DGS video segments. This is a load-bearing issue for the central claim, as sign language performances can alter textual sentiment through non-manual signals and body dynamics; no human video annotation or alignment study is described.
  2. Results and Evaluation section: The reported balanced accuracy of 0.631 lacks supporting details such as the total number of segments in the dataset, class balance, the specific train/validation/test split used, or the cross-validation strategy. Without these, the performance metric is difficult to interpret or compare.
  3. Feature Importance Analysis section: The conclusion that hips, elbows, and shoulders are equally important to eyebrows and mouth for sentiment discrimination rests on the unvalidated text labels. If the labels do not accurately reflect video content, the feature rankings may not generalize to true sign language sentiment cues.
minor comments (2)
  1. Abstract: The abstract states 'an average balanced accuracy of 0.631' but does not clarify whether this is averaged over classes, folds, or runs, or provide confidence intervals. A sketch of the per-fold reporting that would resolve this follows the list below.
  2. Dataset Description: No information is given on the source of the fairy tales, the number of tales, or how video segments were aligned to text segments.
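
A minimal sketch of the per-fold reporting that would address these comments, assuming folds are grouped by fairy tale so that clips from one story never straddle train and test; that grouping, and the five-fold split, are assumptions rather than the paper's documented protocol.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import balanced_accuracy_score
from xgboost import XGBClassifier

def fold_wise_balanced_accuracy(X, y, tale_ids, n_splits=5):
    """Balanced accuracy per fold, holding out whole fairy tales as groups."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=tale_ids):
        clf = XGBClassifier(objective="multi:softprob")
        clf.fit(X[train_idx], y[train_idx])
        scores.append(balanced_accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return np.array(scores)

# Reporting would then look like:
# scores = fold_wise_balanced_accuracy(X, y, tale_ids)
# print(f"balanced accuracy {scores.mean():.3f} ± {scores.std():.3f} over {len(scores)} folds")
```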

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, acknowledging limitations where they exist and outlining planned revisions to improve clarity and transparency.

point-by-point responses
  1. Referee: Sentiment Labeling section: The sentiment labels are generated exclusively from text using LLMs without any reported validation against the actual sentiments expressed in the corresponding DGS video segments. This is a load-bearing issue for the central claim, as sign language performances can alter textual sentiment through non-manual signals and body dynamics; no human video annotation or alignment study is described.

    Authors: We agree this is a core limitation of the work. The labels are derived solely from the German fairy tale texts via LLM majority voting as a proxy for the sentiment expressed in the paired DGS video performances. While the high Krippendorff's alpha of 0.781 indicates consistency in the text annotations, we recognize that sign language performances may modify valence through body dynamics and non-manual signals, and no human validation or alignment study on the videos was conducted. This was due to the substantial resources needed for expert DGS annotators. In the revised manuscript, we will add an explicit limitations subsection discussing the proxy assumption, its potential impact on results, and recommendations for future human-annotated validation studies. We will also frame the contribution more clearly as an initial baseline rather than a direct video sentiment analysis. revision: partial

  2. Referee: Results and Evaluation section: The reported balanced accuracy of 0.631 lacks supporting details such as the total number of segments in the dataset, class balance, the specific train/validation/test split used, or the cross-validation strategy. Without these, the performance metric is difficult to interpret or compare.

    Authors: We accept that these experimental details are essential for proper interpretation and reproducibility. The revised Results and Evaluation section will report the total number of text-video segments, the class distribution across negative/neutral/positive labels, the train/validation/test split proportions, and the cross-validation strategy (including number of folds) used for the XGBoost classifier. These additions will enable readers to contextualize the 0.631 balanced accuracy and facilitate comparisons with future work. revision: yes

  3. Referee: Feature Importance Analysis section: The conclusion that hips, elbows, and shoulders are equally important to eyebrows and mouth for sentiment discrimination rests on the unvalidated text labels. If the labels do not accurately reflect video content, the feature rankings may not generalize to true sign language sentiment cues.

    Authors: We concur that the feature importance rankings and the claim of comparable contribution from body (hips, elbows, shoulders) and facial (eyebrows, mouth) features are conditional on the proxy text labels being representative of the video content. In the revised Feature Importance Analysis section, we will qualify all conclusions accordingly, explicitly noting the dependence on the unvalidated labels and cautioning that the observed equal importance may not fully generalize without direct video-based sentiment annotations. The emphasis on explainability will be retained but framed with these caveats. revision: partial

standing simulated objections (unresolved)
  • Direct human validation or alignment study of the text-derived sentiment labels against the actual sentiments performed in the DGS video segments (no such annotations were collected in the original study).

Circularity Check

0 steps flagged

No circularity: standard empirical ML pipeline with independent components

full rationale

The paper's chain consists of three independent stages: (1) LLM-based majority-vote labeling of text segments (Krippendorff α reported on text only), (2) MediaPipe extraction of face+body keypoints from the corresponding videos, and (3) training an XGBoost classifier to predict the text-derived labels from the video features, followed by post-hoc feature-importance analysis. The balanced accuracy of 0.631 and the relative importance of hips/elbows/shoulders versus eyebrows/mouth are direct empirical outputs of this supervised learning setup; they are not obtained by fitting a parameter to a subset and renaming it a prediction, nor by any self-referential definition or self-citation that bears the central claim. No equation or derivation reduces to its own inputs by construction. The assumption that text labels faithfully proxy video sentiment is a validity issue external to the reported pipeline and does not constitute circularity under the specified criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the transferability of text-derived sentiment labels to sign language videos and on the accuracy of MediaPipe motion tracking; no free parameters are explicitly named in the abstract, and no new entities are postulated.

axioms (2)
  • domain assumption Majority voting across four LLMs produces reliable three-class sentiment labels for fairy tale text segments.
    Invoked to create the ground-truth labels used for training.
  • domain assumption MediaPipe extracts motion features from face and body that are sufficient to discriminate sentiment in sign language videos.
    Invoked in the feature extraction and model training steps.

pith-pipeline@v0.9.0 · 5462 in / 1479 out tokens · 48994 ms · 2026-05-10T08:14:31.733555+00:00 · methodology

