UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text

Amaia Zurinaga; Darya Hryhoryeva; Hamidreza Jamalabadi; Iryna Gurevych

arxiv: 2604.21534 · v2 · pith:CUN64YEZnew · submitted 2026-04-23 · 💻 cs.CL

Darya Hryhoryeva, Amaia Zurinaga, Hamidreza Jamalabadi, Iryna Gurevych

Pith reviewed 2026-05-09 21:53 UTC · model grok-4.3

keywords affective computing · valence and arousal · LLM prompting · emotional dynamics · neural regression · user embeddings · SemEval task · short-term change modeling

The pith

LLMs capture current emotions from text well, but recent numeric trajectories explain short-term changes better than text semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests three methods for tracking both a person's current emotional state and how it shifts over short sequences of their own texts. Large language models read the text to estimate the current state, a structured transition model handles ordered changes, and a neural model adds the recent history of numeric valence and arousal scores plus trainable user embeddings. The results suggest that text works well for the static part, while recent numeric scores explain the dynamic part more reliably. This distinction matters for building systems that follow emotional flow over time rather than just labeling single messages.

Core claim

Our findings indicate that LLMs effectively capture static affective signals from text, whereas short-term affective variation in this dataset is more strongly explained by recent numeric state trajectories than by textual semantics. The system that combined LLM prompting with a neural regression model using trajectories and user embeddings ranked first in both Subtask 1 and Subtask 2A under the official metric.

What carries the argument

The lightweight neural regression model that incorporates recent affective trajectories and trainable user embeddings, shown to outperform text-based approaches for modeling short-term changes.
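The idea of regressing on recent trajectories with a per-user trainable term can be illustrated with a deliberately simplified stand-in. The sketch below is not the authors' architecture, which is not specified on this page: it assumes a linear model over the last `k` scores, a single trainable scalar per user in place of an embedding vector, and a hypothetical data layout `{user_id: [scores in chronological order]}`.

```python
import random

def make_windows(series, k):
    """Turn one user's chronological scores into (last-k window, next value) pairs."""
    return [(series[i - k:i], series[i]) for i in range(k, len(series))]

def predict(w, b, u, user, window):
    """Linear prediction: weights over the window, a global bias, a per-user offset."""
    return sum(wj * xj for wj, xj in zip(w, window)) + b + u.get(user, 0.0)

def mse(data, w, b, u, k=2):
    """Mean squared error of next-value prediction over all users."""
    errs = [(predict(w, b, u, user, x) - y) ** 2
            for user, s in data.items() for x, y in make_windows(s, k)]
    return sum(errs) / len(errs)

def train(data, k=2, lr=0.05, epochs=200, seed=0):
    """Fit next = w . window + b + u[user] by plain SGD on squared error.

    Assumption: a scalar per user stands in for a trainable user embedding.
    """
    rng = random.Random(seed)
    w = [0.0] * k
    b = 0.0
    u = {user: 0.0 for user in data}
    examples = [(user, x, y)
                for user, s in data.items() for x, y in make_windows(s, k)]
    for _ in range(epochs):
        rng.shuffle(examples)
        for user, x, y in examples:
            err = predict(w, b, u, user, x) - y
            for j in range(k):
                w[j] -= lr * err * x[j]
            b -= lr * err
            u[user] -= lr * err
    return w, b, u

# Toy usage with two users whose baselines differ (synthetic values, not task data).
toy = {"a": [1.0, 1.2, 0.9, 1.1, 1.0, 1.2],
       "b": [-1.0, -0.8, -1.1, -0.9, -1.0, -1.2]}
w, b, u = train(toy)
```

In this toy setup the model sees no text at all, which mirrors the contrast the summary draws: the trajectory window and the user term carry the prediction on their own.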

Load-bearing premise

The SemEval-2026 Task 2 dataset and evaluation metric provide a valid test of real-world affective dynamics modeling, with no major biases in the chronologically ordered texts or labels.

What would settle it

A follow-up experiment on a new chronologically ordered text dataset where adding text features improves short-term change prediction accuracy beyond what numeric trajectories alone achieve.

Original abstract

This paper presents our system developed for SemEval-2026 Task 2. The task requires modeling both current affect and short-term affective change in chronologically ordered user-generated texts. We explore three complementary approaches: (1) LLM prompting under user-aware and user-agnostic settings, (2) a pairwise Maximum Entropy (MaxEnt) model with Ising-style interactions for structured transition modeling, and (3) a lightweight neural regression model incorporating recent affective trajectories and trainable user embeddings. Our findings indicate that LLMs effectively capture static affective signals from text, whereas short-term affective variation in this dataset is more strongly explained by recent numeric state trajectories than by textual semantics. Our system ranked first among participating teams in both Subtask 1 and Subtask 2A based on the official evaluation metric.
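For readers unfamiliar with approach (2), a pairwise MaxEnt model with Ising-style interactions is usually written in the generic textbook form below. The symbols are assumptions for illustration: $s_i$ denotes a discrete state variable, $h_i$ a per-variable bias, and $J_{ij}$ a pairwise coupling. How the paper maps valence and arousal onto the $s_i$ is not stated on this page.

```latex
P(\mathbf{s}) = \frac{1}{Z} \exp\Big( \sum_i h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j \Big),
\qquad
Z = \sum_{\mathbf{s}'} \exp\Big( \sum_i h_i s'_i + \sum_{i<j} J_{ij}\, s'_i s'_j \Big)
```

The normalizer $Z$ sums over all joint configurations, so the model is only tractable to fit exactly when the number of state variables is small.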

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes the UKP_Psycontrol system submitted to SemEval-2026 Task 2, which requires predicting both current valence/arousal levels and short-term affective changes from chronologically ordered user-generated texts. It evaluates three approaches: (1) LLM prompting under user-aware and user-agnostic conditions, (2) a pairwise Maximum Entropy model incorporating Ising-style interactions for transition modeling, and (3) a lightweight neural regression model that uses recent numeric affective trajectories plus trainable user embeddings. The central claim is that LLMs capture static affective signals from text effectively, while short-term dynamics in this dataset are better explained by numeric state trajectories than by textual semantics; the system achieved first place in Subtask 1 and Subtask 2A.

Significance. If the empirical contrast holds after addressing dataset concerns, the work usefully separates static versus dynamic affective modeling and shows that incorporating short-term numeric history can outperform text-only or LLM-based predictors for transitions. The top shared-task ranking and the use of complementary structured and neural methods provide a practical baseline for future user-state tracking systems.

major comments (1)

[Experimental results / Discussion] The headline finding that numeric trajectories outperform textual semantics for short-term change (abstract and experimental results) is load-bearing for the paper's contrast between approaches. The skeptic note correctly flags that this superiority could arise from dataset artifacts such as temporal autocorrelation, stable per-user baselines, or annotation propagation across sequences rather than genuine semantic limitations. No autocorrelation plots, user-level variance decomposition, order-permutation controls, or similar diagnostics appear to be reported; without them, the claim that short-term variation is 'more strongly explained' by trajectories than by text remains vulnerable to chronological ordering bias.
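An order-permutation control of the kind requested here can be sketched in a few lines. The example assumes a last-value (persistence) predictor as a stand-in for the trajectory model and a single user's score list as input; both are illustrative choices, not the paper's setup. If shuffling the order barely changes the error, the predictor is benefiting from a stable baseline rather than from temporal structure.

```python
import random

def persistence_mae(series):
    """Mean absolute error when each value is predicted by the one before it."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)

def permutation_control(series, n_shuffles=200, seed=0):
    """Compare the error on the true order with the mean error over shuffled orders."""
    rng = random.Random(seed)
    ordered = persistence_mae(series)
    shuffled_scores = []
    for _ in range(n_shuffles):
        perm = series[:]          # copy so the caller's list is left untouched
        rng.shuffle(perm)
        shuffled_scores.append(persistence_mae(perm))
    return ordered, sum(shuffled_scores) / n_shuffles

# Synthetic, smoothly drifting sequence (not task data).
ordered_err, shuffled_err = permutation_control(
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
```

A large gap between `ordered_err` and `shuffled_err` indicates that chronological order carries predictive signal; a small gap points toward per-user baselines as the main driver.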

minor comments (2)

[Approaches and Experiments] Implementation details, hyper-parameters, exact training procedures, and full quantitative tables (including ablations, error bars, and per-subtask scores) are referenced only at a high level; expanding these would strengthen verifiability.
[Abstract] The abstract states the ranking result but does not include the official metric values or direct comparisons to the other participating systems; adding a concise results table would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address the major comment below regarding potential dataset artifacts in our comparison of numeric trajectories versus textual semantics for short-term affective dynamics.

Point-by-point responses

Referee: [Experimental results / Discussion] The headline finding that numeric trajectories outperform textual semantics for short-term change (abstract and experimental results) is load-bearing for the paper's contrast between approaches. The skeptic note correctly flags that this superiority could arise from dataset artifacts such as temporal autocorrelation, stable per-user baselines, or annotation propagation across sequences rather than genuine semantic limitations. No autocorrelation plots, user-level variance decomposition, order-permutation controls, or similar diagnostics appear to be reported; without them, the claim that short-term variation is 'more strongly explained' by trajectories than by text remains vulnerable to chronological ordering bias.

Authors: We agree that the absence of these diagnostics leaves the central claim vulnerable to alternative explanations rooted in dataset structure rather than the semantic limitations of text. Our neural regression model relies on recent numeric trajectories and user embeddings precisely to capture dynamic changes beyond static baselines, while the LLM approaches rely on textual input; however, without explicit controls we cannot fully rule out autocorrelation or ordering effects. In the revised manuscript we will add (1) autocorrelation plots of valence and arousal sequences per user, (2) a variance decomposition separating between-user stable components from within-user temporal variation, and (3) order-permutation controls that randomly shuffle sequence order within users before re-training and evaluating the trajectory-based model. These additions will directly test whether the predictive advantage of numeric trajectories depends on genuine short-term dynamics.

Revision: yes
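The first two diagnostics promised in this response are standard and can be sketched with the Python standard library alone. The data layout `{user_id: [scores in chronological order]}` and the function names are assumptions made for illustration; the decomposition shown is the usual sum-of-squares split into between-user and within-user parts.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of one user's chronological score sequence."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den if den else 0.0

def variance_decomposition(data):
    """Return (between-user share, within-user share) of the total sum of squares."""
    grand = mean(x for s in data.values() for x in s)
    between = 0.0
    within = 0.0
    for s in data.values():
        user_mean = mean(s)
        between += len(s) * (user_mean - grand) ** 2
        within += sum((x - user_mean) ** 2 for x in s)
    total = between + within
    if total == 0:
        return 0.0, 0.0           # all scores identical: nothing to decompose
    return between / total, within / total

# Synthetic example: two users with constant but different baselines.
between_share, within_share = variance_decomposition(
    {"a": [1.0, 1.0, 1.0], "b": [-1.0, -1.0, -1.0]})
```

A between-user share close to 1 would mean that stable per-user baselines dominate the labels, which is exactly the artifact the referee asks the authors to rule out.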

Circularity Check

0 steps flagged

No circularity: empirical modeling paper with no derivations or self-referential reductions

Full rationale

The paper reports results from three standard modeling approaches (LLM prompting, MaxEnt with Ising interactions, and neural regression on trajectories plus embeddings) trained and evaluated on the SemEval-2026 Task 2 dataset using held-out testing. No equations, derivations, or parameter-fitting steps are described that would reduce a claimed prediction to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central empirical contrast between static LLM performance and trajectory-based dynamics is presented as an observation on this specific dataset rather than a self-contained logical necessity. This is a typical competition-system paper whose claims remain externally falsifiable via the shared task data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or novel theoretical constructs; relies on standard supervised learning assumptions for the three models.

