Lightweight Cybersickness Detection based on User-Specific Eye and Head Tracking Data in Virtual Reality

Maria Torres Vega; Mihai Bâce; Yijun Wang

arxiv: 2604.17158 · v1 · submitted 2026-04-18 · 💻 cs.HC · cs.LG

Yijun Wang, Mihai Bâce, Maria Torres Vega

Pith reviewed 2026-05-10 05:53 UTC · model grok-4.3

keywords: cybersickness, virtual reality, eye tracking, head tracking, ensemble learning, user-specific detection, lightweight models, VR user experience

The pith

Training ensemble models on similar-content VR segments with 23 eye and head features detects cybersickness at 93% cross-user accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether a lightweight ensemble approach can detect cybersickness reliably using only eye and head tracking data collected from individual users. It finds that performance hinges on how the training data is assembled, with models trained on VR segments of matching content delivering the strongest outcomes at 93 percent accuracy when tested across users and 88 percent in personalized settings. These results rely on just 23 features and run with low computational cost, making the method suitable for ongoing VR use. Cybersickness disrupts user comfort and immersion, so early detection opens the door to automatic adjustments that preserve the experience. The work shifts focus from pooled multi-user data to user-specific patterns that respect individual differences.

Core claim

The authors establish that feature engineering and training set construction, specifically the selection of data from similar-content segments, are decisive for performance. An ensemble learning model using 23-dimensional eye and head tracking features from the Simulation 2021 dataset reaches 93 percent detection accuracy in cross-user evaluation and 88 percent in user-personalized evaluation, while offering shorter training and inference times that support practical real-world deployment.
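The abstract does not define "similar-content segments", so the following is only a minimal sketch of what restricting a training set by content could look like. The Segment fields, the content_tag labels, and select_similar_content are hypothetical names chosen for illustration; none of them come from the paper or from the Simulation 2021 dataset, and the feature vectors are shortened placeholders rather than the 23 real features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    user_id: str
    content_tag: str       # hypothetical label describing the VR content shown
    features: List[float]  # 23 values in the paper; shortened to 2 here
    sick: int              # 1 = cybersickness reported, 0 = not


def select_similar_content(segments: List[Segment], target_tag: str) -> List[Segment]:
    """Keep only the segments whose content tag matches the target tag."""
    return [s for s in segments if s.content_tag == target_tag]


# Placeholder data, invented for this example only.
segments = [
    Segment("u1", "content_A", [0.2, 0.9], 1),
    Segment("u1", "content_B", [0.1, 0.1], 0),
    Segment("u2", "content_A", [0.3, 0.8], 1),
    Segment("u2", "content_B", [0.2, 0.2], 0),
]

# Training set for a test segment that shows "content_A".
train = select_similar_content(segments, "content_A")
```

In this toy setup the training set shrinks from four segments to two, which illustrates the trade-off implied by the claim: less data, but data that matches the content being tested.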

What carries the argument

Ensemble learning model operating on 23-dimensional eye and head tracking feature vectors, trained selectively on VR segments with matching visual and motion content.
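The abstract does not name the base learners or the aggregation rule, so the sketch below swaps in a deliberately simple stand-in: three one-feature threshold rules combined by majority vote. It shows only the general shape of an ensemble over a feature vector. The names make_stump and majority_vote, the thresholds, and the three-value sample are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]


def make_stump(feature_index: int, threshold: float) -> Classifier:
    """Return a one-feature rule: predict 1 when the feature exceeds the threshold."""
    def predict(x: Sequence[float]) -> int:
        return 1 if x[feature_index] > threshold else 0
    return predict


def majority_vote(learners: List[Classifier], x: Sequence[float]) -> int:
    """Combine the learners' 0/1 predictions by simple majority."""
    votes = Counter(learner(x) for learner in learners)
    # Ties resolve to 0 (not sick); this is an arbitrary choice for the sketch.
    return 1 if votes[1] > votes[0] else 0


# Hypothetical thresholds; a real model would learn these from training data.
learners = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0.5)]

sample = [0.7, 0.6, 0.1]  # placeholder for a 23-dimensional feature vector
prediction = majority_vote(learners, sample)  # two of three rules vote 1
```

Each base rule here costs one comparison at inference time, which is the kind of budget a lightweight on-device detector would aim for, although the paper's actual learners may differ.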

If this is right

Lightweight models become practical for real-time cybersickness detection in deployed VR applications.
User-specific training data improves or maintains accuracy while avoiding the need for large aggregated datasets.
High performance holds across varying levels of cybersickness without requiring complex model architectures.
Shorter training and inference times enable on-device operation with limited hardware resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same tracking signals could support detection of related VR issues such as visual fatigue if similar content-based training is applied.
Mixing training segments from dissimilar content types would likely reduce the reported accuracies, revealing a limit of the current approach.
Real-time detection could feed into automatic VR adjustments like slowing camera motion or narrowing field of view to reduce discomfort.
Extending the method to longer or multi-session VR experiences might require periodic retraining on fresh similar-content data.

Load-bearing premise

That the eye and head movement patterns tied to cybersickness in the Simulation 2021 dataset remain representative across new users and new VR content.

What would settle it

Running the same models on eye and head data from users experiencing a VR environment with substantially different motion speeds or visual styles and observing detection accuracy drop below 80 percent.
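As a minimal sketch, the check described above reduces to computing accuracy on the new data and comparing it with the 80 percent bar. The labels and predictions below are invented placeholders, not results from the paper or from any dataset, and the accuracy helper is a hypothetical name.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


THRESHOLD = 0.80  # the 80 percent bar named in the text above

# Placeholder labels and predictions for a hypothetical new VR environment.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

acc = accuracy(y_true, y_pred)
falls_below = acc < THRESHOLD  # True would count against the premise
```

A single accuracy figure on a small sample is noisy, so in practice the comparison would be more convincing with per-user results or a confidence interval.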

Original abstract

The occurrence of cybersickness in virtual reality (VR) significantly impairs users' perception and sense of immersion. Therefore, timely detection of cybersickness and the application of appropriate intervention strategies are crucial for enhancing the user experience. However, existing cybersickness detection methods often suffer from issues such as poor detection reliability across different levels of cybersickness and unnecessary model complexity. Furthermore, while cybersickness exhibits significant inter-user variability, most existing approaches aggregate all data from users and lack user-specific solutions. In this paper, we investigate a lightweight approach for cybersickness detection incorporating an ensemble learning model and user-specific eye and head tracking data. Our experiments using the open-source dataset Simulation 2021 demonstrate that feature engineering and training set construction are critical for determining detection performance. Models trained with data from similar-content segments achieve the best results, attaining detection accuracies of 93% in the cross-user setting and 88% in the user-personalized setting, using only 23-dimensional eye and head features. Moreover, by using user-specific data, well-tuned ensemble learning models with shorter training and inference times can be feasibly applied to real-world cybersickness detection, offering superior time efficiency and outstanding detection performance. This work offers useful evidence toward the development of lightweight and user-adaptive cybersickness detection models for VR applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a lightweight ensemble learning approach for cybersickness detection in VR using user-specific eye and head tracking data. On the open Simulation 2021 dataset, it finds that feature engineering and training-set construction from similar-content segments are critical, with models achieving 93% accuracy in cross-user settings and 88% in user-personalized settings using only 23-dimensional features; the work also highlights shorter training/inference times relative to alternatives and the value of user-specific modeling.

Significance. If the performance claims hold under transparent validation, the work would provide a practical contribution to VR user-experience research by demonstrating an efficient, user-adaptive detection method that avoids heavy models while addressing inter-user variability. Explicit use of an open dataset and emphasis on time efficiency are strengths that would aid reproducibility and real-world deployment if the supporting protocol is supplied.

major comments (1)

[Abstract] The central claims of 93% cross-user and 88% personalized accuracy rest on an unspecified protocol. No definition is given for 'similar-content segments,' no list or justification is supplied for the 23 eye/head features, the ensemble architecture (base learners, aggregation rule, hyperparameter tuning) is not described, and the cross-user/personalized data splits and validation procedure are omitted. These omissions are load-bearing because they prevent assessment of selection bias, leakage, or overfitting, rendering the reported numbers unverifiable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comment correctly identifies that the abstract, in its current concise form, leaves key methodological elements unspecified, which limits immediate verifiability of the reported accuracies. We address this point directly below and will revise the manuscript to improve self-containment of the abstract while preserving its brevity.

Point-by-point responses

Referee: [Abstract] The central claims of 93% cross-user and 88% personalized accuracy rest on an unspecified protocol. No definition is given for 'similar-content segments,' no list or justification is supplied for the 23 eye/head features, the ensemble architecture (base learners, aggregation rule, hyperparameter tuning) is not described, and the cross-user/personalized data splits and validation procedure are omitted. These omissions are load-bearing because they prevent assessment of selection bias, leakage, or overfitting, rendering the reported numbers unverifiable.

Authors: We agree that the abstract should supply enough context for the central claims to be assessed at a high level. In the revised version we will expand the abstract with concise definitions and descriptions: 'similar-content segments' will be defined as VR simulation intervals sharing comparable visual and motion characteristics; the 23 features will be characterized as a reduced set of eye- and head-tracking statistics obtained via correlation-based selection; the ensemble will be described as a combination of lightweight base learners aggregated by a simple rule with hyperparameters selected through cross-validation; and the evaluation protocol will be summarized as leave-one-user-out for the cross-user case and per-user stratified splits for the personalized case. These additions will be kept brief to respect abstract length limits, with full lists, justifications, and pseudocode remaining in the Methods and Experiments sections. We will also add an explicit statement that all splits and feature selection were performed on training data only to preclude leakage.

Revision: yes
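The two evaluation protocols named in the simulated rebuttal can be sketched as follows. This is an illustration of leave-one-user-out and per-user stratified splitting in general, not the authors' code. The function names, the (user_id, features, label) tuple layout, the 25 percent test fraction, and the toy data are assumptions made for the example.

```python
import random
from collections import defaultdict


def leave_one_user_out(samples):
    """Yield (held_out_user, train, test), holding each user out once.

    `samples` is a list of (user_id, features, label) tuples.
    """
    users = sorted({user for user, _, _ in samples})
    for held_out in users:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def stratified_split(user_samples, test_fraction=0.25, seed=0):
    """Split one user's samples so each label appears in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for s in user_samples:
        by_label[s[2]].append(s)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # At least one test sample per label; a label with a single
        # sample therefore ends up in the test part only.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data: 3 users, 8 samples each, alternating labels 0 and 1.
samples = [(f"u{u}", [float(i)], i % 2) for u in range(3) for i in range(8)]

folds = list(leave_one_user_out(samples))        # cross-user protocol
u0 = [s for s in samples if s[0] == "u0"]
u0_train, u0_test = stratified_split(u0)         # personalized protocol
```

To match the leakage safeguard the rebuttal mentions, any feature selection or hyperparameter tuning would be fitted on the train part of each fold only and then applied unchanged to the test part.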

Circularity Check

0 steps flagged

No circularity: empirical ML results on open dataset

Full rationale

The paper reports experimental accuracies (93% cross-user, 88% personalized) from training ensemble models on the Simulation 2021 dataset using 23 eye/head features. No derivation chain, equations, or first-principles claims exist that reduce outputs to inputs by construction. Results are standard performance metrics on data splits with no self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the work relies on standard supervised machine learning assumptions and the properties of the cited open dataset.

pith-pipeline@v0.9.0 · 5508 in / 1147 out tokens · 51943 ms · 2026-05-10T05:53:49.095048+00:00 · methodology
