SkillSight: Efficient First-Person Skill Assessment with Gaze

Chi Hsuan Wu; Kristen Grauman; Kumar Ashutosh

arxiv: 2511.19629 · v2 · submitted 2025-11-24 · 💻 cs.CV


Pith reviewed 2026-05-17 05:35 UTC · model grok-4.3


keywords: egocentric perception · skill assessment · gaze tracking · knowledge distillation · power efficiency · first-person video · computer vision · wearable computing


The pith

SkillSight distills a gaze-only student model from joint video-and-gaze training to assess skill level with high accuracy at far lower power cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that skill level in physical tasks shows up in where a person looks as much as in what the camera records. It builds a teacher model that learns from both egocentric video and gaze, then transfers that knowledge to a student model that needs only gaze input at test time. Experiments across cooking, music, and sports datasets show the student keeps competitive accuracy while cutting power use by a factor of 73. A reader would care because this removes the main barrier to always-on skill feedback on battery-limited devices like smart glasses. The approach therefore opens a route to practical, in-the-wild AI coaching without continuous video capture.

Core claim

Skill level is evident not only in how a person performs an activity but also in how they direct their attention; a two-stage teacher-student framework first learns the joint distribution of gaze and egocentric video, then distills a gaze-only student that achieves state-of-the-art accuracy on three real-world datasets while eliminating continuous video processing.

What carries the argument

Two-stage distillation pipeline in which a teacher jointly models gaze and video for skill prediction and then transfers knowledge to a gaze-only student model for low-power inference.

If this is right

Skill assessment becomes feasible on always-on wearable devices without draining the battery.
The same gaze signal can support real-time coaching feedback during practice sessions.
Datasets that record only eye tracking become sufficient for training future skill models.
Power savings scale with the duration of the activity, enabling longer monitoring sessions.
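
The battery argument rests on simple arithmetic: energy is average power multiplied by time, so runtime on a fixed battery scales inversely with power draw. The sketch below works that through. Only the 73x ratio comes from the abstract; the battery capacity, the video-pipeline power figure, and the function names are hypothetical placeholders, and the sketch assumes the ratio applies to the whole assessment pipeline.

```python
# Illustrative arithmetic only. The 73x ratio is the figure reported in the
# abstract; every other number below is a made-up placeholder.
POWER_RATIO = 73.0  # video-based power divided by gaze-only power


def runtime_hours(battery_mwh: float, power_mw: float) -> float:
    """Hours a battery of `battery_mwh` lasts at a constant draw of `power_mw`."""
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return battery_mwh / power_mw


def gaze_only_power(video_power_mw: float, ratio: float = POWER_RATIO) -> float:
    """Power of the gaze-only pipeline, assuming it draws `ratio` times less."""
    return video_power_mw / ratio


battery_mwh = 600.0      # assumed battery capacity (placeholder)
video_power_mw = 1460.0  # assumed draw of a video-based pipeline (placeholder)

video_hours = runtime_hours(battery_mwh, video_power_mw)
gaze_hours = runtime_hours(battery_mwh, gaze_only_power(video_power_mw))
print(f"video: {video_hours:.2f} h, gaze-only: {gaze_hours:.2f} h")
```

Whatever placeholder values are chosen, the runtime ratio equals the power ratio, which is why the saving matters most for long monitoring sessions.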

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The distillation step may generalize to other egocentric tasks where attention cues matter more than raw pixel content.
Combining the gaze-only model with occasional low-frame-rate video checks could further improve robustness without much added cost.
If gaze data can be captured on commodity smart glasses, the method could support large-scale studies of skill acquisition in everyday settings.

Load-bearing premise

Gaze patterns by themselves remain informative enough to predict skill level once the student has been distilled from video-plus-gaze training data.

What would settle it

A large accuracy drop when the gaze-only student is tested on a held-out activity set where gaze statistics no longer correlate with expert versus novice performance.

Figures

Figures reproduced from arXiv: 2511.19629 by Chi Hsuan Wu, Kristen Grauman, Kumar Ashutosh.

**Figure 1.** Skill assessment with gaze. Experts and novices exhibit distinct attention behaviors, influencing both how they move their head and eyes and what they see, as illustrated here with clips from an expert (top) and novice (bottom) basketball layup from [31]. The proposed method explores the associations between gaze, action, and expertise to achieve accurate and power-efficient skill assessment, using either…

**Figure 2.** Left: Overview of SkillSight-Teacher. We incorporate three components that encode action and gaze correlation, attended object sequence, and gaze trajectory for skill assessment. These features are fused by the fusion layer for prediction. Right: Overview of distillation method. SkillSight-Student learns to distill knowledge from the teacher feature [ev, ec, eg] using the distillation token tdis. As guidan…

**Figure 3.** What does an expert vs. novice tend to see more of? In these distributions, each patch crops the egocentric frame based on the subject’s gaze coordinates. Our representation surfaces interesting patterns, like (left two boxes) how novice pianists fixate on their hands more often than experts do (77% vs. 45%, as quantified with hand detection), or (right two boxes) how bouldering experts exhibit greater gaz…

**Figure 4.** Qualitative results. Both SkillSight-T and SkillSight-S better predict skill level than prior work. Experts and novices show distinct gaze patterns consistent with Ego-Exo4D [31] expert commentaries, shown for reference but not used by any model. The last example (bottom right) shows a failure case, highlighting the challenge of assessing skill from subtle movements. Blue rays show gaze direction and depth…

**Figure 6.** Gaze pattern analysis. SkillSight-S reveals distinct gaze patterns between model-predicted experts and novices.

**Figure 7.** Distinct gaze pattern analysis. We present more distinct gaze patterns that SkillSight-S reveals between subjects at different skill levels.

Original abstract

Egocentric perception on smart glasses could transform how we learn new skills in the physical world, but automatic skill assessment remains a fundamental technical challenge. We introduce SkillSight for power-efficient skill assessment from first-person data. Central to our approach is the hypothesis that skill level is evident not only in how a person performs an activity (video), but also in how they direct their attention when doing so (gaze). Our two-stage framework first learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model. At inference, the student model requires only gaze input, drastically reducing power consumption by eliminating continuous video processing. Experiments on three datasets spanning cooking, music, and sports establish, for the first time, the valuable role of gaze in skill understanding across diverse real-world settings. Our SkillSight teacher model achieves state-of-the-art performance, while our gaze-only student variant maintains high accuracy using 73x less power than competing methods. These results pave the way for in-the-wild AI-supported skill learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SkillSight's gaze-only student via distillation offers a practical path to low-power skill assessment, though the transfer of cues from the teacher remains the key uncertainty.


The main point is that this paper demonstrates a distillation method for assessing skill from gaze data alone, after training on both gaze and video, which leads to much lower power use on devices. The authors propose SkillSight, which first trains a teacher model on egocentric video and gaze to predict skill level in activities such as cooking, music, and sports. The student model is then trained to replicate the teacher's predictions using only gaze input. At runtime the system therefore avoids processing video frames continuously, which the authors claim reduces power consumption by a factor of 73 while keeping accuracy competitive.

What works here is the focus on practical efficiency for wearable cameras and the evaluation across multiple domains. The work builds on the idea that gaze patterns reveal expertise through attention allocation, and showing this in diverse settings strengthens the case.

The potential weakness is the assumption that gaze encodes enough information on its own. Distillation might not capture all the cues if the teacher depends on visual scene details that are absent from the gaze signal. More detailed results on the accuracy gap between teacher and student, along with comparisons to non-distilled gaze models, would clarify this. The abstract gives only high-level claims, so the full paper's tables and ablations are key to judging how well the result holds up.

The work appears to use standard distillation techniques without obvious circularity, and the multi-dataset experiments are a solid step. I see no major issues with the citation pattern. The paper is relevant for colleagues working on egocentric AI and on-device applications for skill learning, and a reader looking for ways to make attention-based models more efficient would take something away from it. It engages honestly with the literature on gaze and skill assessment.

I would recommend accepting it for peer review so the community can assess the empirical support.

Referee Report

2 major / 2 minor

Summary. The paper introduces SkillSight, a two-stage framework for power-efficient first-person skill assessment. A teacher model jointly processes gaze and egocentric video to predict skill level across cooking, music, and sports tasks; this is distilled into a gaze-only student model that operates at inference without video input. The work claims SOTA performance for the teacher and that the student maintains high accuracy while using 73x less power than competing methods, establishing the value of gaze for skill understanding.

Significance. If the central results hold, the approach could enable practical on-device skill assessment for smart glasses by replacing continuous video processing with low-power gaze input. The cross-domain evaluation and distillation strategy provide a concrete path to power reduction while preserving accuracy, with potential impact on egocentric perception systems for real-world skill learning.

major comments (2)

§4 (Experiments): The claim that the gaze-only student maintains high accuracy after distillation is load-bearing for the 73x power reduction result, yet the section provides no ablation isolating the contribution of gaze versus video features in the teacher or measuring performance drop when video context is removed at inference; without this, it is unclear whether skill cues transfer fully to the student.
§3.2 (Distillation): The distillation loss is described as combining task loss and feature matching, but no analysis shows that this objective forces recovery of video-dependent discrimination cues (e.g., object attention timing) from gaze sequences alone; if the teacher exploits visual content unavailable to the student, the accuracy premise fails.
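
The objective discussed in the second comment can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a softmax cross-entropy task loss and a single weighting factor `lam`, neither of which is confirmed by this page. The L1 feature-matching term follows the form quoted further down this page, Ldis = ||f_p(ê_s) - f_t([e_v, e_c, e_g])||_1, with the two projections assumed to be already applied to the inputs.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def l1_feature_loss(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """L1 distance between the projected student and teacher features."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(s - t) for s, t in zip(student_feat, teacher_feat))


def student_loss(
    logits: Sequence[float],
    label: int,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    lam: float = 1.0,  # hypothetical weight on the feature-matching term
) -> float:
    """Task loss plus a weighted feature-matching term."""
    return cross_entropy(logits, label) + lam * l1_feature_loss(student_feat, teacher_feat)


# Toy values: two skill classes and two-dimensional features.
print(student_loss([0.0, 0.0], 0, [1.0, 2.0], [0.0, 4.0], lam=0.5))
```

The referee's concern maps onto the second term: it pulls the student's features toward the teacher's, but nothing in it guarantees that the gaze input contains the information needed to reach them.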

minor comments (2)

Abstract: Specific numerical values for accuracy, power measurements, and baseline comparisons are missing, weakening the ability to assess the SOTA and 73x claims at a glance.
Figure 2: The diagram of the teacher-student pipeline would benefit from explicit annotation of the distillation loss terms and temperature parameter to match the text description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the changes we will make to the manuscript.


Referee: §4 (Experiments): The claim that the gaze-only student maintains high accuracy after distillation is load-bearing for the 73x power reduction result, yet the section provides no ablation isolating the contribution of gaze versus video features in the teacher or measuring performance drop when video context is removed at inference; without this, it is unclear whether skill cues transfer fully to the student.

Authors: We agree that the current experiments would be strengthened by explicit ablations isolating the contribution of gaze versus video features. In the revised manuscript we will add a new ablation subsection in §4 that (i) compares teacher performance with and without video input and (ii) reports the accuracy drop between the joint teacher and the gaze-only student across all three datasets. These results will directly quantify how much skill-relevant information transfers through distillation and will support the 73x power-reduction claim.
Revision: yes

Referee: §3.2 (Distillation): The distillation loss is described as combining task loss and feature matching, but no analysis shows that this objective forces recovery of video-dependent discrimination cues (e.g., object attention timing) from gaze sequences alone; if the teacher exploits visual content unavailable to the student, the accuracy premise fails.

Authors: We acknowledge the value of additional analysis showing that the distillation objective recovers video-dependent cues from gaze alone. While the cross-domain results already indicate successful transfer, we will expand §3.2 with a brief discussion of the feature-matching term and add qualitative examples (gaze attention maps aligned with skill-critical events) plus a quantitative cue-recovery metric in the experiments. These additions will clarify how gaze sequences encode the necessary timing and focus information.
Revision: partial
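
The teacher-versus-student comparison promised in the first response reduces to a per-dataset accuracy gap. The helper below is a sketch of that bookkeeping. The function names and the toy labels are hypothetical and carry no results from the paper.

```python
from typing import Dict, Sequence


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def teacher_student_gap(
    teacher_preds: Sequence[str],
    student_preds: Sequence[str],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Accuracy of each model on the same clips and the drop from teacher to student."""
    t = accuracy(teacher_preds, labels)
    s = accuracy(student_preds, labels)
    return {"teacher": t, "student": s, "drop": t - s}


# Made-up toy labels for four clips, purely to exercise the helper.
labels = ["novice", "expert", "expert", "novice"]
teacher_preds = ["novice", "expert", "expert", "novice"]
student_preds = ["novice", "expert", "novice", "novice"]
gap = teacher_student_gap(teacher_preds, student_preds, labels)
print(gap)
```

Running this once per dataset, on identical test clips for both models, gives the drop figures the referee asks for.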

Circularity Check

0 steps flagged

No circularity: standard teacher-student distillation with empirical validation on external datasets


The paper presents a two-stage pipeline: a teacher model jointly processes gaze and egocentric video to predict skill level, followed by distillation to a gaze-only student. This follows conventional knowledge distillation without reducing predictions to fitted parameters by construction or relying on self-citation chains for core claims. Experiments on three independent datasets (cooking, music, sports) provide external validation. No self-definitional equations, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation are described. The 73x power reduction is a measured outcome of removing video input at inference, not a definitional tautology. The derivation chain remains self-contained against benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Relies on the domain assumption that gaze encodes skill information and on standard ML training choices whose specific hyperparameters are not detailed in the abstract.

free parameters (1)

distillation loss weights and temperature
Typical hyperparameters in teacher-student training that must be chosen or tuned to achieve the reported accuracy-power trade-off.
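
For readers unfamiliar with the temperature hyperparameter, the sketch below shows the standard soft-target formulation of knowledge distillation, in which both sets of logits are divided by a temperature before the softmax. It is a generic illustration. This page does not confirm that SkillSight uses a temperature-scaled term (the loss quoted elsewhere on the page is an L1 feature match), and all names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by `temperature`; higher values flatten the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions with q strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def soft_target_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Temperature-squared times KL between the softened teacher and student outputs."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p, q)


sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
soft = softmax_with_temperature([2.0, 1.0, 0.0], temperature=4.0)
print(sharp, soft)
```

A higher temperature spreads probability mass across classes, which is why it is usually tuned together with the loss weights.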

axioms (1)

Domain assumption: Gaze direction and fixation patterns are informative of skill level in physical tasks.
Central hypothesis stated in the abstract that justifies using gaze as the sole input at inference.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Our two-stage framework first learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model... $L_{\text{dis}} = \lVert f_p(\hat{e}_s) - f_t([e_v, e_c, e_g]) \rVert_1$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · 1 internal anchor

[1] Visual strategies of young soccer players during a passing test – a pilot study. Journal of Eye Movement Research, 15(1), 2022.
[2] Yusuke Akamatsu, Keisuke Maeda, Takahiro Ogawa, and Miki Haseyama. Classification of expert-novice level using eye tracking and motion data via conditional multimodal variational autoencoder. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1360–1364. IEEE, 2021.
[3] Taravat Anvari, Markus Lappe, and Marc H E de Lussanet. Where does gaze lead? integrating gaze and motion for enhanced 3d pose estimation. In 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pages 76–83, 2025.
[4] Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, and Kristen Grauman. Expertaf: Expert actionable feedback from video. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 13582–13594, 2025.
[5] Alpha Yaya Balde, Emmanuel Bergeret, Denis Cajal, and Jean-Pierre Toumazet. Low power environmental image sensors for remote photogrammetry. Sensors, 22(19):7617,
[6] Gedas Bertasius, Hyun Soo Park, Stella X Yu, and Jianbo Shi. Am i a baller? basketball performance assessment from first-person videos. In Proceedings of the IEEE international conference on computer vision, pages 2177–2185,
[7] Gedas Bertasius, Heng Wang, and Lorenzo Torresani. Is space-time attention all you need for video understanding? In ICML, page 4, 2021.
[8] Edoardo Bianchi and Antonio Liotta. Skillformer: Unified multi-view video understanding for proficiency estimation,
[9] Björn Braun, Rayan Armani, Manuel Meier, Max Moebus, and Christian Holz. egoppg: Heart rate estimation from eye-tracking cameras in egocentric systems to benefit downstream vision tasks. arXiv preprint arXiv:2502.20879,
[10] Tad T Brunyé, Trafton Drew, Donald L Weaver, and Joann G Elmore. A review of eye tracking for understanding and improving diagnostic interpretation. Cognitive research: principles and implications, 4(1):7, 2019.
[11] Shyamal Buch, Arsha Nagrani, Anurag Arnab, and Cordelia Schmid. Flexible frame selection for efficient video reasoning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 29071–29082,
[12] James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, and Serena Yeung-Levy. Video action differencing. arXiv preprint arXiv:2503.07860, 2025.
[13] Michel A. Cara. The effect of practice and musical structure on pianists’ eye-hand span and visual monitoring. Journal of Eye Movement Research, 16(2):1–18, 2023.
[14] Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.
[15] Joe Causer, Adam Harvey, Richard Snelgrove, Gary Arsenault, and Oshin Vartanian. Quiet eye training improves surgical performance: A randomized controlled study. Frontiers in Psychology, 5:821, 2014.
[16] Longfei CHEN, Yuichi NAKAMURA, Kazuaki KONDO, Dima DAMEN, and Walterio MAYOL-CUEVAS. Integration of experts’ and beginners’ machine operation experiences to obtain a detailed task model. IEICE TRANSACTIONS on Information, E104-D(1):152–161, 2021.
[17] Dima Damen, Teesid Leelasawassuk, Osian Haines, Andrew Calway, and Walterio W Mayol-Cuevas. You-do, i-learn: Discovering task relevant objects and their modes of interaction from multi-user egocentric video. In BMVC, page 3, 2014.
[18] Radosvet Desislavov, Fernando Martínez-Plumed, and José Hernández-Orallo. Trends in ai inference energy consumption: Beyond the performance-vs-parameter laws of deep learning. Sustainable Computing: Informatics and Systems, 38:100857, 2023.
[19] Linfeng Dong, Wei Wang, Yu Qiao, and Xiao Sun. Lucidaction: A hierarchical and multi-model dataset for comprehensive action quality assessment. Advances in Neural Information Processing Systems, 37:96468–96482, 2024.
[20] Hazel Doughty, Walterio Mayol-Cuevas, and Dima Damen. The pros and cons: Rank-aware temporal attention for skill determination in long videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7862–7871, 2019.
[21] Veronique Drai-Zerbib and Emmanuel Baccino. The influence of expertise in music reading on the detection of temporal violations. Visual Cognition, 20(3):267–282, 2012.
[22] Tobias Drey, Pascal Jansen, Fabian Fischbach, Julian Frommel, and Enrico Rukzio. Towards progress assessment for adaptive hints in educational virtual reality games. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, page 1–9, New York, NY, USA, 2020. Association for Computing Machinery.
[23] Amr Elkholy, Mohamed E Hussein, Walid Gomaa, Dima Damen, and Emmanuel Saba. Efficient and robust skeleton-based quality assessment and abnormality detection in human action performance. IEEE journal of biomedical and health informatics, 24(1):280–291, 2019.
[24] Christoph Feichtenhofer. X3d: Expanding architectures for efficient video recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 203–213, 2020.
[25] Shijia Feng, Michael Wray, and Walterio Mayol-Cuevas. Evostruggle: A dataset capturing the evolution of struggle across activities and skill levels. arXiv preprint arXiv:2510.01362, 2025.
[26] Isabel Funke, Sören Torge Mees, Jürgen Weitz, and Stefanie Speidel. Video-based surgical skill assessment using 3d convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery, 14(7):1217–1225, 2019.
[27] Soline Galuret, Nicolas Vallée, Alexandre Tronchot, Herve Thomazeau, Pierre Jannin, and Arnaud Huaulmé. Gaze behavior is related to objective technical skills assessment during virtual reality simulator-based surgical training: a proof of concept. International Journal of Computer Assisted Radiology and Surgery, 18(9):1697–1705, 2023.
[28] Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, and Lorenzo Torresani. Listen to look: Action recognition by previewing audio. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10457–10467,
[29] Kumie Gedamu, Yanli Ji, Yang Yang, Jie Shao, and Heng Tao Shen. Visual-semantic alignment temporal parsing for action quality assessment. IEEE Transactions on Circuits and Systems for Video Technology, 2024.
[30] Kerstin Gidlöf, Annika Wallin, Richard Dewhurst, and Kenneth Holmqvist. Using eye tracking to trace a cognitive process: Gaze behaviour during decision making in a natural environment. Journal of eye movement research, 6(1), 2013.
[31] Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, et al. Ego-exo4d: Understanding skilled human activity from first- and third-person perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 193...
[32] Mary M Hayhoe and Jonathan Samir Matthis. Control of gaze in natural environments: effects of rewards and costs, uncertainty and memory in target selection. Interface focus, 8(4):20180009, 2018.
[33] Mark Horowitz. 1.1 computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 10–14, 2014.
[34] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[35] Yifei Huang, Minjie Cai, Zhenqiang Li, and Yoichi Sato. Predicting gaze in egocentric video by learning task-dependent attention transition. In Proceedings of the European conference on computer vision (ECCV), pages 754–769, 2018.
[36] Yifei Huang, Minjie Cai, Zhenqiang Li, Feng Lu, and Yoichi Sato. Mutual context network for jointly estimating egocentric gaze and action. IEEE Transactions on Image Processing, 29:7795–7806, 2020.
[37] Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, et al. Egoexolearn: A dataset for bridging asynchronous ego- and exo-centric view of procedural activities in real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22072–22086, 2024.
[38] Mina Huh, Zihui Xue, Ujjaini Das, Kumar Ashutosh, Kristen Grauman, and Amy Pavel. Vid2coach: Transforming how-to videos into task assistants. arXiv preprint arXiv:2506.00717, 2025.
[39] Inhyeok Jeong, Kento Nakagawa, Rieko Osu, and Kazuyuki Kanosue. Difference in gaze control ability between low and high skill players of a real-time strategy game in esports. PloS one, 17(3):e0265526, 2022.
[40] Jakob Karolus, Johannes Sylupp, Albrecht Schmidt, and Paweł W Woźniak. Eyepiano: leveraging gaze for reflective piano learning. In Proceedings of the 2023 ACM Designing Interactive Systems Conference, pages 1209–1223, 2023.
[41]

Generalized and efficient skill assessment from imu data with applications in gymnastics and medical training.ACM Transactions on Computing for Healthcare, 2(1):1–21, 2020

Aftab Khan, Sebastian Mellor, Rachel King, Balazs Janko, William Harwin, R Simon Sherratt, Ian Craddock, and Thomas Pl ¨otz. Generalized and efficient skill assessment from imu data with applications in gymnastics and medical training.ACM Transactions on Computing for Healthcare, 2(1):1–21, 2020. 2

work page 2020
[42]

GazeGPT: Augmenting human capabilities using gaze-contingent contextual ai for smart eyewear,

Robert Konrad, Nitish Padmanaban, J Gabriel Buckmaster, Kevin C Boyle, and Gordon Wetzstein. Gazegpt: Augment- ing human capabilities using gaze-contingent contextual ai for smart eyewear.arXiv preprint arXiv:2401.17217, 2024. 2

work page arXiv 2024
[43]

Scsam- pler: Sampling salient clips from video for efficient action recognition

Bruno Korbar, Du Tran, and Lorenzo Torresani. Scsam- pler: Sampling salient clips from video for efficient action recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6232–6242, 2019. 2

[44]

Bolin Lai, Miao Liu, Fiona Ryan, and James M Rehg. In the eye of transformer: Global-local correlation for egocentric gaze estimation. arXiv preprint arXiv:2208.04464, 2022.

[45]

Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, and James M Rehg. Listen to look into the future: Audio-visual egocentric gaze anticipation. In European Conference on Computer Vision, pages 192–210. Springer, 2024.

[46]

Michael Land, Neil Mennie, and Jennifer Rusted. The roles of vision and eye movements in the control of activities of daily living. Perception, 28(11):1311–1328, 1999.

[47]

Chae Young Lee, Maxwell Fite, Tejus Rao, Sara Achour, Zerina Kapetanovic, et al. Hypercam: Low-power onboard computer vision for iot cameras. arXiv preprint arXiv:2501.10547, 2025.

[48]

Seungmin Lee and Jongseong An. Gaze control and motor performance in motor expertise studies: Focused review of field application research on perceptual skill training. International Journal of Applied Sports Sciences, 35(1), 2023.

[49]

Qing Lei, Huiying Li, Hongbo Zhang, Jixiang Du, and Shangce Gao. Multi-skeleton structures graph convolutional network for action quality assessment in long videos. Applied Intelligence, 53(19):21692–21705, 2023.

[50]

Yin Li, Alireza Fathi, and James M Rehg. Learning to predict gaze in egocentric video. In Proceedings of the IEEE international conference on computer vision, pages 3216–3223, 2013.

[51]

Yin Li, Miao Liu, and James M Rehg. In the eye of the beholder: Gaze and actions in first person video. IEEE transactions on pattern analysis and machine intelligence, 45(6):6731–6747, 2021.

[52]

Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, and Liangyin Chen. A light weight model for active speaker detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22932–22941, 2023.

[53]

Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath GNVV Namburi, and Yin Li. Rica^2: Rubric-informed, calibrated assessment of actions. In Proceedings of the European Conference on Computer Vision (ECCV),

[54]

Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, and Vamsi Krishna Ithapu. Chat2map: Efficient scene mapping from multi-ego conversations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10554–10564, 2023.

[55]

Michele Mazzamuto*, Francesco Ragusa*, Antonino Furnari*, and Giovanni Maria Farinella*. Learning to detect attended objects in cultural sites with gaze signals and weak object supervision. ACM Journal on Computing and Cultural Heritage, 17(3):1–21, 2024.

[56]

Michele Mazzamuto, Antonino Furnari, Yoichi Sato, and Giovanni Maria Farinella. Gazing into missteps: Leveraging eye-gaze for unsupervised mistake detection in egocentric videos of skilled human activities. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 8310–8320, 2025.

[57]

Rachel Melnyk, Timothy Campbell, Tyler Holler, Katherine Cameron, Patrick Saba, Michael W Witthaus, Jean Joseph, and Ahmed Ghazi. See like an expert: Gaze-augmented training enhances skill acquisition in a virtual reality robotic suturing task. Journal of Endourology, 35(3):376–382, 2021.

[58]

Meta Platforms, Inc. Project aria glasses user manual. https://facebookresearch.github.io/projectaria_tools/docs/ARK/glasses_manual/glasses_user_manual, 2025. Accessed: 2025-10-06.

[59]

Kyle Min and Jason J Corso. Integrating human gaze into attention for egocentric activity recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1069–1078, 2021.

[60]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.

[61]

Süleyman Özdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, and Enkelejda Kasneci. Gaze-guided graph neural network for action anticipation conditioned on intention. In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, pages 1–9, 2024.

[62]

Francesca Palermo, Luca Casciano, Lokmane Demagh, Aurelio Teliti, Niccolò Antonello, Giacomo Gervasoni, Hazem Hesham Yousef Shalby, Marco Brando Paracchini, Simone Mentasti, Hao Quan, et al. Advancements in context recognition for edge devices and smart eyewear: Sensors and applications. IEEE Access, 2025.

[63]

Yulu Pan, Ce Zhang, and Gedas Bertasius. Basket: A large-scale video dataset for fine-grained skill estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[64]

Sunny Panchal, Apratim Bhattacharyya, Guillaume Berger, Antoine Mercier, Cornelius Böhm, Florian Dietrichkeit, Reza Pourreza, Xuanlin Li, Pulkit Madan, Mingu Lee, et al. What to say and when to say it: Live fitness coaching as a testbed for situated interaction. Advances in Neural Information Processing Systems, 37:75853–75882, 2024.

[65]

Paritosh Parmar and Brendan Tran Morris. What and how well you performed? a multitask learning approach to action quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 304–313, 2019.

[66]

Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In 2021 IEEE 23rd international workshop on multimedia signal processing (MMSP), pages 1–5. IEEE, 2021.

[67]

Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In 2021 IEEE 23rd international workshop on multimedia signal processing (MMSP), pages 1–5. IEEE, 2021.

[68]

Paritosh Parmar, Amol Gharat, and Helge Rhodin. Domain knowledge-informed self-supervised representations for workout form assessment. In European Conference on Computer Vision, pages 105–123. Springer, 2022.

[69]

Akshay Paruchuri, Sinan Hersek, Lavisha Aggarwal, Qiao Yang, Xin Liu, Achin Kulshrestha, Andrea Colaco, Henry Fuchs, and Ishan Chatterjee. Egotrigger: Toward audio-driven image capture for human memory enhancement in all-day energy-efficient smart glasses. arXiv preprint arXiv:2508.01915, 2025.

[70]

Joris Perra, Bénédicte Poulin-Charronnat, Thierry Baccino, and Véronique Drai-Zerbib. Review on eye-hand span in sight-reading of music. Journal of eye movement research, 14(4):10–16910, 2021.

[71]

Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, and Barbara Caputo. E2 (go) motion: Motion augmented event stream for egocentric action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19935–19947, 2022.

[72]

Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, and Pengchuan Zhang. Egovlpv2: Egocentric video-language pre-training with fusion in the backbone. arXiv preprint arXiv:2307.05463, 2023.

[73]

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets, 2015.

[74]

Nicolas Schärer, Federico Villani, Aishwarya Melatur, Steven Peter, Tommaso Polonelli, and Michele Magno. Electrasight: Fully onboard eye tracking for smart glasses with hybrid eog (heog). IEEE Internet of Things Journal,

[75]

Minwoo Seong, Gwangbin Kim, Dohyeon Yeo, Yumin Kang, Heesan Yang, Joseph DelPreto, Wojciech Matusik, Daniela Rus, and SeungJun Kim. Multisensebadminton: Wearable sensor–based biomechanical dataset for evaluation of badminton performance. Scientific Data, 11(1):343,

[76]

Julian Steil, Marion Koelle, Wilko Heuten, Susanne Boll, and Andreas Bulling. Privaceye: privacy-preserving head-mounted eye tracking using egocentric scene image and eye movement features. In Proceedings of the 11th ACM symposium on eye tracking research & applications, pages 1–10, 2019.

[77]

Shan Su, Jung Pyo Hong, Jianbo Shi, and Hyun Soo Park. Predicting behaviors of basketball players from first person videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1501–1510, 2017.

[78]

Brian Sullivan, Casimir JH Ludwig, Dima Damen, Walterio Mayol-Cuevas, and Iain D Gilchrist. Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent. Journal of vision, 21(3):13–13, 2021.

[79]

R Sunder, Umesh Kumar Lilhore, Anjani Kumar Rai, Ehab Ghith, Mehdi Tlija, Sarita Simaiya, and Afraz Hussain Majeed. Smartapm framework for adaptive power management in wearable devices using deep reinforcement learning. Scientific Reports, 15(1):6911, 2025.

[80]

Shuhan Tan, Tushar Nagarajan, and Kristen Grauman. Egodistill: Egocentric head motion distillation for efficient video understanding. Advances in Neural Information Processing Systems, 36:33485–33498, 2023.

work page 2023

Showing first 80 references.

[1] [1]

Visual strategies of young soccer players during a passing test – a pilot study.Journal of Eye Movement Research, 15 (1), 2022. 2

work page 2022

[2] [2]

Classification of expert-novice level using eye tracking and motion data via conditional multimodal variational autoencoder

Yusuke Akamatsu, Keisuke Maeda, Takahiro Ogawa, and Miki Haseyama. Classification of expert-novice level using eye tracking and motion data via conditional multimodal variational autoencoder. InICASSP 2021-2021 IEEE Inter- national Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP), pages 1360–1364. IEEE, 2021. 2, 3, 5, 6, 8

work page 2021

[3] [3]

Where does gaze lead? integrating gaze and motion for en- hanced 3d pose estimation

Taravat Anvari, Markus Lappe, and Marc H E de Lussanet. Where does gaze lead? integrating gaze and motion for en- hanced 3d pose estimation. In2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Work- shops (VRW), pages 76–83, 2025. 2

work page 2025

[4] [4]

Expertaf: Expert action- able feedback from video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, and Kristen Grauman. Expertaf: Expert action- able feedback from video. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 13582– 13594, 2025. 1

work page 2025

[5] [5]

Low power environmental image sensors for remote photogrammetry.Sensors, 22(19):7617,

Alpha Yaya Balde, Emmanuel Bergeret, Denis Cajal, and Jean-Pierre Toumazet. Low power environmental image sensors for remote photogrammetry.Sensors, 22(19):7617,

work page

[6] [6]

Am i a baller? basketball performance assessment from first-person videos

Gedas Bertasius, Hyun Soo Park, Stella X Yu, and Jianbo Shi. Am i a baller? basketball performance assessment from first-person videos. InProceedings of the IEEE inter- national conference on computer vision, pages 2177–2185,

work page

[7] [7]

Is space-time attention all you need for video understanding? InIcml, page 4, 2021

Gedas Bertasius, Heng Wang, and Lorenzo Torresani. Is space-time attention all you need for video understanding? InIcml, page 4, 2021. 3, 5, 6, 7, 8

work page 2021

[8] [8]

Skillformer: Unified multi-view video understanding for proficiency estimation,

Edoardo Bianchi and Antonio Liotta. Skillformer: Unified multi-view video understanding for proficiency estimation,

work page

[9] [9]

egoppg: Heart rate estimation from eye-tracking cameras in egocentric systems to benefit downstream vision tasks.arXiv preprint arXiv:2502.20879,

Bj ¨orn Braun, Rayan Armani, Manuel Meier, Max Moe- bus, and Christian Holz. egoppg: Heart rate estimation from eye-tracking cameras in egocentric systems to benefit downstream vision tasks.arXiv preprint arXiv:2502.20879,

work page arXiv

[10] [10]

A review of eye tracking for understand- ing and improving diagnostic interpretation.Cognitive re- search: principles and implications, 4(1):7, 2019

Tad T Bruny ´e, Trafton Drew, Donald L Weaver, and Joann G Elmore. A review of eye tracking for understand- ing and improving diagnostic interpretation.Cognitive re- search: principles and implications, 4(1):7, 2019. 2

work page 2019

[11] [11]

Flexible frame selection for efficient video reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab, and Cordelia Schmid. Flexible frame selection for efficient video reasoning. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 29071–29082,

work page

[12] [12]

Video action differencing.arXiv preprint arXiv:2503.07860, 2025

James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, and Ser- ena Yeung-Levy. Video action differencing.arXiv preprint arXiv:2503.07860, 2025. 1

work page arXiv 2025

[13] [13]

Michel A. Cara. The effect of practice and musical structure on pianists’ eye-hand span and visual monitoring.Journal of Eye Movement Research, 16(2):1–18, 2023. 4, 8

work page 2023

[14] [14]

Quo vadis, action recognition? a new model and the kinetics dataset

Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. Inpro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017. 6

work page 2017

[15] [15]

Quiet eye training im- proves surgical performance: A randomized controlled study.Frontiers in Psychology, 5:821, 2014

Joe Causer, Adam Harvey, Richard Snelgrove, Gary Ar- senault, and Oshin Vartanian. Quiet eye training im- proves surgical performance: A randomized controlled study.Frontiers in Psychology, 5:821, 2014. 2

work page 2014

[16] [16]

Integra- tion of experts’ and beginners’ machine operation experi- ences to obtain a detailed task model.IEICE TRANSAC- TIONS on Information, E104-D(1):152–161, 2021

Longfei CHEN, Yuichi NAKAMURA, Kazuaki KONDO, Dima DAMEN, and Walterio MAYOL-CUEV AS. Integra- tion of experts’ and beginners’ machine operation experi- ences to obtain a detailed task model.IEICE TRANSAC- TIONS on Information, E104-D(1):152–161, 2021. 2, 5

work page 2021

[17] [17]

You-do, i- learn: Discovering task relevant objects and their modes of interaction from multi-user egocentric video

Dima Damen, Teesid Leelasawassuk, Osian Haines, An- drew Calway, and Walterio W Mayol-Cuevas. You-do, i- learn: Discovering task relevant objects and their modes of interaction from multi-user egocentric video. InBMVC, page 3, 2014. 2

work page 2014

[18] [18]

Trends in ai inference energy consump- tion: Beyond the performance-vs-parameter laws of deep learning.Sustainable Computing: Informatics and Systems, 38:100857, 2023

Radosvet Desislavov, Fernando Mart´ınez-Plumed, and Jos´e Hern´andez-Orallo. Trends in ai inference energy consump- tion: Beyond the performance-vs-parameter laws of deep learning.Sustainable Computing: Informatics and Systems, 38:100857, 2023. 8

work page 2023

[19] [19]

Luci- daction: A hierarchical and multi-model dataset for com- prehensive action quality assessment.Advances in Neural Information Processing Systems, 37:96468–96482, 2024

Linfeng Dong, Wei Wang, Yu Qiao, and Xiao Sun. Luci- daction: A hierarchical and multi-model dataset for com- prehensive action quality assessment.Advances in Neural Information Processing Systems, 37:96468–96482, 2024. 2

work page 2024

[20] [20]

The pros and cons: Rank-aware temporal attention for skill determination in long videos

Hazel Doughty, Walterio Mayol-Cuevas, and Dima Damen. The pros and cons: Rank-aware temporal attention for skill determination in long videos. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7862–7871, 2019. 1, 2

work page 2019

[21] [21]

The influ- ence of expertise in music reading on the detection of tem- poral violations.Visual Cognition, 20(3):267–282, 2012

Veronique Drai-Zerbib and Emmanuel Baccino. The influ- ence of expertise in music reading on the detection of tem- poral violations.Visual Cognition, 20(3):267–282, 2012. 2

work page 2012

[22] [22]

Towards progress assessment for adaptive hints in educational virtual reality games

Tobias Drey, Pascal Jansen, Fabian Fischbach, Julian From- mel, and Enrico Rukzio. Towards progress assessment for adaptive hints in educational virtual reality games. InEx- tended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, page 1–9, New York, NY , USA, 2020. Association for Computing Machinery. 1

work page 2020

[23] [23]

Amr Elkholy, Mohamed E Hussein, Walid Gomaa, Dima Damen, and Emmanuel Saba. Efficient and robust skeleton- based quality assessment and abnormality detection in hu- man action performance.IEEE journal of biomedical and health informatics, 24(1):280–291, 2019. 2

work page 2019

[24] [24]

X3d: Expanding architectures for efficient video recognition

Christoph Feichtenhofer. X3d: Expanding architectures for efficient video recognition. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 203–213, 2020. 2, 5, 6, 7

work page 2020

[25] [25]

Evostruggle: A dataset capturing the evolution of strug- gle across activities and skill levels.arXiv preprint arXiv:2510.01362, 2025

Shijia Feng, Michael Wray, and Walterio Mayol-Cuevas. Evostruggle: A dataset capturing the evolution of strug- gle across activities and skill levels.arXiv preprint arXiv:2510.01362, 2025. 1

work page arXiv 2025

[26] [26]

Video-based surgical skill assessment using 9 3d convolutional neural networks.International Journal of Computer Assisted Radiology and Surgery, 14(7):1217– 1225, 2019

Isabel Funke, S ¨oren Torge Mees, J ¨urgen Weitz, and Ste- fanie Speidel. Video-based surgical skill assessment using 9 3d convolutional neural networks.International Journal of Computer Assisted Radiology and Surgery, 14(7):1217– 1225, 2019. 3

work page 2019

[27] [27]

Soline Galuret, Nicolas Vall ´ee, Alexandre Tronchot, Herve Thomazeau, Pierre Jannin, and Arnaud Huaulm ´e. Gaze behavior is related to objective technical skills assessment during virtual reality simulator-based surgical training: a proof of concept.International Journal of Computer As- sisted Radiology and Surgery, 18(9):1697–1705, 2023. 2

work page 2023

[28] [28]

Listen to look: Action recognition by previewing audio

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, and Lorenzo Torresani. Listen to look: Action recognition by previewing audio. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 10457–10467,

work page

[29] [29]

Visual-semantic alignment temporal pars- ing for action quality assessment.IEEE Transactions on Circuits and Systems for Video Technology, 2024

Kumie Gedamu, Yanli Ji, Yang Yang, Jie Shao, and Heng Tao Shen. Visual-semantic alignment temporal pars- ing for action quality assessment.IEEE Transactions on Circuits and Systems for Video Technology, 2024. 2

work page 2024

[30] [30]

Using eye tracking to trace a cogni- tive process: Gaze behaviour during decision making in a natural environment.Journal of eye movement research, 6 (1), 2013

Kerstin Gidl ¨of, Annika Wallin, Richard Dewhurst, and Kenneth Holmqvist. Using eye tracking to trace a cogni- tive process: Gaze behaviour during decision making in a natural environment.Journal of eye movement research, 6 (1), 2013. 1, 2

work page 2013

[31] [31]

Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, et al. Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 193...

work page 2024

[32] [32]

Control of gaze in natural environments: effects of rewards and costs, uncertainty and memory in target selection.Interface focus, 8(4):20180009, 2018

Mary M Hayhoe and Jonathan Samir Matthis. Control of gaze in natural environments: effects of rewards and costs, uncertainty and memory in target selection.Interface focus, 8(4):20180009, 2018. 2

work page 2018

[33] [33]

1.1 computing’s energy problem (and what we can do about it)

Mark Horowitz. 1.1 computing’s energy problem (and what we can do about it). In2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 10–14, 2014. 8

work page 2014

[34] [34]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022. 6

work page 2022

[35] [35]

Predicting gaze in egocentric video by learning task- dependent attention transition

Yifei Huang, Minjie Cai, Zhenqiang Li, and Yoichi Sato. Predicting gaze in egocentric video by learning task- dependent attention transition. InProceedings of the Eu- ropean conference on computer vision (ECCV), pages 754– 769, 2018. 2

work page 2018

[36] [36]

Mutual context network for jointly estimating egocentric gaze and action.IEEE Transactions on Image Processing, 29:7795–7806, 2020

Yifei Huang, Minjie Cai, Zhenqiang Li, Feng Lu, and Yoichi Sato. Mutual context network for jointly estimating egocentric gaze and action.IEEE Transactions on Image Processing, 29:7795–7806, 2020. 4

work page 2020

[37] [37]

Egoexolearn: A dataset for bridging asynchronous ego-and exo-centric view of procedural ac- tivities in real world

Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Li- jin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, et al. Egoexolearn: A dataset for bridging asynchronous ego-and exo-centric view of procedural ac- tivities in real world. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 22072–22086, 2024. 1...

work page 2024

[38] [38]

Vid2coach: Transform- ing how-to videos into task assistants.arXiv preprint arXiv:2506.00717, 2025

Mina Huh, Zihui Xue, Ujjaini Das, Kumar Ashutosh, Kris- ten Grauman, and Amy Pavel. Vid2coach: Transform- ing how-to videos into task assistants.arXiv preprint arXiv:2506.00717, 2025. 1

work page arXiv 2025

[39] [39]

Difference in gaze control ability be- tween low and high skill players of a real-time strategy game in esports.PloS one, 17(3):e0265526, 2022

Inhyeok Jeong, Kento Nakagawa, Rieko Osu, and Kazuyuki Kanosue. Difference in gaze control ability be- tween low and high skill players of a real-time strategy game in esports.PloS one, 17(3):e0265526, 2022. 2, 4, 5

work page 2022

[40] [40]

Eyepiano: leveraging gaze for reflective piano learning

Jakob Karolus, Johannes Sylupp, Albrecht Schmidt, and Paweł W Wo´zniak. Eyepiano: leveraging gaze for reflective piano learning. InProceedings of the 2023 ACM Designing Interactive Systems Conference, pages 1209–1223, 2023. 2

work page 2023

[41] [41]

Generalized and efficient skill assessment from imu data with applications in gymnastics and medical training.ACM Transactions on Computing for Healthcare, 2(1):1–21, 2020

Aftab Khan, Sebastian Mellor, Rachel King, Balazs Janko, William Harwin, R Simon Sherratt, Ian Craddock, and Thomas Pl ¨otz. Generalized and efficient skill assessment from imu data with applications in gymnastics and medical training.ACM Transactions on Computing for Healthcare, 2(1):1–21, 2020. 2

work page 2020

[42] [42]

GazeGPT: Augmenting human capabilities using gaze-contingent contextual ai for smart eyewear,

Robert Konrad, Nitish Padmanaban, J Gabriel Buckmaster, Kevin C Boyle, and Gordon Wetzstein. Gazegpt: Augment- ing human capabilities using gaze-contingent contextual ai for smart eyewear.arXiv preprint arXiv:2401.17217, 2024. 2

work page arXiv 2024

[43] [43]

Scsam- pler: Sampling salient clips from video for efficient action recognition

Bruno Korbar, Du Tran, and Lorenzo Torresani. Scsam- pler: Sampling salient clips from video for efficient action recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6232–6242, 2019. 2

work page 2019

[44] [44]

In the eye of transformer: Global-local correlation for egocentric gaze estimation.arXiv preprint arXiv:2208.04464, 2022

Bolin Lai, Miao Liu, Fiona Ryan, and James M Rehg. In the eye of transformer: Global-local correlation for egocentric gaze estimation.arXiv preprint arXiv:2208.04464, 2022. 2

work page arXiv 2022

[45] [45]

Listen to look into the future: Audio-visual egocen- tric gaze anticipation

Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, and James M Rehg. Listen to look into the future: Audio-visual egocen- tric gaze anticipation. InEuropean Conference on Com- puter Vision, pages 192–210. Springer, 2024. 2

work page 2024

[46] [46]

The roles of vision and eye movements in the control of activities of daily living.Perception, 28(11):1311–1328, 1999

Michael Land, Neil Mennie, and Jennifer Rusted. The roles of vision and eye movements in the control of activities of daily living.Perception, 28(11):1311–1328, 1999. 2

work page 1999

[47] [47]

Hypercam: Low-power on- board computer vision for iot cameras.arXiv preprint arXiv:2501.10547, 2025

Chae Young Lee, Maxwell Fite, Tejus Rao, Sara Achour, Zerina Kapetanovic, et al. Hypercam: Low-power on- board computer vision for iot cameras.arXiv preprint arXiv:2501.10547, 2025. 3

work page arXiv 2025

[48] [48]

Seungmin Lee and Jongseong An. Gaze control and motor performance in motor expertise studies: Focused review of field application research on perceptual skill training.Inter- national Journal of Applied Sports Sciences, 35(1), 2023. 2, 4, 5

work page 2023

[49] [49]

Multi-skeleton structures graph convolu- tional network for action quality assessment in long videos

Qing Lei, Huiying Li, Hongbo Zhang, Jixiang Du, and Shangce Gao. Multi-skeleton structures graph convolu- tional network for action quality assessment in long videos. Applied Intelligence, 53(19):21692–21705, 2023. 2

work page 2023

[50] [50]

Learning to pre- dict gaze in egocentric video

Yin Li, Alireza Fathi, and James M Rehg. Learning to pre- dict gaze in egocentric video. InProceedings of the IEEE international conference on computer vision, pages 3216– 3223, 2013. 2 10

work page 2013

[51] [51]

In the eye of the be- holder: Gaze and actions in first person video.IEEE trans- actions on pattern analysis and machine intelligence, 45 (6):6731–6747, 2021

Yin Li, Miao Liu, and James M Rehg. In the eye of the be- holder: Gaze and actions in first person video.IEEE trans- actions on pattern analysis and machine intelligence, 45 (6):6731–6747, 2021. 1, 2, 4, 5, 6, 7

work page 2021

[52] [52]

A light weight model for active speaker detection

Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, and Liangyin Chen. A light weight model for active speaker detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22932–22941, 2023. 2

work page 2023

[53] [53]

Ricaˆ2: Rubric- informed, calibrated assessment of actions

Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Sri- nath GNVV Namburi, and Yin Li. Ricaˆ2: Rubric- informed, calibrated assessment of actions. InProceedings of the European Conference on Computer Vision (ECCV),

work page

[54] [54]

Chat2map: Efficient scene mapping from multi-ego conversations

Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Hen- derson, Paul Calamia, Kristen Grauman, and Vamsi Kr- ishna Ithapu. Chat2map: Efficient scene mapping from multi-ego conversations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10554–10564, 2023. 2

[55]

Learning to detect attended objects in cultural sites with gaze signals and weak object supervision

Michele Mazzamuto*, Francesco Ragusa*, Antonino Furnari*, and Giovanni Maria Farinella*. Learning to detect attended objects in cultural sites with gaze signals and weak object supervision. ACM Journal on Computing and Cultural Heritage, 17(3):1–21, 2024. 2

[56]

Gazing into missteps: Leveraging eye-gaze for unsupervised mistake detection in egocentric videos of skilled human activities

Michele Mazzamuto, Antonino Furnari, Yoichi Sato, and Giovanni Maria Farinella. Gazing into missteps: Leveraging eye-gaze for unsupervised mistake detection in egocentric videos of skilled human activities. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 8310–8320, 2025. 2

[57]

See like an expert: Gaze-augmented training enhances skill acquisition in a virtual reality robotic suturing task

Rachel Melnyk, Timothy Campbell, Tyler Holler, Katherine Cameron, Patrick Saba, Michael W Witthaus, Jean Joseph, and Ahmed Ghazi. See like an expert: Gaze-augmented training enhances skill acquisition in a virtual reality robotic suturing task. Journal of Endourology, 35(3):376–382, 2021. 2

[58]

Project aria glasses user manual

Meta Platforms, Inc. Project aria glasses user manual. https://facebookresearch.github.io/projectaria_tools/docs/ARK/glasses_manual/glasses_user_manual, 2025. Accessed: 2025-10-06. 3, 1

[59]

Integrating human gaze into attention for egocentric activity recognition

Kyle Min and Jason J Corso. Integrating human gaze into attention for egocentric activity recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1069–1078, 2021. 4

[60]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023. 5

[61]

Gaze-guided graph neural network for action anticipation conditioned on intention

Süleyman Özdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, and Enkelejda Kasneci. Gaze-guided graph neural network for action anticipation conditioned on intention. In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, pages 1–9, 2024. 2

[62]

Advancements in context recognition for edge devices and smart eyewear: Sensors and applications

Francesca Palermo, Luca Casciano, Lokmane Demagh, Aurelio Teliti, Niccolò Antonello, Giacomo Gervasoni, Hazem Hesham Yousef Shalby, Marco Brando Paracchini, Simone Mentasti, Hao Quan, et al. Advancements in context recognition for edge devices and smart eyewear: Sensors and applications. IEEE Access, 2025. 5, 8

[63]

Basket: A large-scale video dataset for fine-grained skill estimation

Yulu Pan, Ce Zhang, and Gedas Bertasius. Basket: A large-scale video dataset for fine-grained skill estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 1, 3

[64]

What to say and when to say it: Live fitness coaching as a testbed for situated interaction

Sunny Panchal, Apratim Bhattacharyya, Guillaume Berger, Antoine Mercier, Cornelius Böhm, Florian Dietrichkeit, Reza Pourreza, Xuanlin Li, Pulkit Madan, Mingu Lee, et al. What to say and when to say it: Live fitness coaching as a testbed for situated interaction. Advances in Neural Information Processing Systems, 37:75853–75882, 2024. 1

[65]

What and how well you performed? a multitask learning approach to action quality assessment

Paritosh Parmar and Brendan Tran Morris. What and how well you performed? a multitask learning approach to action quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 304–313, 2019. 1

[66]

Piano skills assessment

Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In 2021 IEEE 23rd international workshop on multimedia signal processing (MMSP), pages 1–5. IEEE, 2021. 2

[67]

Piano skills assessment

Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In 2021 IEEE 23rd international workshop on multimedia signal processing (MMSP), pages 1–5. IEEE, 2021. 3

[68]

Domain knowledge-informed self-supervised representations for workout form assessment

Paritosh Parmar, Amol Gharat, and Helge Rhodin. Domain knowledge-informed self-supervised representations for workout form assessment. In European Conference on Computer Vision, pages 105–123. Springer, 2022. 2

[69]

Egotrigger: Toward audio-driven image capture for human memory enhancement in all-day energy-efficient smart glasses

Akshay Paruchuri, Sinan Hersek, Lavisha Aggarwal, Qiao Yang, Xin Liu, Achin Kulshrestha, Andrea Colaco, Henry Fuchs, and Ishan Chatterjee. Egotrigger: Toward audio-driven image capture for human memory enhancement in all-day energy-efficient smart glasses. arXiv preprint arXiv:2508.01915, 2025. 2, 5, 6, 7

[70]

Review on eye-hand span in sight-reading of music

Joris Perra, Bénédicte Poulin-Charronnat, Thierry Baccino, and Véronique Drai-Zerbib. Review on eye-hand span in sight-reading of music. Journal of eye movement research, 14(4):10–16910, 2021. 5

[71]

E2 (go) motion: Motion augmented event stream for egocentric action recognition

Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, and Barbara Caputo. E2 (go) motion: Motion augmented event stream for egocentric action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19935–19947, 2022. 5, 6

[72]

Egovlpv2: Egocentric video-language pre-training with fusion in the backbone

Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, and Pengchuan Zhang. Egovlpv2: Egocentric video-language pre-training with fusion in the backbone. arXiv preprint arXiv:2307.05463, 2023. 5

[73]

Fitnets: Hints for thin deep nets

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets, 2015. 5

[74]

Electrasight: Fully onboard eye tracking for smart glasses with hybrid eog (heog)

Nicolas Schärer, Federico Villani, Aishwarya Melatur, Steven Peter, Tommaso Polonelli, and Michele Magno. Electrasight: Fully onboard eye tracking for smart glasses with hybrid eog (heog). IEEE Internet of Things Journal,

[75]

Multisensebadminton: Wearable sensor–based biomechanical dataset for evaluation of badminton performance

Minwoo Seong, Gwangbin Kim, Dohyeon Yeo, Yumin Kang, Heesan Yang, Joseph DelPreto, Wojciech Matusik, Daniela Rus, and SeungJun Kim. Multisensebadminton: Wearable sensor–based biomechanical dataset for evaluation of badminton performance. Scientific Data, 11(1):343,

[76]

Privaceye: privacy-preserving head-mounted eye tracking using egocentric scene image and eye movement features

Julian Steil, Marion Koelle, Wilko Heuten, Susanne Boll, and Andreas Bulling. Privaceye: privacy-preserving head-mounted eye tracking using egocentric scene image and eye movement features. In Proceedings of the 11th ACM symposium on eye tracking research & applications, pages 1–10, 2019. 2

[77]

Predicting behaviors of basketball players from first person videos

Shan Su, Jung Pyo Hong, Jianbo Shi, and Hyun Soo Park. Predicting behaviors of basketball players from first person videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1501–1510, 2017. 2

[78]

Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent

Brian Sullivan, Casimir JH Ludwig, Dima Damen, Walterio Mayol-Cuevas, and Iain D Gilchrist. Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent. Journal of vision, 21(3):13–13, 2021. 2

[79]

Smartapm framework for adaptive power management in wearable devices using deep reinforcement learning

R Sunder, Umesh Kumar Lilhore, Anjani Kumar Rai, Ehab Ghith, Mehdi Tlija, Sarita Simaiya, and Afraz Hussain Majeed. Smartapm framework for adaptive power management in wearable devices using deep reinforcement learning. Scientific Reports, 15(1):6911, 2025. 2

[80]

Egodistill: Egocentric head motion distillation for efficient video understanding

Shuhan Tan, Tushar Nagarajan, and Kristen Grauman. Egodistill: Egocentric head motion distillation for efficient video understanding. Advances in Neural Information Processing Systems, 36:33485–33498, 2023. 2, 5, 6, 7, 8
