COIVis: Eye-tracking-based Visual Exploration of Concept Learning in MOOC Videos
Pith reviewed 2026-05-17 00:56 UTC · model grok-4.3
The pith
Eye-tracking system maps MOOC video gaze to concept-specific learner states like attention and load.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COIVis extracts course concepts from multimodal MOOC video content and anchors them to specific spatiotemporal regions as Concepts of Interest. Learners' gaze trajectories are turned into COI sequences, from which five interpretable learner-state features are calculated at the COI level using standard eye-tracking metrics. The resulting narrative multi-view visualization lets instructors move between cohort overviews and individual paths, locate problematic concepts, and compare learning strategies across learners.
What carries the argument
Concepts of Interest (COIs), which anchor abstract course concepts to concrete temporal intervals and screen locations by fusing multimodal video analysis with the lecture's structure.
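As a rough illustration of how this anchoring could turn raw gaze into a COI sequence, here is a minimal Python sketch. The `COI` record, its rectangular screen region, the `(t, x, y)` sample format, and the rule of skipping off-COI samples are all assumptions made for illustration; the summary above does not specify how the paper represents COIs or segments gaze.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class COI:
    """Hypothetical COI record: a concept tied to a time interval and a screen rectangle."""
    name: str
    t_start: float  # seconds, inclusive
    t_end: float    # seconds, exclusive
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, t: float, x: float, y: float) -> bool:
        # A gaze sample hits the COI only if it falls inside both the
        # time interval and the screen rectangle.
        in_time = self.t_start <= t < self.t_end
        in_space = self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1
        return in_time and in_space


def gaze_to_coi_sequence(
    samples: Sequence[Tuple[float, float, float]], cois: Sequence[COI]
) -> List[str]:
    """Map (t, x, y) gaze samples to COI names, merging consecutive repeats.

    Samples that hit no COI are skipped (an illustrative choice), so a brief
    glance away does not split one visit into two.
    """
    sequence: List[str] = []
    for t, x, y in samples:
        hit = next((c.name for c in cois if c.contains(t, x, y)), None)
        if hit is not None and (not sequence or sequence[-1] != hit):
            sequence.append(hit)
    return sequence
```

A learner who looks at one concept, switches to another, and then returns would yield a three-element sequence, which is the kind of path the individual-level views are described as showing.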
If this is right
- Instructors can identify both consistent and anomalous learning patterns across a cohort at the level of individual concepts.
- Problematic concepts can be located quickly through the visualization rather than inferred from coarse quiz scores.
- Diverse learner strategies become visible for direct comparison in the same interface.
- Timely, personalized interventions for struggling learners become feasible based on gaze-derived learner states.
- Instructional design can be optimized by revising video segments tied to low-attention or high-load concepts.
Where Pith is reading between the lines
- The COI alignment technique might transfer to other screen-based instructional videos outside MOOC platforms.
- Combining the gaze-derived features with clickstream data could produce hybrid models that capture both attention and navigation behavior.
- Future validation could test whether the five features remain stable when videos are viewed on different devices or at varying playback speeds.
Load-bearing premise
Eye-tracking metrics can be turned into reliable values for the five learner-state features without further validation or controlled experiments.
What would settle it
A controlled test showing that the computed Attention, Cognitive Load, Interest, Preference, and Synchronicity scores for specific COIs do not predict independent measures of learner comprehension or engagement on those same concepts.
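One way such a test could be scored is with a rank correlation between a per-COI feature score and an independent comprehension measure for the same concepts. The sketch below is a plain-Python Spearman correlation; the function names and the idea of pairing one score with one comprehension value per COI are assumptions for illustration, not a procedure taken from the paper.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

A coefficient near zero across many COIs and learners would count against the premise; a single coefficient on a small sample would not settle it either way.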
Original abstract
Massive Open Online Courses (MOOCs) make high-quality instruction accessible. However, the lack of face-to-face interaction makes it difficult for instructors to obtain feedback on learners' performance and provide more effective instructional guidance. Traditional analytical approaches, such as clickstream logs or quiz scores, capture only coarse-grained learning outcomes and offer limited insight into learners' moment-to-moment cognitive states. In this study, we propose COIVis, an eye tracking-based visual analytics system that supports concept-level exploration of learning processes in MOOC videos. COIVis first extracts course concepts from multimodal video content and aligns them with the temporal structure and screen space of the lecture, defining Concepts of Interest (COIs), which anchor abstract concepts to specific spatiotemporal regions. Learners' gaze trajectories are transformed into COI sequences, and five interpretable learner-state features -- Attention, Cognitive Load, Interest, Preference, and Synchronicity -- are computed at the COI level based on eye tracking metrics. Building on these representations, COIVis provides a narrative, multi-view visualization enabling instructors to move from cohort-level overviews to individual learning paths, quickly locate problematic concepts, and compare diverse learning strategies. We evaluate COIVis through two case studies and in-depth user-feedback interviews. The results demonstrate that COIVis effectively provides instructors with valuable insights into the consistency and anomalies of learners' learning patterns, thereby supporting timely and personalized interventions for learners and optimizing instructional design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents COIVis, an eye-tracking-based visual analytics system for concept-level exploration of learning processes in MOOC videos. It extracts course concepts from multimodal content to define Concepts of Interest (COIs) aligned with temporal and spatial structure, transforms gaze trajectories into COI sequences, and computes five learner-state features (Attention, Cognitive Load, Interest, Preference, Synchronicity) at the COI level from eye-tracking metrics. The system offers narrative multi-view visualizations allowing instructors to move from cohort overviews to individual paths, identify problematic concepts, and compare strategies. Evaluation via two case studies and user interviews claims the system yields insights into learning pattern consistency and anomalies to support interventions and instructional design.
Significance. If the feature mappings hold, this work could advance MOOC analytics by moving beyond coarse clickstream or quiz data to fine-grained, concept-anchored cognitive state insights, with the COI anchoring and multi-view narrative design offering a practical framework for instructor support. The integration of eye-tracking with spatiotemporal concept alignment and the progression from aggregate to individual analysis represent a targeted contribution to educational visual analytics. The case studies and interviews provide initial usability evidence, though the absence of quantitative validation limits assessment of impact.
major comments (1)
- The section describing computation of the five learner-state features states only that they 'are computed at the COI level based on eye tracking metrics' with no equations, specific metrics (fixation duration, saccade amplitude, pupil dilation, etc.), aggregation rules, thresholds, or validation against self-reports or controlled stimuli. This is load-bearing for the central claim because the visualizations, case-study conclusions about 'consistency and anomalies of learners' learning patterns,' and downstream recommendations for interventions all rest on these features accurately reflecting cognitive states rather than heuristic artifacts.
minor comments (1)
- The abstract and evaluation sections would benefit from explicit reference to prior eye-tracking literature used to ground the five features, to clarify how the mappings extend or differ from established metrics.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. The major comment identifies an important area for improvement in the description of the learner-state features, which we agree requires expansion to better support the paper's claims. We address this point below and outline the revisions we will make.
Point-by-point responses
- Referee: The section describing computation of the five learner-state features states only that they 'are computed at the COI level based on eye tracking metrics' with no equations, specific metrics (fixation duration, saccade amplitude, pupil dilation, etc.), aggregation rules, thresholds, or validation against self-reports or controlled stimuli. This is load-bearing for the central claim because the visualizations, case-study conclusions about 'consistency and anomalies of learners' learning patterns,' and downstream recommendations for interventions all rest on these features accurately reflecting cognitive states rather than heuristic artifacts.
  Authors: We agree that the current manuscript provides only a high-level description of the five learner-state features and lacks the requested computational details, which is a valid concern given their central role. In the revised manuscript, we will add a dedicated subsection detailing the specific eye-tracking metrics for each feature (e.g., fixation duration and count for Attention, pupil dilation and saccade velocity for Cognitive Load, dwell time and regression patterns for Interest and Preference, and temporal alignment metrics for Synchronicity), along with the aggregation rules, formulas, and any thresholds or normalization applied at the COI level. These will be grounded in established eye-tracking literature for inferring cognitive states. We will also explicitly discuss the features' interpretability, potential limitations, and the fact that the evaluation relies on qualitative case studies and user interviews rather than direct quantitative validation against self-reports or controlled stimuli. This addition will strengthen the support for the visualizations and conclusions without altering the paper's scope as a visual analytics contribution.
  Revision: yes
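To make concrete the level of detail at issue in this exchange, the sketch below shows what one explicit aggregation rule could look like for a fixation-based Attention proxy. It is purely illustrative: the function name `attention_proxy`, the `dwell_ratio` normalization, and the input formats are assumptions, and nothing here is the formula used in the paper, which neither the abstract nor the rebuttal states.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attention_proxy(
    fixations: Iterable[Tuple[str, float]],
    coi_visible_seconds: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Aggregate fixations per COI into a count and a normalized dwell ratio.

    fixations: (coi_name, fixation_duration_seconds) pairs for one learner.
    coi_visible_seconds: how long each COI is on screen (hypothetical input).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for name, duration in fixations:
        if name in coi_visible_seconds:  # ignore fixations on unknown regions
            total[name] += duration
            count[name] += 1

    result = {}
    for name, visible in coi_visible_seconds.items():
        if visible <= 0:
            raise ValueError(f"COI {name!r} must have positive visible time")
        result[name] = {
            "fixation_count": count[name],
            # Share of the COI's on-screen time spent fixating it, capped at 1.
            "dwell_ratio": min(total[name] / visible, 1.0),
        }
    return result
```

Writing the rule out this way exposes the choices a reader would want justified, such as the normalization by on-screen time and the treatment of fixations outside any COI.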
Circularity Check
No circularity: system description with external grounding
The paper describes a visual analytics system (COIVis) that extracts COIs from MOOC videos, transforms gaze trajectories into sequences, computes five learner-state features from eye-tracking metrics, and supports multi-view visualizations. No equations, fitted parameters, predictions, or derivations appear in the provided text. The central claims rest on case studies, user interviews, and citations to external eye-tracking literature rather than any self-referential loop or input-renamed-as-output. The work is self-contained as a tool-building contribution without load-bearing self-citations or definitional circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Eye-tracking metrics can be mapped to cognitive states such as Attention, Cognitive Load, Interest, Preference, and Synchronicity
invented entities (1)
- Concepts of Interest (COIs): no independent evidence