GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer
Pith reviewed 2026-05-09 18:57 UTC · model grok-4.3
The pith
GMGaze conditions CLIP features on four learned prototype banks for illumination, background, head pose and appearance before early fusion in a multi-scale transformer with sparse MoE layers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GMGaze introduces semantic prototype conditioning to modulate the CLIP global image embedding using four learned prototype banks (illumination, background, head pose, and appearance) and thereby produces two complementary context-biased global tokens. These tokens are fused at the first layer with CLIP patch tokens and CNN tokens inside a multi-scale transformer; each token then routes through sparse Mixture-of-Experts modules that supply conditional computation. Adversarial domain adaptation together with a feature separation loss keeps the global tokens de-correlated for cross-domain transfer.
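The review does not reproduce the paper's equations, so the following is a minimal PyTorch sketch of how prototype-bank conditioning could work, assuming an attention-style read-out over each learned bank; the bank size, the two linear heads, and the additive modulation are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class PrototypeConditioner(nn.Module):
    """Modulates a CLIP global embedding with four learned prototype banks (sketch)."""

    def __init__(self, dim=512, n_prototypes=8):
        super().__init__()
        factors = ("illumination", "background", "head_pose", "appearance")
        # One learned prototype bank per context factor (bank size is a guess).
        self.banks = nn.ParameterDict(
            {f: nn.Parameter(0.02 * torch.randn(n_prototypes, dim)) for f in factors}
        )
        # Two heads turn the concatenated bank read-outs into two complementary tokens.
        self.head_a = nn.Linear(dim * len(factors), dim)
        self.head_b = nn.Linear(dim * len(factors), dim)

    def forward(self, clip_global):  # clip_global: (B, dim) CLIP image embedding
        readouts = []
        for bank in self.banks.values():
            attn = torch.softmax(clip_global @ bank.t(), dim=-1)  # (B, n_prototypes)
            readouts.append(attn @ bank)                          # (B, dim) per-factor context
        ctx = torch.cat(readouts, dim=-1)                         # (B, 4 * dim)
        # Additive modulation of the global embedding; the paper's operator may differ.
        token_a = clip_global + self.head_a(ctx)
        token_b = clip_global + self.head_b(ctx)
        return token_a, token_b

token_a, token_b = PrototypeConditioner()(torch.randn(4, 512))
```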
What carries the argument
Semantic prototype conditioning, which modulates the CLIP global embedding with four learned prototype banks to produce context-biased global tokens for early unified fusion inside the multi-scale transformer.
If this is right
- Within-domain mean angular errors fall to 2.49°, 3.22°, 10.16°, and 1.44° on MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze respectively.
- The model outperforms prior baselines on all four within-domain benchmarks.
- State-of-the-art results appear on two standard cross-domain transfer routes.
- Early-layer fusion of the context-biased tokens prevents the information loss that occurs when features are merged only at the end of the network.
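As a concrete illustration of the early-fusion claim above, the sketch below simply concatenates the two context-biased global tokens with CLIP patch tokens and CNN tokens into one sequence before the first transformer layer, so that every layer attends over all streams; the shared width, token counts, and plain (non-multi-scale) encoder are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

def fuse_tokens(token_a, token_b, clip_patches, cnn_tokens):
    # token_a, token_b: (B, D); clip_patches: (B, Np, D); cnn_tokens: (B, Nc, D)
    global_tokens = torch.stack([token_a, token_b], dim=1)          # (B, 2, D)
    return torch.cat([global_tokens, clip_patches, cnn_tokens], 1)  # (B, 2 + Np + Nc, D)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=4
)
seq = fuse_tokens(
    torch.randn(2, 512), torch.randn(2, 512),          # two context-biased global tokens
    torch.randn(2, 196, 512), torch.randn(2, 49, 512)  # CLIP patch tokens, CNN tokens
)
features = encoder(seq)  # all token types interact from the very first layer
```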
Where Pith is reading between the lines
- The same prototype-bank mechanism could be tested on other context-sensitive vision tasks such as facial action unit detection where lighting and pose also vary independently.
- Because the MoE layers add capacity only where needed, the architecture may allow higher-resolution input images without a linear rise in total parameters.
- Keeping the two global tokens de-correlated may offer a lightweight way to disentangle scene factors in other multimodal models that currently rely on heavy contrastive losses.
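To make the conditional-capacity point concrete, here is a minimal top-1 sparse Mixture-of-Experts feed-forward layer of the kind the abstract describes; the expert count, gating rule, and absence of a load-balancing loss are illustrative assumptions, not GMGaze's configuration.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim=512, n_experts=4, hidden=1024):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, tokens):                        # tokens: (B, N, dim)
        flat = tokens.reshape(-1, tokens.size(-1))    # (B*N, dim)
        scores = torch.softmax(self.gate(flat), -1)   # (B*N, n_experts)
        weight, expert_idx = scores.max(dim=-1)       # top-1 routing per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():                            # only routed tokens pay for this expert
                out[mask] = weight[mask].unsqueeze(1) * expert(flat[mask])
        return out.reshape_as(tokens)

moe = SparseMoE()
y = moe(torch.randn(2, 247, 512))                     # same shape out as in
```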
Load-bearing premise
The four learned prototype banks can be trained to produce useful, non-redundant context information that improves downstream gaze prediction without overfitting to the specific training distributions of the four benchmarks.
What would settle it
Train on the four benchmarks and test on a held-out dataset whose illumination and head-pose ranges fall outside those of MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze; if the mean angular error then rises above that of the best non-prototype baseline, the conditioning step does not generalize.
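For reference, the mean angular error quoted throughout this review is the angle between predicted and ground-truth 3-D gaze directions, averaged over samples; the pitch/yaw-to-vector convention below is one common choice and may differ from the paper's.

```python
import numpy as np

def pitchyaw_to_vector(py):
    # py: (N, 2) pitch and yaw in radians; one common convention, assumed here
    pitch, yaw = py[:, 0], py[:, 1]
    return np.stack(
        [-np.cos(pitch) * np.sin(yaw), -np.sin(pitch), -np.cos(pitch) * np.cos(yaw)], axis=1
    )

def mean_angular_error_deg(pred, gt):
    # pred, gt: (N, 3) gaze direction vectors
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

rng = np.random.default_rng(0)
err = mean_angular_error_deg(
    pitchyaw_to_vector(rng.normal(scale=0.3, size=(8, 2))),
    pitchyaw_to_vector(rng.normal(scale=0.3, size=(8, 2))),
)
print(f"mean angular error: {err:.2f} deg")
```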
Original abstract
Gaze estimation methods commonly use facial appearances to predict the direction of a person's gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image features, lack of factor-aware conditioning, and impractical capacity scaling. To address these challenges, we propose Globally-conditioned Multi-scale Gaze estimation (GMGaze), which leverages a multi-scale transformer architecture. Specifically, the model first introduces semantic prototype conditioning, which modulates the CLIP global image embedding using four learned prototype banks (i.e., illumination, background, head pose and appearance) to generate two complementary context-biased global tokens. These tokens, along with the CLIP patch and CNN tokens, are fused at the first layer. This early unified fusion prevents information loss common in late-stage merging. Finally, each token passes through sparse Mixture-of-Experts modules, providing conditional computational capacity without uniformly increasing dense parameters. For cross-domain adaptation, we incorporate an adversarial domain adaptation technique with a feature separation loss that encourages the two global tokens to remain de-correlated. Experiments using four public benchmarks (MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze) show that GMGaze achieves mean angular errors of 2.49$^\circ$, 3.22$^\circ$, 10.16$^\circ$, and 1.44$^\circ$, respectively, outperforming previous baselines in all within-domain settings. In cross-domain evaluations, it provides state-of-the-art (SOTA) results on two standard transfer routes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GMGaze, a gaze estimation architecture that introduces semantic prototype conditioning: four learned prototype banks (illumination, background, head pose, appearance) modulate the CLIP global embedding to produce two complementary context-biased global tokens. These tokens are early-fused at the first layer with CLIP patch tokens and CNN tokens inside a multi-scale transformer; each token then routes through sparse Mixture-of-Experts layers. An adversarial domain-adaptation module with an explicit feature-separation loss is added to encourage decorrelation of the two global tokens for cross-domain transfer. On four public benchmarks the model reports mean angular errors of 2.49°, 3.22°, 10.16°, and 1.44° and claims state-of-the-art within-domain performance plus SOTA on two standard cross-domain routes.
Significance. If the reported error reductions are shown to arise from the semantic conditioning and early-fusion mechanism rather than from increased effective capacity or benchmark-specific tuning, the work would usefully address the late-fusion and factor-aware-conditioning limitations noted in prior CNN-, transformer-, and CLIP-based gaze estimators. The concrete numerical results on standard public datasets and the inclusion of both within- and cross-domain protocols are positive features that facilitate direct comparison.
Major comments (2)
- [Methods (prototype conditioning)] Methods section (semantic prototype conditioning): the central explanatory claim is that the four learned prototype banks generate distinct, non-redundant context-biased tokens that improve gaze prediction. No ablation removing individual banks, no activation statistics across the banks, and no quantitative disentanglement or redundancy metrics are provided; without these it is impossible to rule out that the observed gains (e.g., 2.49° on MPIIFaceGaze) simply reflect added capacity rather than the claimed factor-aware conditioning.
- [Experiments] Experiments section (cross-domain results): the claim of SOTA on two standard transfer routes is load-bearing for the generalization argument, yet the manuscript supplies neither the exact source-target pairs used, nor statistical significance tests, nor comparisons against the full set of recent domain-adaptation baselines. This leaves the cross-domain superiority difficult to evaluate.
Minor comments (2)
- [Abstract] Abstract and §4: the phrase “impractical capacity scaling” is used without accompanying parameter counts or FLOPs tables comparing GMGaze to the cited baselines.
- [Methods] Notation: the two complementary global tokens produced by the prototype banks are referred to interchangeably as “context-biased global tokens” and “global tokens”; a single consistent label would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation of our contributions.
Point-by-point responses
Referee: Methods section (semantic prototype conditioning): the central explanatory claim is that the four learned prototype banks generate distinct, non-redundant context-biased tokens that improve gaze prediction. No ablation removing individual banks, no activation statistics across the banks, and no quantitative disentanglement or redundancy metrics are provided; without these it is impossible to rule out that the observed gains (e.g., 2.49° on MPIIFaceGaze) simply reflect added capacity rather than the claimed factor-aware conditioning.
Authors: We agree that the current manuscript would benefit from explicit empirical validation showing that the gains derive from the semantic prototype conditioning mechanism rather than from added capacity alone. The four prototype banks are designed to capture distinct factors (illumination, background, head pose, appearance) and produce complementary tokens via modulation of the CLIP global embedding, with early fusion intended to integrate this information before the multi-scale transformer and MoE layers. The revised version will therefore add: individual ablations removing each bank and reporting the resulting angular errors; activation or routing statistics across banks on sample images; and quantitative disentanglement metrics such as pairwise cosine similarity or correlation between the two context-biased tokens. These additions will provide direct evidence for the non-redundant, factor-aware nature of the conditioning. Revision: yes.
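A minimal sketch of the redundancy statistics this response promises, assuming the two tokens are collected over a validation set; averaging absolute per-dimension correlations and the 1e-8 floor are illustrative choices, not the authors' protocol.

```python
import torch
import torch.nn.functional as F

def token_redundancy(token_a, token_b):
    # token_a, token_b: (N, dim) context-biased global tokens from a validation set
    cos = F.cosine_similarity(token_a, token_b, dim=1)        # per-sample similarity
    a = (token_a - token_a.mean(0)) / (token_a.std(0) + 1e-8)
    b = (token_b - token_b.mean(0)) / (token_b.std(0) + 1e-8)
    corr = (a * b).mean(0)                                    # approximate per-dimension correlation
    return cos.abs().mean().item(), corr.abs().mean().item()

mean_abs_cos, mean_abs_corr = token_redundancy(torch.randn(256, 512), torch.randn(256, 512))
# Values near zero would support the claim that the two tokens are non-redundant.
```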
Referee: Experiments section (cross-domain results): the claim of SOTA on two standard transfer routes is load-bearing for the generalization argument, yet the manuscript supplies neither the exact source-target pairs used, nor statistical significance tests, nor comparisons against the full set of recent domain-adaptation baselines. This leaves the cross-domain superiority difficult to evaluate.
Authors: We will clarify and expand the cross-domain evaluation in the revision. We will explicitly state the exact source-target pairs corresponding to the two standard transfer routes on which SOTA is claimed. We will add statistical significance testing (e.g., paired t-tests on the per-subject or per-sequence angular errors) to support the reported improvements. We will also broaden the baseline comparisons to include additional recent domain-adaptation methods from the gaze estimation literature. The adversarial domain-adaptation module together with the feature-separation loss is intended to promote decorrelation of the two global tokens and thereby improve transfer; we will include further implementation details and ablation results on this component to aid evaluation. Revision: yes.
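A minimal version of the proposed significance test, using SciPy's paired t-test on per-subject mean angular errors; the numbers below are hypothetical placeholders, not reported results.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-subject mean angular errors (degrees) for GMGaze and a baseline.
gmgaze   = np.array([2.31, 2.58, 2.44, 2.70, 2.52, 2.39])
baseline = np.array([2.55, 2.79, 2.61, 2.95, 2.74, 2.48])

stat, p_value = ttest_rel(gmgaze, baseline)
print(f"t = {stat:.3f}, p = {p_value:.4f}")  # p < 0.05 would support the claimed improvement
```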
Circularity Check
No circularity; empirical results on external benchmarks with no derivation chain
Full rationale
The paper describes an architectural model (GMGaze) involving semantic prototype conditioning with four learned banks, early fusion of tokens, sparse MoE layers, and adversarial domain adaptation with a feature separation loss. It reports mean angular errors on four public external benchmarks (MPIIFaceGaze, EYEDIAP, Gaze360, ETH-XGaze) and claims SOTA in some cross-domain settings. No equations, derivations, or first-principles predictions are present in the manuscript. Performance claims are measured directly against held-out data from independent datasets rather than reducing to internally fitted quantities or self-referential definitions. The prototype banks are a modeling choice whose utility is evaluated empirically, not derived by construction from the results themselves.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Four learned semantic prototype banks (illumination, background, head pose, appearance)