Recognition: unknown
See Through the Noise: Improving Domain Generalization in Gaze Estimation
Pith reviewed 2026-05-10 08:33 UTC · model grok-4.3
The pith
The SeeTN framework improves cross-domain gaze estimation by identifying label noise through prototype-based semantic alignment and transferring gaze information from clean to noisy samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing a semantic embedding space through prototype-based transformation, SeeTN preserves a consistent topological structure between gaze features and continuous labels. Feature-label affinity consistency then distinguishes noisy from clean samples, and a novel affinity regularization transfers gaze-related information from clean to noisy samples in the semantic manifold. This promotes semantic structure alignment and enforces domain-invariant gaze relationships, enhancing robustness against label noise and achieving superior cross-domain generalization without compromising source-domain accuracy.
What carries the argument
Prototype-based semantic embedding space with affinity regularization that measures consistency to separate samples and transfers information across the manifold.
Load-bearing premise
Measuring feature-label affinity consistency in the prototype-based semantic space reliably separates noisy from clean samples and the regularization step preserves true gaze relationships rather than spreading errors.
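The material above gives no equations for this step, so the following is a minimal NumPy sketch of one plausible reading: each sample gets a softmax affinity profile over a set of prototypes in feature space and over matching anchors in label space, and the per-sample cosine agreement between the two profiles is thresholded to split tentatively clean from noisy samples. The function names, temperature, and quantile threshold are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def affinities(points, anchors, temperature=0.1):
    """Softmax-normalized cosine affinities of each point to a set of anchors."""
    p = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-8)
    a = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    logits = p @ a.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def consistency_scores(features, labels, feat_protos, label_protos):
    """Agreement between feature-space and label-space affinity profiles.
    High score -> profiles match (likely clean label);
    low score  -> mismatch (candidate noisy label)."""
    a_feat = affinities(features, feat_protos)
    a_label = affinities(labels, label_protos)
    num = (a_feat * a_label).sum(axis=1)
    den = np.linalg.norm(a_feat, axis=1) * np.linalg.norm(a_label, axis=1)
    return num / (den + 1e-8)

# Illustrative usage: split a batch into tentative clean / noisy subsets.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 64))                 # backbone features
gaze = rng.uniform(-1, 1, size=(256, 2))           # (pitch, yaw) labels in radians
feat_protos = rng.normal(size=(16, 64))            # prototypes in feature space
label_protos = rng.uniform(-1, 1, size=(16, 2))    # matching label-space anchors

scores = consistency_scores(feats, gaze, feat_protos, label_protos)
tau = np.quantile(scores, 0.3)                     # illustrative threshold
clean_mask = scores >= tau
```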
What would settle it
Experiments on gaze datasets with controlled label noise would settle it: if SeeTN produces no cross-domain accuracy gain, or degrades source-domain performance, the approach fails to separate or correct noise effectively.
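Such a settling experiment needs a controlled-noise protocol. A minimal sketch of one common design (not claimed to be the paper's synthetic-noise setup) perturbs a chosen fraction of (pitch, yaw) labels with Gaussian angular offsets while recording the ground-truth corruption mask, so that both noise-detection quality and downstream cross-domain error can be measured against known corruption.

```python
import numpy as np

def inject_gaze_label_noise(labels, noise_ratio=0.3, sigma_deg=10.0, seed=0):
    """Corrupt a random fraction of (pitch, yaw) gaze labels with Gaussian
    angular offsets; returns corrupted labels plus the ground-truth noise mask
    so detection precision/recall can be evaluated afterwards."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    mask = rng.random(len(labels)) < noise_ratio
    offsets = rng.normal(0.0, np.deg2rad(sigma_deg),
                         size=(mask.sum(), labels.shape[1]))
    noisy[mask] += offsets
    return noisy, mask

# Example: 30% of samples perturbed by roughly 10 degrees of angular noise.
gaze = np.random.default_rng(1).uniform(-0.5, 0.5, size=(1000, 2))
noisy_gaze, noise_mask = inject_gaze_label_noise(gaze)
```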
Original abstract
Generalizable gaze estimation methods have garnered increasing attention due to their critical importance in real-world applications and have achieved significant progress. However, they often overlook the effect of label noise, arising from the inherent difficulty of acquiring precise gaze annotations, on model generalization performance. In this paper, we are the first to comprehensively investigate the negative effects of label noise on generalization in gaze estimation. Further, we propose a novel solution, called See-Through-Noise (SeeTN) framework, which improves generalization from a novel perspective of mitigating label noise. Specifically, we propose to construct a semantic embedding space via a prototype-based transformation to preserve a consistent topological structure between gaze features and continuous labels. We then measure feature-label affinity consistency to distinguish noisy from clean samples, and introduce a novel affinity regularization in the semantic manifold to transfer gaze-related information from clean to noisy samples. Our proposed SeeTN promotes semantic structure alignment and enforces domain-invariant gaze relationships, thereby enhancing robustness against label noise. Extensive experiments demonstrate that our SeeTN effectively mitigates the adverse impact of source-domain noise, leading to superior cross-domain generalization without compromising the source-domain accuracy, and highlight the importance of explicitly handling noise in generalized gaze estimation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to be the first to systematically study the negative effects of label noise on domain generalization in gaze estimation. It proposes the SeeTN framework, which builds a prototype-based semantic embedding space to preserve topological structure between features and continuous gaze labels, identifies noisy samples by measuring feature-label affinity consistency, and applies a novel affinity regularization on the semantic manifold to transfer gaze-related information from clean to noisy samples. The method is said to promote semantic alignment and domain-invariant relationships, yielding superior cross-domain generalization without degrading source-domain accuracy, as supported by extensive experiments.
Significance. If the empirical claims hold, the work would be significant as the first explicit treatment of label noise in gaze DG, addressing a practical issue in real-world annotation. The prototype-based handling of continuous labels and the affinity regularization idea are technically interesting extensions beyond standard DG techniques. Credit is due for the reproducible experimental protocol implied by the extensive cross-domain tests and the focus on not harming source accuracy.
major comments (2)
- [method description of prototype-based transformation and affinity consistency] In the prototype construction step for the semantic embedding space, prototypes are derived from the full training set, which contains noisy labels. This creates a risk that noisy samples distort prototype locations and affinity scores, undermining reliable separation of clean from noisy samples. In gaze estimation, where labels are continuous angles and domain shifts alter feature distributions, the consistency metric may misclassify samples, causing the subsequent regularization to propagate errors rather than preserve true gaze relationships. This directly affects the load-bearing claim that SeeTN mitigates source noise for better generalization (a toy illustration of prototype contamination follows after this list).
- [experiments and results] Experiments section: while the abstract asserts that SeeTN 'effectively mitigates the adverse impact of source-domain noise' and shows 'superior cross-domain generalization,' no quantitative results, ablation studies on the affinity regularization, or error analysis on prototype contamination are referenced in the provided summary. Without these, it is difficult to verify whether gains are robust or sensitive to the noise detection threshold.
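A toy calculation makes the contamination risk in the first comment concrete: if a prototype is the mean feature of samples assigned to one gaze bin and a fraction of those assignments are wrong, the estimated prototype drifts toward the mislabeled cluster, which in turn skews every affinity score computed against it. The numbers below are purely illustrative and not taken from the paper.

```python
import numpy as np

# Prototype = mean feature of samples whose labels fall in one gaze bin.
# Here 30% of those samples actually belong to a different bin (label noise).
rng = np.random.default_rng(0)
true_members = rng.normal(loc=0.0, scale=0.2, size=(70, 64))   # correct-bin features
intruders = rng.normal(loc=1.0, scale=0.2, size=(30, 64))      # mislabeled features

clean_proto = true_members.mean(axis=0)
contaminated_proto = np.vstack([true_members, intruders]).mean(axis=0)

drift = np.linalg.norm(contaminated_proto - clean_proto)
print(f"prototype drift under 30% noise: {drift:.2f}")  # ~0.3 per dim, ~2.4 over 64 dims
```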
minor comments (2)
- [abstract and method] The abstract states the method 'preserves a consistent topological structure' but does not specify the exact loss or distance metric used in the prototype transformation; adding this detail would improve clarity.
- [introduction] Consider adding a short discussion of how the approach differs from existing noise-robust learning methods in other continuous regression tasks (e.g., head pose estimation) to better position the novelty.
Simulated Author's Rebuttal
We thank the referee for the careful review and valuable feedback on our work. We address each major comment below with clarifications and indicate planned revisions to strengthen the manuscript.
Point-by-point responses
Referee: The prototype construction step (described in the method for the semantic embedding space): prototypes are derived from the full training set containing noisy labels. This creates a risk that noisy samples distort prototype locations and affinity scores, undermining reliable separation of clean vs. noisy samples. In gaze estimation, where labels are continuous angles and domain shifts alter feature distributions, the consistency metric may misclassify samples, causing the subsequent regularization to propagate errors rather than preserve true gaze relationships. This directly affects the load-bearing claim that SeeTN mitigates source noise for better generalization.
Authors: We appreciate the referee's concern about potential prototype distortion from noisy labels. In SeeTN, prototypes are computed from the full set to preserve the overall topological structure between features and continuous gaze labels, but the affinity consistency metric is then applied to quantify deviations and identify noisy samples for targeted regularization. This design aims to limit error propagation by transferring information primarily from clean samples on the semantic manifold. While our cross-domain results indicate effective noise mitigation in practice, we acknowledge the validity of the point and will add a dedicated analysis of prototype stability and sensitivity to initial noise contamination in the revised manuscript. revision: partial
Referee: Experiments section: while the abstract asserts that SeeTN 'effectively mitigates the adverse impact of source-domain noise' and shows 'superior cross-domain generalization,' no quantitative results, ablation studies on the affinity regularization, or error analysis on prototype contamination are referenced in the provided summary. Without these, it is difficult to verify whether gains are robust or sensitive to the noise detection threshold.
Authors: The manuscript reports extensive cross-domain experiments demonstrating improved generalization and preserved source accuracy under noisy conditions. However, we agree that more granular ablations specifically isolating the affinity regularization component and quantitative error analysis on prototype contamination would better substantiate robustness and sensitivity to the noise threshold. We will incorporate these additional results and analyses in the revised experiments section. revision: yes
Circularity Check
No circularity: algorithmic framework with independent procedural steps
Full rationale
The paper describes SeeTN as a sequence of algorithmic operations: prototype-based transformation to build a semantic embedding space, affinity consistency measurement for sample separation, and affinity regularization for information transfer. These are presented as design choices without equations or derivations that collapse predictions back to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core components. The method remains self-contained as an empirical proposal validated through experiments rather than tautological reductions.
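Read as a training loop, that sequence might look like the following PyTorch skeleton: features are scored for feature-label affinity consistency, the batch is split by a threshold, the supervised gaze loss is applied only to the tentatively clean part, and a regularizer nudges noisy samples' affinity profiles toward those of the clean samples. The module sizes, the threshold eta, the weight lambda, and the exact form of the regularizer are assumptions for illustration, not the paper's implementation.

```python
import torch
from torch import nn

# Placeholder backbone/regressor for 3x36x60 eye images; sizes are assumptions.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 36 * 60, 128), nn.ReLU())
regressor = nn.Linear(128, 2)  # (pitch, yaw) in radians
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(regressor.parameters()), lr=1e-4
)

def affinity_profiles(feats, labels, protos_f, protos_y):
    """Softmax affinities of each sample to the prototypes, computed both in
    feature space and in the continuous label space."""
    a_f = torch.softmax(feats @ protos_f.T, dim=1)
    a_y = torch.softmax(-torch.cdist(labels, protos_y), dim=1)
    return a_f, a_y

def train_step(images, labels, protos_f, protos_y, eta=0.5, lam=0.1):
    feats = backbone(images)
    pred = regressor(feats)
    a_f, a_y = affinity_profiles(feats, labels, protos_f, protos_y)
    scores = torch.cosine_similarity(a_f, a_y, dim=1)   # high = likely clean
    clean = (scores >= eta).float()
    n_clean = clean.sum().clamp(min=1.0)
    n_noisy = (1 - clean).sum().clamp(min=1.0)
    # Supervised gaze loss only on tentatively clean samples.
    per_sample = (pred - labels).abs().mean(dim=1)
    gaze_loss = (per_sample * clean).sum() / n_clean
    # Placeholder affinity regularization: pull noisy samples' feature-space
    # affinity profiles toward the mean profile of the clean samples.
    clean_profile = (a_f * clean.unsqueeze(1)).sum(dim=0) / n_clean
    reg = ((a_f - clean_profile).pow(2).mean(dim=1) * (1 - clean)).sum() / n_noisy
    loss = gaze_loss + lam * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the prototypes would presumably be re-estimated as training proceeds, which is exactly where the referee's contamination concern enters.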
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Prototype-based transformation preserves a consistent topological structure between gaze features and continuous labels.
- domain assumption: Feature-label affinity consistency distinguishes noisy from clean samples.