A Novel Contactless Human Machine Interface based on Machine Learning

Frederic Magoules; Qinmeng Zou

arxiv: 1907.04390 · v1 · pith:XBPMSZUVnew · submitted 2019-07-09 · 💻 cs.HC

Pith reviewed 2026-05-24 23:57 UTC · model grok-4.3

classification 💻 cs.HC

keywords contactless human-machine interface · hand gesture recognition · computer vision · machine learning · webcam · virtual interfaces · gesture-based control

The pith

A standard webcam combined with computer vision and machine learning suffices for rich contactless computer control equivalent to a mouse and keyboard through simple hand gestures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a global framework for contactless human-machine interaction that depends only on a simple image acquisition device such as a computer camera. Established computer vision methods capture and process images while machine learning detects and tracks hand gestures in real time. This setup lets users operate virtual interfaces with basic gestures to achieve interaction comparable to physical peripherals. A sympathetic reader would care because the claim removes the need for specialized hardware, showing that everyday equipment can support full computer operation. The work focuses on assembling known techniques into a practical system rather than introducing new algorithms.

Core claim

The paper describes a global framework that enables contactless human-machine interaction using computer vision and machine learning techniques. Its main claimed originality is that a very simple image acquisition device, such as a computer camera, is sufficient to establish an interaction as rich as that offered by traditional devices such as a mouse or keyboard. The framework builds on well-known computer vision techniques, and efficient machine learning techniques are used to detect and track the user's hand gestures, so that the end user can control a computer through virtual interfaces with very simple gestures.

What carries the argument

The global framework that integrates computer vision techniques for image capture and processing with machine learning for real-time hand gesture detection and tracking to drive virtual interface control.
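One way to picture this chain of stages is as a sequence of functions, each consuming the previous stage's output. The sketch below is illustrative only and is not the authors' implementation: the stage names (`segment_hand`, `classify_gesture`, `dispatch`), the brightness threshold, and the gesture and action labels are all hypothetical, and a frame is modelled as a plain list of rows of grayscale values rather than real camera output.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of 0..255 values (assumption)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def segment_hand(frame: Frame, threshold: int = 128) -> Optional[Box]:
    """Hypothetical stand-in for segmentation: bounding box of bright pixels."""
    rows = [r for r, row in enumerate(frame) if any(v >= threshold for v in row)]
    cols = [c for row in frame for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


def classify_gesture(box: Optional[Box]) -> str:
    """Hypothetical stand-in for a learned classifier: uses the box shape only."""
    if box is None:
        return "none"
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return "open_hand" if width >= height else "point"


def dispatch(gesture: str) -> Optional[str]:
    """Map a gesture label to a virtual-interface action (labels are illustrative)."""
    return {"open_hand": "move_cursor", "point": "click"}.get(gesture)


def process(frame: Frame) -> Optional[str]:
    """Run one frame through segmentation, classification, and dispatch."""
    return dispatch(classify_gesture(segment_hand(frame)))
```

In a real system each stand-in would be replaced by an actual vision or learning component; the point of the sketch is only that the stages can be swapped independently as long as their inputs and outputs stay the same.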

If this is right

Users achieve mouse- and keyboard-equivalent computer control without physical contact or specialized hardware.
Simple gestures suffice to operate virtual interfaces through continuous real-time tracking.
Standard, readily available computer vision and machine learning methods can be assembled into a complete contactless input system.
Interaction becomes feasible in settings where touching devices is impractical or restricted.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support accessibility for people who cannot operate physical input devices due to motor limitations.
Contactless control may reduce shared-device hygiene issues in public or clinical environments.
The same camera-based pipeline might extend to other simple input tasks such as menu navigation in embedded systems.

Load-bearing premise

The framework assumes that well-known computer vision techniques combined with efficient machine learning can reliably detect and track user hand gestures in real time to enable control via virtual interfaces with very simple gestures.

What would settle it

A controlled test in which the system fails to maintain accurate real-time gesture detection and tracking under ordinary indoor lighting changes, cluttered backgrounds, or varied hand positions would show that a simple camera does not suffice for the claimed level of interaction.
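Such a test could be scored per condition along the following lines. This is a minimal sketch under assumptions not found in the paper: each trial is recorded as a hypothetical `(condition, true_label, predicted_label)` triple, the label `"none"` stands for frames with no intended gesture, and the function name `rates_by_condition` is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each trial: (condition, true_label, predicted_label); "none" means no gesture.
Trial = Tuple[str, str, str]


def rates_by_condition(trials: Iterable[Trial]) -> Dict[str, Dict[str, float]]:
    """Compute recognition rate and false-positive rate for each test condition."""
    counts = defaultdict(lambda: {"gest": 0, "hit": 0, "idle": 0, "false_pos": 0})
    for condition, truth, predicted in trials:
        c = counts[condition]
        if truth == "none":
            # No gesture was made: any detection is a false positive.
            c["idle"] += 1
            if predicted != "none":
                c["false_pos"] += 1
        else:
            # A gesture was made: count it as recognized only on an exact match.
            c["gest"] += 1
            if predicted == truth:
                c["hit"] += 1
    result = {}
    for condition, c in counts.items():
        result[condition] = {
            "recognition_rate": c["hit"] / c["gest"] if c["gest"] else float("nan"),
            "false_positive_rate": c["false_pos"] / c["idle"] if c["idle"] else float("nan"),
        }
    return result
```

Comparing the two rates across conditions such as stable lighting, changing lighting, and cluttered backgrounds would show whether performance degrades in exactly the situations the premise depends on.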

Figures

Figures reproduced from arXiv: 1907.04390 by Frederic Magoules, Qinmeng Zou.

**Figure 1.** Global overview of the framework. Source excerpt (truncated): "… i.e., a small latency between the commands given by the end user with hand motions and the execution of the actions on the machine. The plan of the paper is the following. Section 2 gives a global description of the framework and of its modular architecture. In Section 3, the different modules of the framework are detailed together with some implementation issues. Section…"

**Figure 2.** Architecture overview. Source excerpt from Section 3.1, "Functions to Isolate Zones of Interest Module" (truncated): "FIZI module, which stands for Functions to Isolate Zones of Interest, is the module in charge of the video segmentation part. Its main goal is to segment and select the zones of interest in each image of the video sequence. In our case, the zones of interest are the hands of the end user, an…"

**Figure 3.** Diagram of the mapping approaches. From left to right: absolute, relative

**Figure 4.** Sequence of gestures for typing the word ‘fox’.
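The Figure 3 caption names absolute and relative mapping without defining them. The sketch below uses the usual meaning of those terms in pointing interfaces, which is an assumption rather than something the caption confirms: absolute mapping places the cursor at the screen position corresponding to the hand position in the camera image, while relative mapping moves the cursor by the hand's displacement between frames. The function names, the `gain` parameter, and the clamping behaviour are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def absolute_map(hand: Point, cam_size: Tuple[int, int],
                 screen_size: Tuple[int, int]) -> Point:
    """Scale a hand position in camera pixels to the same relative spot on screen."""
    return (hand[0] / cam_size[0] * screen_size[0],
            hand[1] / cam_size[1] * screen_size[1])


def relative_map(cursor: Point, prev_hand: Point, hand: Point,
                 screen_size: Tuple[int, int], gain: float = 2.0) -> Point:
    """Move the cursor by the hand displacement times a gain, clamped to the screen."""
    x = cursor[0] + gain * (hand[0] - prev_hand[0])
    y = cursor[1] + gain * (hand[1] - prev_hand[1])
    return (min(max(x, 0.0), screen_size[0] - 1),
            min(max(y, 0.0), screen_size[1] - 1))
```

Under these assumptions, absolute mapping needs no state but ties pointing precision to camera resolution, whereas relative mapping needs the previous hand position and lets the gain trade speed against precision.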

Original abstract

This paper describes a global framework that enables contactless human machine interaction using computer vision and machine learning techniques. The main originality of our framework is that only a very simple image acquisition device, as a computer camera, is sufficient to establish a rich human machine interaction as traditional devices such as mouse or keyboard. This framework is based on well known computer vision techniques and efficient machine learning techniques are used to detect and track user hand gestures so the end user can control his computer using virtual interfaces with very simple gestures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper describes a standard camera-based hand gesture system using off-the-shelf CV and ML but supplies no accuracy numbers, robustness tests, or implementation details to support its claim that this replaces mouse and keyboard.

The core claim is that a basic webcam plus known computer vision and machine learning methods can deliver usable contactless control through simple hand gestures. Nothing in the abstract or title indicates a new algorithm, new feature, or new theoretical result; the work is framed as an application of existing tools to build virtual interfaces. That framing is honest but also limits what the paper can contribute beyond a high-level description of the pipeline.

The main positive is that the setup is deliberately low-cost and avoids specialized hardware, which keeps the idea accessible for accessibility or constrained environments. The description of using gestures to interact with virtual controls is clear at the conceptual level.

The central weakness is the complete absence of any quantitative evidence. No recognition rates, false-positive figures, latency measurements, or tests under varying lighting, backgrounds, or users appear in the provided text. Without those numbers the assertion that the system is sufficient for real interaction rests on an untested assumption rather than demonstrated performance. The paper also does not compare against prior gesture systems or discuss failure modes, so it is hard to judge whether the approach improves on what already exists.

This leaves the manuscript as a sketch rather than a completed piece of work. It might interest someone building a quick prototype for a specific niche, but it does not contain enough substance for a reading group or for citation in research that needs reproducible results. I would not send it to peer review in its current form; the lack of validation makes it unsuitable for a serious referee process.

Referee Report

2 major / 0 minor

Summary. The manuscript describes a global framework for contactless human-machine interaction that relies on a standard computer camera together with well-known computer vision techniques and efficient machine learning methods to detect and track hand gestures, thereby allowing users to control a computer through virtual interfaces with simple gestures. The central originality asserted is that this minimal hardware setup is sufficient to deliver rich interaction equivalent to traditional devices such as a mouse or keyboard.

Significance. If the performance claims were demonstrated with quantitative evidence, the work could contribute to accessible and natural user interfaces in HCI by showing that commodity cameras can replace physical input devices. The absence of any implementation details, accuracy metrics, latency figures, or robustness tests, however, prevents any assessment of whether the claimed sufficiency holds.

major comments (2)

[Abstract] The claim that 'only a very simple image acquisition device, as a computer camera, is sufficient to establish a rich human machine interaction as traditional devices such as mouse or keyboard' is presented without any supporting evidence, recognition rates, false-positive rates, latency benchmarks, or tests across lighting/background/user variation. This assertion is load-bearing for the entire contribution.
[Abstract] The framework is said to rest on 'well known computer vision techniques' and 'efficient machine learning techniques' for real-time hand-gesture detection and tracking, yet no specific methods, training data, or performance characterization are supplied, leaving the reliability of the real-time pipeline unverified.
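A latency benchmark of the kind these comments ask for could be as simple as timing the per-frame processing function. The following is a sketch, not a description of anything in the manuscript: `measure_latency_ms` and `summarize` are hypothetical helper names, `process` stands for whatever per-frame function a given implementation exposes, and the 95th percentile uses the nearest-rank method.

```python
import math
import time
from statistics import mean
from typing import Any, Callable, Dict, Iterable, List


def measure_latency_ms(process: Callable[[Any], Any],
                       frames: Iterable[Any]) -> List[float]:
    """Time each call to the per-frame function and return latencies in milliseconds."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Report mean, nearest-rank 95th percentile, and worst-case latency."""
    ordered = sorted(latencies)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"mean_ms": mean(ordered), "p95_ms": ordered[idx], "max_ms": ordered[-1]}
```

Reporting the tail (95th percentile and maximum) alongside the mean matters for interaction, since occasional slow frames are what a user perceives as lag. Note that this measures processing time only; camera capture and display delays would need to be measured separately.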

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review of our manuscript. We address the major comments point by point below.

Referee: [Abstract] The claim that 'only a very simple image acquisition device, as a computer camera, is sufficient to establish a rich human machine interaction as traditional devices such as mouse or keyboard' is presented without any supporting evidence, recognition rates, false-positive rates, latency benchmarks, or tests across lighting/background/user variation. This assertion is load-bearing for the entire contribution.

Authors: The manuscript presents a framework for contactless interaction and asserts that a simple camera is sufficient based on the maturity of computer vision and machine learning methods for hand tracking. The paper does not include quantitative benchmarks because its contribution lies in the system-level integration rather than in new algorithmic performance. We believe this is a valid contribution, though we acknowledge that empirical validation would be valuable for future work.

revision: no

Referee: [Abstract] The framework is said to rest on 'well known computer vision techniques' and 'efficient machine learning techniques' for real-time hand-gesture detection and tracking, yet no specific methods, training data, or performance characterization are supplied, leaving the reliability of the real-time pipeline unverified.

Authors: The use of 'well known' techniques is deliberate to highlight that the novelty is in the application to contactless HMI rather than in new CV or ML methods. The manuscript describes the overall approach at the framework level, without delving into implementation specifics or performance numbers.

revision: no

Circularity Check

0 steps flagged

No derivation chain or equations present; framework is descriptive only

The paper describes a high-level framework for contactless HMI using unspecified 'well known computer vision techniques' and 'efficient machine learning techniques' to detect/track hand gestures. No equations, parameters, predictions, or self-citations appear in the provided text. The central claim reduces to an assertion that standard CV+ML suffice, without any fitted inputs, self-definitional steps, or load-bearing citations that could create circularity. This is a normal non-finding for a non-mathematical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5599 in / 1037 out tokens · 24990 ms · 2026-05-24T23:57:36.818382+00:00 · methodology

