
arxiv: 2604.22479 · v1 · submitted 2026-04-24 · 💻 cs.CV · eess.IV

Improving Driver Drowsiness Detection via Personalized EAR/MAR Thresholds and CNN-Based Classification

Pith reviewed 2026-05-08 12:31 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords: driver drowsiness, personalized thresholds, eye aspect ratio, mouth aspect ratio, CNN classification, yawning detection, real-time monitoring, fatigue detection

The pith

Personalized eye and mouth aspect ratio thresholds with CNN classification improve driver drowsiness detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Driver drowsiness detection systems typically use fixed thresholds for eye and mouth movements, but these do not account for differences between people or changing conditions. This paper tests a system that calibrates thresholds to each driver before driving starts and adds convolutional neural network (CNN) models that classify whether the eyes are closed and whether the driver is yawning. The personalized approach raises accuracy by 2 to 3 percent, while the CNN components reach over 99 percent accuracy on eye-state classification and nearly 99 percent on yawning. If these gains hold in real driving, they could lead to more reliable alerts that help prevent fatigue-related accidents. The evaluation covers public datasets and a custom dataset recorded under varied lighting conditions and head poses.

Core claim

The central claim is that a drowsiness detection system using personalized EAR and MAR thresholds, calibrated before driving and combined with CNN-based classification of eye state and yawning, achieves higher accuracy than fixed-threshold methods alone: personalization improves accuracy by 2-3%, and the CNN models reach 99.1% for eye state and 98.8% for yawning on diverse datasets.

What carries the argument

Personalized calibration of Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) thresholds for individual drivers, augmented by CNN models that classify eye openness and yawning behavior.
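For reference, a minimal Python sketch of the two metrics follows. EAR uses the six-point formula of Cech and Soukupova [2], EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners and p2, p3 and p6, p5 lie on the upper and lower lids. The paper's own Eq. (1) and its MAR definition are not reproduced on this page, so applying the same six-point ratio to mouth landmarks is an assumption, as are the helper names `aspect_ratio`, `ear`, and `mar`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def aspect_ratio(pts: Sequence[Point]) -> float:
    """Six-point aspect ratio (|p2-p6| + |p3-p5|) / (2 |p1-p4|).

    pts are ordered p1..p6: p1 and p4 are the horizontal corners,
    p2 and p3 lie on the upper contour, p6 and p5 on the lower contour.
    """
    if len(pts) != 6:
        raise ValueError("expected exactly six landmarks")
    p1, p2, p3, p4, p5, p6 = pts
    width = math.dist(p1, p4)
    if width == 0:
        raise ValueError("degenerate landmarks: corners coincide")
    height = math.dist(p2, p6) + math.dist(p3, p5)
    return height / (2.0 * width)


def ear(eye_pts: Sequence[Point]) -> float:
    """Eye Aspect Ratio from six eye landmarks."""
    return aspect_ratio(eye_pts)


def mar(mouth_pts: Sequence[Point]) -> float:
    """Mouth Aspect Ratio. ASSUMPTION: same six-point form applied to the mouth."""
    return aspect_ratio(mouth_pts)


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(round(ear(open_eye), 3))    # 0.667
print(round(ear(closed_eye), 3))  # 0.067
```

The ratio is scale-invariant, which is why a single number per driver can act as a threshold regardless of how far the face is from the camera; what it does not remove is the person-to-person variation in resting eye and mouth shape that motivates personalization.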

If this is right

  • Real-time monitoring can issue warnings based on combined signals from eyelids, head position, and yawning.
  • The hybrid method handles variations in illumination and head pose better than classical EAR/MAR metrics alone.
  • Testing on public datasets plus a custom dataset collected under different conditions validates the improvements.
  • Such systems may support continuous driver monitoring in vehicles to reduce accident risks.
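The first bullet implies a per-frame decision rule. A common way to turn per-frame EAR, MAR, and head-pose readings into warnings is to require each cue to persist for several consecutive frames, so that ordinary blinks or speech do not trigger alerts. The sketch below assumes that rule; the class name, the frame counts, the head-pitch threshold, and its sign convention are hypothetical values chosen for illustration, not parameters reported in the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DrowsinessMonitor:
    ear_thr: float            # per-driver eye threshold (e.g. from calibration)
    mar_thr: float            # per-driver yawn threshold
    eye_frames: int = 15      # HYPOTHETICAL: closed-eye frames needed before a warning
    yawn_frames: int = 20     # HYPOTHETICAL: open-mouth frames needed for a yawn
    pitch_thr: float = 20.0   # HYPOTHETICAL: downward head pitch (degrees) treated as nodding
    pitch_frames: int = 15    # HYPOTHETICAL: nodding frames needed before a warning
    _eye: int = field(default=0, init=False, repr=False)
    _yawn: int = field(default=0, init=False, repr=False)
    _nod: int = field(default=0, init=False, repr=False)

    def update(self, ear: float, mar: float, pitch_deg: float) -> List[str]:
        """Consume one frame and return the cues that currently warrant a warning."""
        # Each counter grows while its cue persists and resets as soon as it stops.
        self._eye = self._eye + 1 if ear < self.ear_thr else 0
        self._yawn = self._yawn + 1 if mar > self.mar_thr else 0
        self._nod = self._nod + 1 if pitch_deg > self.pitch_thr else 0
        reasons = []
        if self._eye >= self.eye_frames:
            reasons.append("eyes_closed")
        if self._yawn >= self.yawn_frames:
            reasons.append("yawning")
        if self._nod >= self.pitch_frames:
            reasons.append("head_nod")
        return reasons


monitor = DrowsinessMonitor(ear_thr=0.21, mar_thr=0.6, eye_frames=3)
for ear_value in (0.10, 0.12, 0.11):
    alerts = monitor.update(ear_value, 0.30, 0.0)
print(alerts)  # ['eyes_closed']
```

Frame counts translate into durations only once the camera frame rate is fixed, so in practice they would be set from a target duration rather than hard-coded.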

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Calibration might need periodic updates if fatigue alters facial features over long drives.
  • If the CNN models are also compact, they could run efficiently on mobile or embedded in-car hardware; high accuracy alone does not establish this.
  • Similar personalization could apply to other biometric monitoring tasks like stress detection.
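The calibration-update bullet above can be made concrete. One option, sketched here under stated assumptions, is to track the open-eye EAR baseline with an exponential moving average that is updated only on frames currently judged open, and to derive the threshold as a fixed fraction of that baseline. The class name, ratio, smoothing factor, and floor are hypothetical; the paper calibrates once before driving and does not describe any in-drive update.

```python
class AdaptiveEarThreshold:
    """HYPOTHETICAL in-drive recalibration of a per-driver EAR threshold."""

    def __init__(self, open_baseline: float, ratio: float = 0.75,
                 alpha: float = 0.01, floor_fraction: float = 0.8):
        self.baseline = open_baseline                # open-eye EAR from pre-drive calibration
        self.ratio = ratio                           # threshold = ratio * baseline
        self.alpha = alpha                           # EMA smoothing factor
        self.floor = floor_fraction * open_baseline  # baseline may never drift below this

    @property
    def threshold(self) -> float:
        return self.ratio * self.baseline

    def update(self, ear: float) -> bool:
        """Feed one frame; return True if the frame is judged closed."""
        closed = ear < self.threshold
        if not closed:
            # Adapt only on open-eye frames so closures do not drag the baseline down.
            ema = (1 - self.alpha) * self.baseline + self.alpha * ear
            self.baseline = max(self.floor, ema)
        return closed
```

The floor is the important design choice: slowly drooping eyelids are themselves a fatigue sign, and unbounded adaptation would gradually treat them as the new normal.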

Load-bearing premise

That the pre-driving personalized thresholds for EAR and MAR will continue to work accurately as illumination, head poses, and the driver's fatigue level change during actual driving.

What would settle it

Observing a significant drop in detection accuracy when testing the system on drivers or in conditions not included in the calibration or training data, such as extreme lighting changes or new facial structures.

Figures

Figures reproduced from arXiv: 2604.22479 by Eray Tonbul, Gökdeniz Ersoy, Mehmet Alper Tatar, Serap Kırbız.

Figure 2. The facial landmark points around the mouth used for …
Figure 3. The flowchart of the proposed system classification method. Both approaches are implemented and evaluated under identical experimental conditions. A. Personalized Facial Landmark–Based Drowsiness Detection. The proposed personalized method extracts 468 facial landmarks using MediaPipe Face Mesh [11]. The facial landmarks around the eyes are used to calculate the EAR as in Eq. (1). The facial landmarks aro…
Figure 4. Examples of real-time detection outputs for (a) eye …
Figure 5. Confusion matrices comparing general and personal …
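The Figure 3 caption states that the method extracts 468 landmarks with MediaPipe Face Mesh [11]. A CNN eye-state or yawn classifier typically consumes a crop around the relevant landmarks rather than the full frame. A minimal sketch of deriving a square, padded crop box follows; it assumes landmark coordinates normalized to [0, 1] by image width and height, as MediaPipe reports them. The function name, the padding value, and the square shape are illustrative choices, not the paper's preprocessing, which this page does not describe.

```python
from typing import Sequence, Tuple


def crop_box(landmarks: Sequence[Tuple[float, float]], width: int, height: int,
             pad: float = 0.25) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) pixel bounds of a square crop.

    landmarks: normalized (x, y) pairs (ASSUMPTION: MediaPipe-style coordinates).
    pad: extra margin on each side, as a fraction of the larger box side.
    """
    xs = [x * width for x, _ in landmarks]
    ys = [y * height for _, y in landmarks]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * pad)
    half = side / 2
    # Clamp to the image so the crop is always a valid slice.
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(width, int(round(cx + half)))
    bottom = min(height, int(round(cy + half)))
    return left, top, right, bottom


eye = [(0.40, 0.50), (0.50, 0.48), (0.60, 0.50), (0.50, 0.52)]
print(crop_box(eye, 100, 100))  # (35, 35, 65, 65)
```

With a NumPy image array the crop would be `frame[top:bottom, left:right]`, then resized to whatever input resolution the classifier expects. A square box keeps the eye's aspect ratio intact under that resize, which matters because a squashed crop can make an open eye look half-closed.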
Original abstract

Driver drowsiness is a major cause of traffic accidents worldwide, posing a serious threat to public safety. Vision-based driver monitoring systems often rely on fixed Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) thresholds; however, such fixed values frequently fail to generalize across individuals due to variations in facial structure, illumination, and driving conditions. This paper proposes a personalized driver drowsiness detection system that monitors eyelid movements, head position, and yawning behavior in real time and provides warnings when signs of fatigue are detected. The system employs driver-specific EAR and MAR thresholds, calibrated before driving, to improve classical metric-based detection. In addition, deep learning-based Convolutional Neural Network (CNN) models are integrated to enhance accuracy in challenging scenarios. The system is evaluated using publicly available datasets as well as a custom dataset collected under diverse lighting conditions, head poses, and user characteristics. Experimental results show that personalized thresholding improves detection accuracy by 2-3% compared to fixed thresholds, while CNN-based classification achieves 99.1% accuracy for eye state detection and 98.8% for yawning detection, demonstrating the effectiveness of combining classical metrics with deep learning for robust real-time driver monitoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hybrid driver drowsiness detection system that combines driver-specific EAR and MAR thresholds, calibrated once before driving, with CNN models for real-time eye-state and yawning classification. It evaluates the approach on public datasets plus a custom dataset collected under varied lighting, head poses, and user characteristics, claiming a 2-3% accuracy improvement from personalization over fixed thresholds together with 99.1% eye-state and 98.8% yawning detection accuracies.

Significance. If the reported gains hold under realistic longitudinal conditions, the work would offer a practical route to improving generalization of vision-based drowsiness monitors by accounting for inter-driver facial variation while retaining interpretable metrics alongside deep learning. The hybrid design is a reasonable engineering choice for real-time embedded deployment.

major comments (2)
  1. [Experimental evaluation] The evaluation (described after the method section) provides no temporal-split or longitudinal experiments in which EAR/MAR thresholds are calibrated on an initial segment of a continuous recording and then tested on later segments of the same session under changing illumination, head pose, or fatigue. Without such tests the 2-3% improvement claim cannot be shown to survive the very variations the abstract lists as motivation for personalization.
  2. [Experimental evaluation] The CNN accuracies (99.1% eye, 98.8% yawning) are reported without explicit confirmation that test drivers or sessions are fully disjoint from the data used to set personalized thresholds or to train the networks. This leaves open the possibility that reported figures partly reflect driver-specific overfitting rather than generalization.
minor comments (2)
  1. [Abstract and §3] The abstract and method description should state the exact public datasets used and the number of subjects/sessions in the custom dataset.
  2. [Figures] Figure captions for the CNN architecture and sample detections would benefit from explicit mention of input resolution and preprocessing steps.
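Major comment 1 asks for a protocol in which thresholds are calibrated on the start of a continuous recording and evaluated on later stretches of the same session. A minimal sketch of that protocol follows, assuming per-frame EAR values with ground-truth open/closed labels. The midpoint-of-class-medians threshold rule, the calibration fraction, the segment count, and the toy data are all hypothetical; the paper's calibration procedure is not given on this page.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Frame = Tuple[float, bool]  # (EAR value, True if the eye is labelled closed)


def temporal_split_eval(frames: Sequence[Frame], calib_fraction: float = 0.2,
                        n_segments: int = 4) -> Tuple[float, List[float]]:
    """Calibrate on the first part of a recording, then score later segments in order."""
    cut = int(len(frames) * calib_fraction)
    calib, rest = frames[:cut], frames[cut:]
    open_vals = [e for e, closed in calib if not closed]
    closed_vals = [e for e, closed in calib if closed]
    if not open_vals or not closed_vals or not rest:
        raise ValueError("calibration needs both classes and some frames left to test")
    # HYPOTHETICAL rule: threshold halfway between the class medians.
    thr = (median(open_vals) + median(closed_vals)) / 2
    seg_len = math.ceil(len(rest) / n_segments)
    accuracies = []
    for start in range(0, len(rest), seg_len):
        seg = rest[start:start + seg_len]
        correct = sum((e < thr) == closed for e, closed in seg)
        accuracies.append(correct / len(seg))
    return thr, accuracies


# Toy recording: open-eye EAR drops later in the session (e.g. dimmer light).
calib = [(0.30, False), (0.10, True)] * 5
early = [(0.30, False), (0.10, True), (0.30, False), (0.10, True), (0.30, False)]
late = [(0.18, False), (0.10, True), (0.18, False), (0.10, True), (0.18, False)]
thr, accs = temporal_split_eval(calib + early + late, calib_fraction=0.5, n_segments=2)
print(accs)  # [1.0, 0.4]
```

Reporting accuracy per segment in time order, rather than pooled, is what exposes drift of the kind the toy data simulates.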

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the rigor of our experimental claims. We address each major point below and will incorporate revisions to provide clearer evidence of generalization.

Point-by-point responses
  1. Referee: The evaluation (described after the method section) provides no temporal-split or longitudinal experiments in which EAR/MAR thresholds are calibrated on an initial segment of a continuous recording and then tested on later segments of the same session under changing illumination, head pose, or fatigue. Without such tests the 2-3% improvement claim cannot be shown to survive the very variations the abstract lists as motivation for personalization.

    Authors: We agree this is a limitation in the current evaluation protocol. While our custom dataset includes varied conditions across sessions and we performed per-driver calibration separate from testing, we did not explicitly conduct or report temporal splits within continuous recordings. In the revised manuscript we will add longitudinal experiments that calibrate EAR/MAR thresholds on initial segments of each recording and evaluate on subsequent segments under changing illumination, head pose, and fatigue levels. This will directly test whether the reported 2-3% gain persists under the variations motivating personalization. revision: yes

  2. Referee: The CNN accuracies (99.1% eye, 98.8% yawning) are reported without explicit confirmation that test drivers or sessions are fully disjoint from the data used to set personalized thresholds or to train the networks. This leaves open the possibility that reported figures partly reflect driver-specific overfitting rather than generalization.

    Authors: We confirm that CNN training used driver-disjoint partitions on both public and custom datasets, and that personalized thresholds were derived from calibration data held out from the test sessions. However, the manuscript does not state this partitioning explicitly. We will revise the experimental evaluation section to detail the exact data splits, including driver/session disjointness for threshold calibration and model training/testing, thereby removing any ambiguity about potential overfitting. revision: yes
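The second response hinges on keeping every driver entirely on one side of the split. A minimal standard-library sketch of a driver-disjoint split follows; the function name, the `(driver_id, payload)` tuple layout, and the default test fraction are assumptions for illustration. scikit-learn offers the same idea ready-made through `GroupShuffleSplit` and `GroupKFold`, with driver IDs passed as the groups.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, object]  # (driver_id, payload such as an image path)


def driver_disjoint_split(samples: Sequence[Sample], test_fraction: float = 0.25,
                          seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Assign whole drivers to train or test so no driver appears in both."""
    by_driver: Dict[Hashable, List[Sample]] = defaultdict(list)
    for s in samples:
        by_driver[s[0]].append(s)
    drivers = sorted(by_driver, key=str)  # deterministic order before shuffling
    if len(drivers) < 2:
        raise ValueError("need at least two drivers for a disjoint split")
    random.Random(seed).shuffle(drivers)
    # At least one test driver, and at least one driver left for training.
    n_test = min(len(drivers) - 1, max(1, round(len(drivers) * test_fraction)))
    test = [s for d in drivers[:n_test] for s in by_driver[d]]
    train = [s for d in drivers[n_test:] for s in by_driver[d]]
    return train, test


data = [(f"driver{d}", f"frame{i}") for d in range(8) for i in range(3)]
train, test = driver_disjoint_split(data)
print(len(train), len(test))  # 18 6
```

The same driver assignment would also need to govern which sessions supply threshold-calibration frames, so that calibration and evaluation never share a recording.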

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical evaluation of proposed system

Full rationale

The paper describes an empirical system for drowsiness detection that calibrates driver-specific EAR/MAR thresholds pre-driving and augments with CNN classifiers for eye and yawning states. Reported accuracy gains (2-3% over fixed thresholds) and CNN performance figures (99.1% eye, 98.8% yawning) are presented as outcomes of experiments on public datasets plus a custom collection under varied lighting, poses, and users. No derivation chain, equations, or first-principles steps are shown that reduce by construction to their own inputs; there are no self-citations invoked as load-bearing uniqueness theorems, no ansatzes smuggled via prior work, and no renaming of known patterns as novel organization. The evaluation is self-contained against external datasets rather than tautological, satisfying the criteria for a non-circular finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard computer vision assumptions for facial landmark detection and image classification, with personalization introduced as the main addition. The abstract provides no specific free parameter values or new entities.

free parameters (1)
  • personalized EAR and MAR thresholds
    Calibrated per driver before driving; no specific values or exact fitting procedure are given in the abstract
axioms (2)
  • domain assumption: EAR and MAR metrics derived from facial landmarks reliably indicate drowsiness when thresholds are appropriately set
    Underlies the classical metric-based component of the system
  • domain assumption: CNN models trained on facial image data can accurately classify eye openness and yawning states
    Basis for integrating deep learning to handle challenging scenarios
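The ledger notes that no fitting procedure for the per-driver thresholds is given. As an illustration only, the sketch below assumes a short pre-drive routine in which the driver is prompted to look ahead with eyes open and mouth closed, then to close the eyes, then to yawn, and places each threshold midway between the medians of the relevant recordings. The prompts, the midpoint rule, and the names `DriverThresholds` and `calibrate` are hypothetical, not the authors' method.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass(frozen=True)
class DriverThresholds:
    ear_thr: float  # eyes count as closed below this EAR
    mar_thr: float  # mouth counts as yawning above this MAR


def calibrate(open_ear: Sequence[float], closed_ear: Sequence[float],
              rest_mar: Sequence[float], yawn_mar: Sequence[float]) -> DriverThresholds:
    """HYPOTHETICAL calibration: midpoint between class medians of prompted recordings."""
    for name, vals in (("open_ear", open_ear), ("closed_ear", closed_ear),
                       ("rest_mar", rest_mar), ("yawn_mar", yawn_mar)):
        if not vals:
            raise ValueError(f"no calibration frames for {name}")
    ear_open, ear_closed = median(open_ear), median(closed_ear)
    mar_rest, mar_yawn = median(rest_mar), median(yawn_mar)
    if ear_closed >= ear_open or mar_yawn <= mar_rest:
        raise ValueError("calibration recordings are not separable; repeat the prompts")
    return DriverThresholds(ear_thr=(ear_open + ear_closed) / 2,
                            mar_thr=(mar_rest + mar_yawn) / 2)
```

Medians rather than means keep a stray blink during the "eyes open" prompt from pulling the threshold down; for example, `calibrate([0.31, 0.29, 0.30, 0.05], [0.08, 0.10, 0.09], [0.20, 0.22, 0.21], [0.70, 0.80, 0.75])` ignores the 0.05 blink frame.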



Reference graph

Works this paper leans on

15 extracted references

  [1] S. Tekin and M. Seyit, “The effect of sleep disorders and fatigue on in-vehicle traffic accidents,” Namık Kemal Medical Journal, 2022.

  [2] J. Cech and T. Soukupova, “Real-time eye blink detection using facial landmarks,” Cent. Mach. Perception, Dep. Cybern. Fac. Electr. Eng. Czech Tech. Univ. Prague, pp. 1–8, 2016.

  [3] M. W. Gomaa, R. O. Mahmoud, and A. M. Sarhan, “A CNN-LSTM-based deep learning approach for driver drowsiness prediction,” Journal of Engineering Research, vol. 6, no. 3, pp. 59–70, 2022.

  [4] S. Kırbız, “Improving facial emotion recognition through dataset merging and balanced training strategies,” Journal of the Franklin Institute, vol. 362, no. 7, p. 107659, 2025.

  [5] Y. Albadawi, M. Takruri, and M. Awad, “A review of recent developments in driver drowsiness detection systems,” Sensors, vol. 22, no. 5, p. 2069, 2022.

  [6] M. H. Baccour, F. Driewer, T. Schäck, and E. Kasneci, “Comparative analysis of vehicle-based and driver-based features for driver drowsiness monitoring by support vector machines,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 23164–23178, 2022.

  [7] F. Makhmudov, D. Turimov, M. Xamidov, F. Nazarov, and Y.-I. Cho, “Real-time fatigue detection algorithms using machine learning for yawning and eye state,” Sensors, vol. 24, no. 23, p. 7810, 2024.

  [8] T. Abe, “PERCLOS-based technologies for detecting drowsiness: current evidence and future directions,” Sleep Advances, vol. 4, no. 1, p. zpad006, 2023.

  [9] M. Omidyeganeh, S. Shirmohammadi, S. Abtahi, A. Khurshid, M. Farhan, J. Scharcanski, B. Hariri, D. Laroche, and L. Martel, “Yawning detection using embedded smart cameras,” IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 3, pp. 570–582, 2016.

  [10] C. Xu, W. Huang, J. Liu, and L. Li, “Detecting driver drowsiness using hybrid facial features and ensemble learning,” Information, vol. 16, no. 4, p. 294, 2025.

  [11] S. A. Jakhete and N. Kulkarni, “A comprehensive survey and evaluation of MediaPipe Face Mesh for human emotion recognition,” in 2024 8th International Conference on Computing, Communication, Control and Automation (ICCUBEA). IEEE, 2024, pp. 1–8.

  [12] Akash Shingha, “MRL eye dataset,” https://www.kaggle.com/datasets/akashshingha850/mrl-eye-dataset, Kaggle, 2024, Kaggle dataset, accessed on 2026-01-15.

  [13] David Vazquez CIC, “Yawn dataset,” https://www.kaggle.com/datasets/davidvazquezcic/yawn-dataset/data, Kaggle, 2024, Kaggle dataset, accessed on 2026-01-15.

  [14] R. Sharma and A. Kamra, “A review on CLAHE based enhancement techniques,” in 2023 6th International Conference on Contemporary Computing and Informatics (IC3I), vol. 6. IEEE, 2023, pp. 321–325.

  [15] B. Kayaoğlu, T. Toktaş, and S. Kırbız, “CNN-based emotion recognition using data augmentation and preprocessing methods,” in 2023 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE, 2023, pp. 1–4.