3D Temporal Analysis for Autism Spectrum Disorder Screening During Attention Tasks

Dena Al-Thani; Elizabeth B Varghese; Inam Qadir; Marwa Qaraqe

arxiv: 2606.04836 · v1 · pith:2WFQCHR2new · submitted 2026-06-03 · 💻 cs.CV

Inam Qadir, Elizabeth B Varghese, Dena Al-Thani, Marwa Qaraqe

Pith reviewed 2026-06-28 07:02 UTC · model grok-4.3

keywords: autism spectrum disorder · 3D head pose · facial expression analysis · GRU classifier · virtual reality screening · temporal classification · ASD screening · multimodal fusion

The pith

GRU models classify autism spectrum disorder in school-age children at up to 84.6 percent accuracy using 3D head pose and facial features extracted from VR attention tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a 3D temporal analysis framework that extracts head pose parameters and facial expressions from video of children performing virtual reality tasks. It trains LSTM and GRU classifiers on these features from a sample of 39 participants aged 7-12 and shows that the 3D approach beats 2D baselines. The highest result comes from combining the two feature types after dimensionality reduction. The goal is to move beyond subjective assessments toward objective, automated screening that can catch missed ASD cases. This would allow earlier support for children's social, cognitive, and academic growth.

Core claim

A novel 3D temporal analysis framework built on DECA extracts comprehensive head pose parameters including translational components Tx, Ty, Tz and facial expressions independent of pose from video data of 39 participants during Virtual Reality-Continuous Performance Test tasks. GRU-based models on 3D head pose features reach 83.9 percent accuracy and on 3D facial features reach 81.4 percent accuracy, outperforming 2D baseline approaches by 10.7 percent and 7.5 percent respectively. Multimodal fusion of the 3D features with PCA-based dimensionality reduction achieves 84.6 percent accuracy and outperforms unimodal approaches, establishing a foundation for objective automated screening tools fo

What carries the argument

The 3D temporal analysis framework built on DECA that extracts pose-independent head pose parameters and facial expressions from video, then classifies them with GRU temporal models.
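As a rough illustration of the temporal-model half of that pipeline, the sketch below runs a single GRU cell over a sequence of per-frame feature vectors using plain Python lists. It is not the authors' implementation: the parameter names (`Wz`, `Uz`, and so on), the six-dimensional input, and the all-zero weights are hypothetical placeholders, and the update rule follows the common convention h_t = (1 - z) h_{t-1} + z h~_t (some libraries swap the roles of z and 1 - z).

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]

def gru_step(x, h, p):
    """One GRU update for input vector x and previous hidden state h."""
    z = [sigmoid(a) for a in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Blend the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def encode_sequence(frames, p, hidden_size):
    """Fold a whole sequence into its final hidden state."""
    h = [0.0] * hidden_size
    for x in frames:
        h = gru_step(x, h, p)
    return h

def zero_params(input_size, hidden_size):
    """Hypothetical all-zero parameters, useful only as a shape check."""
    W = [[0.0] * input_size for _ in range(hidden_size)]
    U = [[0.0] * hidden_size for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    return {"Wz": W, "Uz": U, "bz": b, "Wr": W, "Ur": U, "br": b,
            "Wh": W, "Uh": U, "bh": b}
```

In a classifier, the final hidden state would typically feed a small output layer that produces the ASD/TD decision; that layer and all training details are omitted here.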

Load-bearing premise

The 39-participant sample collected during VR tasks produces 3D features that capture spatial displacement patterns characteristic of ASD behaviors in the broader school-age population.
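One way to read "spatial displacement" concretely is as frame-to-frame movement of the head in three dimensions. The sketch below assumes a hypothetical sequence of per-frame (Tx, Ty, Tz) triples and compares the full 3D displacement with what remains when the depth component is dropped. The units, coordinate convention, and sample values are assumptions for illustration, not details taken from the paper.

```python
import math

def frame_displacements(translations):
    """Euclidean distance between consecutive translation vectors."""
    return [math.dist(a, b) for a, b in zip(translations, translations[1:])]

def path_length(translations):
    """Total distance travelled over the whole sequence."""
    return sum(frame_displacements(translations))

def drop_depth(translations):
    """Keep only (Tx, Ty), mimicking an image-plane-only view."""
    return [(tx, ty) for tx, ty, _ in translations]

# Hypothetical sequence: sideways movement, then movement purely in depth.
sequence = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
full_3d = path_length(sequence)
image_plane_only = path_length(drop_depth(sequence))
```

Here the second movement is invisible once Tz is discarded, which is the kind of information a 2D analysis could miss.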

What would settle it

Apply the identical DECA-based 3D feature extraction and GRU classification pipeline to an independent cohort of at least 100 new school-age children with independently confirmed ASD or typical development diagnoses and measure whether accuracy stays above 80 percent.

Figures

Figures reproduced from arXiv: 2606.04836 by Dena Al-Thani, Elizabeth B Varghese, Inam Qadir, Marwa Qaraqe.

**Figure 2.** Overview of the feature extraction component. Video frames undergo preprocessing and encoding via ResNet

**Figure 3.** RNN-based temporal modeling architecture for …

**Figure 4.** (a) Testing environment with two monitors- one …

**Figure 5.** Comprehensive performance comparison between 2D and 3D features for (a) GRU and (b) LSTM architectures

Original abstract

Accurate Autism Spectrum Disorder (ASD) screening for school-age children is crucial to identify cases that may have been missed earlier and to enable timely interventions supporting social, cognitive, and academic development. Current ASD screening relies on subjective assessments and 2D analysis methods that fail to capture spatial displacement patterns characteristic of ASD behaviors. In this study, a novel 3D temporal analysis framework is presented, built on top of DECA (Detailed Expression Capture and Animation), a 3D modeling framework, to extract comprehensive head pose parameters (including translational components $T_x, T_y, T_z$) and facial expressions independent of pose variations. LSTM and GRU-based temporal classifiers were trained on the extracted 3D features from video data collected from 39 participants (19 ASD, 20 TD) aged 7-12 years during Virtual Reality-Continuous Performance Test tasks. The GRU-based models demonstrated superior performance, with 3D head pose features achieving 83.9% accuracy and 3D facial features reaching 81.4% accuracy, outperforming 2D baseline approaches by 10.7% and 7.5%, respectively. Furthermore, multimodal fusion of 3D head pose and facial features with PCA-based dimensionality reduction achieved the highest accuracy of 84.6%, outperforming unimodal approaches. This work establishes a foundation for objective, automated screening tools addressing current diagnostic limitations in ASD identification for school-age populations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

3D DECA features lift reported accuracy by 7.5% to 10.7% over 2D baselines on this VR-CPT task, but the n=39 sample and absent validation details leave the gains provisional.

The paper extracts 3D head pose (including Tx, Ty, Tz) and expression parameters via DECA from videos of 39 children aged 7-12 doing VR continuous performance tests, then trains GRU and LSTM models on those sequences. GRU head-pose features hit 83.9% accuracy, facial features 81.4%, and the PCA-fused multimodal version 84.6%, beating the 2D baselines they report.

What stands out is the direct application of an existing 3D reconstruction tool to this screening setting. Including translational pose components and testing fusion is a reasonable step beyond pure 2D analysis, and the concrete accuracy deltas give a usable comparison point.

The soft spot is the cohort. With only 19 ASD and 20 TD participants, temporal classifiers on high-dimensional sequences are easy to overfit. The abstract supplies no information on cross-validation procedure, whether splits were subject-independent, or any statistical testing, so it is unclear whether the reported lift reflects genuine 3D signal or dataset-specific motion patterns. That is the main uncertainty.
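The difference between a subject-independent split and a frame- or session-level split can be sketched as below. This is a minimal illustration under the assumption that every sample (a frame window or a session) carries a subject identifier. The function name and the toy identifiers are hypothetical, and nothing here describes the procedure the paper actually used.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out every sample of one subject."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx

# Hypothetical data: six samples drawn from three children.
subject_ids = ["c01", "c01", "c02", "c03", "c03", "c03"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train_idx, test_idx in folds:
    train_subjects = {subject_ids[i] for i in train_idx}
    # No child appears on both sides of the split, so the model cannot
    # score well merely by recognising an individual's motion habits.
    assert held_out not in train_subjects
```

A random split over samples would not give that guarantee, because windows from the same child could land in both the training and the test set.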

No circular claims or invented quantities appear; the results are presented as empirical outcomes on the collected data. The work stays within standard practice for this type of study.

This is for researchers working on vision-based tools for developmental screening or VR-based assessment. It offers a concrete incremental step from 2D methods but needs tighter validation to move beyond a pilot result.

I would send it for peer review so the methods section can be examined in full and the validation choices clarified.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a 3D temporal analysis framework that uses DECA to extract head-pose parameters (Tx, Ty, Tz plus rotations) and facial-expression coefficients from VR-Continuous Performance Test videos of 39 school-age children (19 ASD, 20 TD). LSTM/GRU classifiers trained on these features are reported to reach 83.9% accuracy (head pose), 81.4% (facial), and 84.6% (multimodal + PCA), outperforming 2D baselines by 10.7% and 7.5%.
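For readers unfamiliar with the "multimodal + PCA" step, the sketch below shows one plausible form of it: concatenate head-pose and facial feature vectors, standardize each column, and project onto the leading principal components. Whether the authors fuse before or after PCA, and at what granularity, is not stated here, so the ordering, the function names, and the pure-Python power-iteration solver are all assumptions made for illustration.

```python
import math

def standardize(rows):
    """Z-score each column; constant columns are left at zero."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """Leading k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        # Asymmetric start vector; a start exactly orthogonal to the
        # leading eigenvector would defeat plain power iteration.
        v = [1.0 / (i + 1) for i in range(d)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Remove the found component before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def fuse_with_pca(pose_rows, face_rows, k):
    """Concatenate two feature sets, standardize, and keep k components."""
    fused = standardize([p + f for p, f in zip(pose_rows, face_rows)])
    comps = top_components(covariance(fused), k)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in fused]
```

In a cross-validated setting, the standardization statistics and components would normally be fitted on the training folds only and then applied to the held-out data, so that the reduction step does not leak test information.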

Significance. If the performance numbers survive subject-independent validation, the approach would supply an objective, pose-independent screening signal that 2D video methods miss; the VR-CPT protocol and DECA pipeline are sensible choices for capturing spatial displacement patterns. The small cohort nevertheless caps the strength of any generalizability claim.

major comments (2)

[Methods] Methods (classifier training subsection): no description is given of the cross-validation procedure. With n=39 and high-dimensional temporal sequences, any non-subject-wise split (frame- or session-level) risks identity leakage; the reported 83.9–84.6% accuracies and the 7.5–10.7% gains over 2D baselines cannot be attributed to the 3D representation until leave-one-subject-out or stratified subject-independent CV is demonstrated.
[Results] Results: neither statistical significance tests, confidence intervals, nor error bars are reported for the accuracy figures, and the exact implementation of the 2D baselines (feature extraction, temporal modeling, hyper-parameters) is not detailed, preventing assessment of whether the claimed improvements are robust.
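To give a sense of how wide the uncertainty around an accuracy figure can be at this sample size, the sketch below computes a Wilson score interval for a binomial proportion. The count of 33 correct out of 39 is a hypothetical choice that happens to round to 84.6%; the source does not state whether accuracy was aggregated per subject, so this is an assumption for illustration only.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical: 33 of 39 subjects classified correctly.
low, high = wilson_interval(33, 39)
```

Under that assumption the interval runs from roughly 0.70 to 0.93, wide enough that a gap of several points between two models could fall within sampling noise.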

minor comments (1)

[Abstract] Abstract and participant description: only total n and ASD/TD split are stated; gender distribution, mean age, or any cognitive/IQ matching information is absent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the manuscript to incorporate clarifications and additional details where appropriate.

Referee: [Methods] Methods (classifier training subsection): no description is given of the cross-validation procedure. With n=39 and high-dimensional temporal sequences, any non-subject-wise split (frame- or session-level) risks identity leakage; the reported 83.9–84.6% accuracies and the 7.5–10.7% gains over 2D baselines cannot be attributed to the 3D representation until leave-one-subject-out or stratified subject-independent CV is demonstrated.

Authors: We agree that an explicit description of the cross-validation procedure is necessary given the small sample size. Our experiments employed leave-one-subject-out (LOSO) cross-validation, with each subject's entire temporal sequence held out as the test set in turn, to ensure subject-independent evaluation and avoid identity leakage. We will add a dedicated paragraph in the Methods (classifier training subsection) detailing this procedure, including sequence handling, number of folds, and how multimodal fusion was performed under LOSO. revision: yes
Referee: [Results] Results: neither statistical significance tests, confidence intervals, nor error bars are reported for the accuracy figures, and the exact implementation of the 2D baselines (feature extraction, temporal modeling, hyper-parameters) is not detailed, preventing assessment of whether the claimed improvements are robust.

Authors: We acknowledge that statistical rigor and baseline transparency are required to substantiate the reported gains. In the revision we will add (i) 95% confidence intervals and error bars computed via bootstrap resampling across LOSO folds, (ii) paired statistical tests (McNemar’s test for accuracy differences) with p-values, and (iii) an expanded description of the 2D baselines that specifies the exact 2D feature extractors, temporal model architectures, hyper-parameter search ranges, and training protocols used for the comparisons. revision: yes
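The paired test mentioned in the response can be sketched as follows. This is a minimal exact (binomial) form of McNemar's test, where `b` counts cases one model classifies correctly and the other does not, and `c` counts the reverse. The counts used below are hypothetical and are not results from the paper.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence either way
    k = min(b, c)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical: the 3D model fixes 7 of the 2D model's errors and introduces 1 new one.
p_value = mcnemar_exact(1, 7)
```

Only the discordant cases enter the test, so with 39 participants the number of informative observations may be small and the test correspondingly low-powered.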

Circularity Check

0 steps flagged

No significant circularity; empirical ML pipeline is self-contained

The paper describes a standard supervised classification pipeline: DECA-based 3D feature extraction from VR-CPT videos of 39 children, followed by training and evaluation of LSTM/GRU models on head-pose and expression features, with reported accuracies and comparisons to 2D baselines. No equations, parameter-fitting steps, or claims reduce by construction to their own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in the abstract or described methods. The performance numbers are direct outputs of cross-validation or hold-out evaluation on the collected data, not renamed fits or self-defined quantities. This is the normal, non-circular case for an empirical computer-vision study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the accuracy of the upstream DECA 3D reconstruction in this specific recording setup and on the assumption that the small collected cohort yields representative behavioral features. No free parameters are explicitly named in the abstract, and no new physical entities are introduced.

axioms (1)

Domain assumption: the DECA framework extracts accurate 3D head pose parameters (Tx, Ty, Tz) and facial expressions independent of pose variations from the recorded videos.
The framework is used without additional validation or error analysis mentioned in the abstract.
