arxiv: 2605.20103 · v1 · pith:TG4PY4BUnew · submitted 2026-05-19 · 🧬 q-bio.PE

Face morphometric profiles of groups as early markers for certain diseases?

Pith reviewed 2026-05-20 02:46 UTC · model grok-4.3

classification 🧬 q-bio.PE
keywords face morphometry · Alzheimer disease · genetic risk markers · facial landmarks · clustering · population study · early detection · multifactorial diseases

The pith

Facial landmark clusters from large photo sets can identify groups sharing genetic backgrounds that predispose them to late-onset diseases such as Alzheimer's.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper processes a dataset of 200,000 photos of men, extracting facial landmarks and computing the distances between them. Clustering identifies groups with similar facial traits and tracks how their densities shift across age cohorts. Genes tied to facial development are noted as overlapping with those involved in Alzheimer's, supporting the idea that face shape reflects an individual's genetic makeup. If this holds, everyday facial measurements could act as a simple, early indicator of elevated risk for multifactorial diseases that appear late in life. Readers might care because it proposes a non-invasive route to spotting vulnerabilities before clinical symptoms emerge.

Core claim

Clusters formed from distances between facial landmarks in a large population dataset reveal subgroups whose morphometric profiles express the genetic background against which late multifactorial diseases develop, allowing face morphometry to function as a risk marker for conditions such as Alzheimer's disease.

What carries the argument

Clustering of inter-landmark distances extracted from facial photographs, which groups individuals by shared traits and computes density variations across age cohorts.
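
As a rough illustration of the first half of this machinery, the sketch below turns a set of 2D landmark coordinates into a vector of pairwise distances. The function name `distance_features`, the normalization by the largest pairwise distance, and the toy coordinates are assumptions made for illustration; the paper does not state which landmark pairs or which normalization it uses.

```python
from itertools import combinations
from math import dist


def distance_features(landmarks):
    """Return scale-normalized pairwise distances between 2D landmarks.

    Assumption: dividing by the largest pairwise distance removes the
    dependence on image resolution; the paper's own normalization is unknown.
    """
    raw = [dist(p, q) for p, q in combinations(landmarks, 2)]
    scale = max(raw)
    if scale == 0:
        raise ValueError("all landmarks coincide")
    return [d / scale for d in raw]


# Toy example: four hypothetical landmark positions in pixel coordinates.
face = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
features = distance_features(face)
```

With n landmarks this yields n(n-1)/2 features per face, and rescaling the image leaves the vector unchanged, which is the property a clustering step over photos of varying resolution would need.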

If this is right

  • Face morphometry profiles can be tracked across age cohorts to reveal population dynamics over time.
  • Overlap between genes for facial development and Alzheimer's supports using morphometry to capture relevant genetic backgrounds.
  • Such profiles could serve as early markers for certain late multifactorial diseases.
  • Densities of morphometric groups inside the overall population provide a quantitative view of subgroup prevalence.
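
The density bookkeeping in the list above can be sketched in a few lines. This is a minimal illustration, assuming each photo has already been assigned a cluster label; the helper names `cluster_densities` and `density_shift` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def cluster_densities(labels):
    """Fraction of samples falling in each cluster label."""
    if not labels:
        raise ValueError("no samples")
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def density_shift(young_labels, old_labels):
    """Per-cluster density difference (older cohort minus younger cohort)."""
    young = cluster_densities(young_labels)
    old = cluster_densities(old_labels)
    clusters = sorted(set(young) | set(old))
    return {c: old.get(c, 0.0) - young.get(c, 0.0) for c in clusters}


# Hypothetical cluster assignments for two age cohorts.
young = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "C"]
old = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
shift = density_shift(young, old)
```

A positive entry means the group is more prevalent in the older cohort. On its own, such a shift cannot distinguish differential survival from cohort or sampling effects.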

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the clusters prove stable, repeated photo-based screening might allow population-level monitoring of shifting disease risks without blood tests.
  • The method could be tested on other multifactorial conditions that share genetic pathways with facial traits.
  • Extending the analysis to mixed-gender or non-Cuban datasets would check whether the observed patterns generalize.

Load-bearing premise

That clusters based on facial landmark distances meaningfully correspond to genetic subgroups predisposed to diseases such as Alzheimer's, without direct clinical outcome data or incidence validation.

What would settle it

A follow-up study that measures actual Alzheimer's incidence rates in the identified facial clusters versus the rest of the population and finds no significant difference.
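
One simple way to frame such a comparison is a two-proportion z-test on incidence counts. The sketch below is an assumption about how the check could be run, not a procedure taken from the paper. The counts are invented, and a real study would also need to adjust for age and other confounders.

```python
from math import erfc, sqrt


def two_proportion_z_test(cases_a, n_a, cases_b, n_b):
    """Two-sided z-test for a difference between two incidence proportions.

    Uses the pooled-proportion normal approximation, which is only
    reasonable when both groups contain enough cases and non-cases.
    """
    p_a, p_b = cases_a / n_a, cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes")
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical counts: cases inside one facial cluster vs. everyone else.
z, p = two_proportion_z_test(cases_a=60, n_a=1000, cases_b=400, n_b=9000)
```

A large p-value across all clusters would be the "no significant difference" outcome described above. With many clusters tested at once, a multiple-comparison correction would also be needed.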

Figures

Figures reproduced from arXiv: 2605.20103 by Augusto Gonzalez, Heydi Mendez-Vazquez, Joan Nieves, Roberto Herrero, Yoanna Martinez-Diaz.

Figure 1. Top: Principal Component Analysis of world cancer risk data. Ethnic and cultural groups exhibit distinctive patterns. Bottom: Component variances and directions of maximal increase in the risk for different cancers.
Figure 2. Top: Face landmarks coming from the DLIB software. Bottom: The selected landmarks (green points), which capture osseous structure and are relatively independent of face expression.
Figure 4. Reduced PCA of TCGA expression data for Glioblastoma. Only the 51 genes of Ref. [14] are used to form the PCA matrix. Normal tissue (blue) and tumor data (red) are perfectly separated.
Original abstract

Background: Face morphometry has been shown to work as a diagnosis tool in a set of syndromes. Face similarities are usually indications of more complete genetic similarities. Purpose: To show preliminary results on the face morphometry profile of the Cuban population and to argue that it could be used to define early markers for diseases, like Alzheimer. Methods: A dataset composed of photos of 200000 men is processed. Facial landmarks are extracted by means of the DLIB library and distances between them are computed. By clustering samples with similar facial traits, groups are formed and their densities inside the population are computed. Results: The face morphometry profiles for two age cohorts are obtained, showing the population dynamics. Genes involved in facial development are shown to be related to Alzheimer's disease. Conclusions: Late multifactorial diseases develop against the genetic background of each individual, which is expressed by its face morphometry. The latter can be thus considered a risk marker.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that face morphometric profiles derived from clustering facial landmark distances in a dataset of 200,000 photos of Cuban men can serve as early markers for diseases such as Alzheimer's. The approach involves DLIB landmark extraction, distance computation, clustering for similar traits, density calculations for age cohorts, and linking facial development genes to Alzheimer's based on existing knowledge, concluding that genetic background expressed in face morphometry allows it to be a risk marker.

Significance. If the morphometric clusters were shown to predict disease risk or correspond to relevant genetic subgroups, this could represent a novel, scalable method for early disease marker identification using readily available photographic data. The work highlights potential connections between facial genetics and multifactorial diseases but currently lacks the empirical support to establish this link.

major comments (3)
  1. The Results section claims that 'the face morphometry profiles for two age cohorts are obtained, showing the population dynamics' yet provides no quantitative data, cluster statistics, density values, or figures to support this. Without these, the preliminary results cannot be evaluated.
  2. Conclusions: The central claim that face morphometry 'can be thus considered a risk marker' for diseases like Alzheimer is not derived from the new dataset or clustering analysis. It relies on asserted prior associations between facial development genes and Alzheimer's without demonstrating that the formed groups capture predisposing genetic subgroups or correlate with disease incidence.
  3. The Methods section describes extracting landmarks with DLIB and computing distances but does not specify the clustering algorithm, parameters, validation procedures, or how the dataset of 200000 men was selected and processed, which are critical for assessing whether the clusters meaningfully relate to genetic backgrounds.
minor comments (2)
  1. The abstract states 'Genes involved in facial development are shown to be related to Alzheimer's disease' but this appears to reference external literature rather than results from the described analysis of the photo dataset.
  2. The manuscript would benefit from clearer notation for the computed distances and cluster densities, as well as inclusion of any error analysis or reproducibility details.
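
To make major comment 3 concrete, here is a minimal sketch of one clustering procedure with every choice exposed. The paper does not say which algorithm it uses, so plain Lloyd's k-means with Euclidean distance is an assumption, as are the parameter names `k`, `n_iter`, and `seed` and the toy points.

```python
import random
from math import dist


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with Euclidean distance.

    Returns (labels, centroids). k, n_iter and seed are the kind of
    choices a Methods section would need to report.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct samples
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Toy feature vectors standing in for inter-landmark distance profiles.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2, seed=0)
```

Even in this small form the result depends on `k` and on the initialization seed, which is why the referee's request for parameters and validation procedures bears directly on whether the reported groups are stable.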

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of this preliminary study. We address each major comment below and indicate the revisions we will make.

  1. Referee: The Results section claims that 'the face morphometry profiles for two age cohorts are obtained, showing the population dynamics' yet provides no quantitative data, cluster statistics, density values, or figures to support this. Without these, the preliminary results cannot be evaluated.

    Authors: We agree that the Results section is currently descriptive and lacks the quantitative support needed for full evaluation. As this is framed as a preliminary report, the emphasis was on the overall approach, but we will add cluster statistics, density values, and figures (such as density plots across age cohorts) to the revised manuscript. revision: yes

  2. Referee: Conclusions: The central claim that face morphometry 'can be thus considered a risk marker' for diseases like Alzheimer is not derived from the new dataset or clustering analysis. It relies on asserted prior associations between facial development genes and Alzheimer's without demonstrating that the formed groups capture predisposing genetic subgroups or correlate with disease incidence.

    Authors: The manuscript uses the clustering results together with established literature on facial development genes to propose a potential link to multifactorial disease risk. We acknowledge that the current dataset provides no direct disease incidence or genetic validation. We will revise the Conclusions to present the risk-marker idea explicitly as a hypothesis requiring further empirical testing rather than a direct derivation from the clustering alone. revision: partial

  3. Referee: The Methods section describes extracting landmarks with DLIB and computing distances but does not specify the clustering algorithm, parameters, validation procedures, or how the dataset of 200000 men was selected and processed, which are critical for assessing whether the clusters meaningfully relate to genetic backgrounds.

    Authors: We agree that additional methodological detail is required for reproducibility and assessment. The revised Methods section will specify the clustering algorithm, chosen parameters (including number of clusters and distance metric), validation procedures, and the criteria used for selecting and preprocessing the 200,000 photographs. revision: yes

standing simulated objections not resolved
  • The photographic dataset contains no associated disease incidence or genetic information, so direct empirical demonstration that the morphometric clusters correspond to predisposing genetic subgroups or predict disease risk is outside the scope of the present study.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from dataset processing

The paper's core pipeline extracts DLIB landmarks from 200000 photos, computes inter-landmark distances, performs clustering on similar traits, and reports group densities for two age cohorts to obtain morphometric profiles. These steps produce the reported population dynamics directly from the input data without any fitted parameters renamed as predictions or equations that reduce to the inputs by construction. The link to Alzheimer's via facial-development genes is presented as established prior knowledge in the results and conclusions rather than derived from the clusters themselves. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing elements in the provided text. The marker interpretation is an interpretive extension, not a circular reduction of the morphometric results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the untested premise that facial morphometric similarity directly indexes genetic disease predisposition; this premise is not derived from the clustering results but imported from prior gene studies. No free parameters or invented entities are explicitly introduced in the abstract, but the clustering step implicitly requires choices whose justification is absent.

axioms (2)
  • domain assumption Facial landmarks detected by DLIB accurately capture biologically meaningful morphometric traits
    Invoked in the methods description of landmark extraction and distance computation.
  • domain assumption Clusters of similar facial measurements correspond to genetically distinct subgroups relevant to disease risk
    Required for the leap from population profiles to risk-marker status in the conclusions.

pith-pipeline@v0.9.0 · 5703 in / 1319 out tokens · 57421 ms · 2026-05-20T02:46:11.171961+00:00 · methodology

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1] Corredera-Guerra R.F., Balado-Sansón R.M., Sardiñas-Arce M.E., Montesinos-Estévez T.C., Gómez-Padrón E.I. Weight and height significance related to age. Study carried out in school children from Cerro municipality. Rev. Cubana Med. Gen. Integr 2009; 25(3)

  2. [2] Schechter AN. Hemoglobin research and the origins of molecular medicine. Blood, The Journal of the American Society of Hematology 2008; 112(10):3927-38

  3. [3] Pashayan N, Reisel D, Widschwendter M. Integration of genetic and epigenetic markers for risk stratification: opportunities and challenges. Personalized medicine 2016; 13(2):93-95

  4. [4] Slatis HM, Katznelson MB, Bonne-Tamir B. The inheritance of fingerprint patterns. American journal of human genetics 1976; 28(3):280

  5. [5] Ferry Q, Steinberg J, Webber C, FitzPatrick DR, Ponting CP, Zisserman A, Nellåker C. Diagnostically relevant facial gestalt information from ordinary photos. Elife 2014; 3:e02020

  6. [6] Coram MA, Fang H, Candille SI, Assimes TL, Tang H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. The American Journal of Human Genetics 2017; 101(2):218-26

  7. [7] Tomasetti C, Li L, Vogelstein B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 2017; 355(6331):1330-4

  8. [8] Lever, J., Krzywinski, M. & Altman, N. Principal component analysis. Nat Methods 2017; 14, 641–642

  9. [9] King DE. Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research 2009; 10:1755-8

  10. [10] Ricardo YL, Zamora MC, Hernández JP, Martínez CR. Prevalence of Alzheimer's disease in rural and urban areas in Cuba and factors influencing on its occurrence: epidemiological cross-sectional protocol. BMJ open 2022; 12(11):e052704

  11. [11] Naqvi S, Sleyp Y, Hoskens H, Indencleef K, Spence JP, Bruffaerts R, Radwan A, Eller RJ, Richmond S, Shriver MD, Shaffer JR. Shared heritability of human face and brain shape. Nature Genetics 2021; 53(6):830-9

  12. [12] 2023 Alzheimer's disease facts and figures. Alzheimer's Dement 2023; 19: 1598-1695

  13. [13] Vojtechova I, Machacek T, Kristofikova Z, Stuchlik A, Petrasek T. Infectious origin of Alzheimer's disease: Amyloid beta as a component of brain antimicrobial immunity. PLoS pathogens 2022; 18(11):e1010929

  14. [14] Richmond S, Howe LJ, Lewis S, Stergiakouli E, Zhurov A. Facial genetics: a brief overview. Frontiers in genetics 2018; 9:462

  15. [15] Safran M, Solomon I, Shmueli O, Lapidot M, Shen-Orr S, et al. GeneCards™ 2002: towards a complete, object-oriented, human gene compendium. Bioinformatics 2002; 18(11):1542-3

  16. [16] Perera Y, Gonzalez A and Perez R. Principal component analysis of RNA-seq data unveils a novel prostate cancer-associated gene expression signature. Arch. Can. Res 2021; 9(S4): 002

  17. [17] Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia 2015; 2015(1):68-77

  18. [18] Gonzalez A, Leon DA, Perera Y, Perez R. On the gene expression landscape of cancer. Plos one 2023; 18(2):e0277786

  19. [19] Joshi RS, Rigau M, García-Prieto CA, de Moura MC, Piñeyro D, et al. Look-alike humans identified by facial recognition algorithms show genetic similarities. Cell reports 2022; 40(8)

  20. [20] Li D, Wang Z, Gao Q, Song Y, Yu X, et al. Facial expression recognition based on Electroencephalogram and facial landmark localization. Technology and Health Care 2019; 27(4):373-87

  21. [21] Zhang Z, Song Y, Qi H. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE conference on computer vision and pattern recognition 2017; 5810-5818