Predicting Blastocyst Formation in IVF: Integrating DINOv2 and Attention-Based LSTM on Time-Lapse Embryo Images
Pith reviewed 2026-05-10 15:58 UTC · model grok-4.3
The pith
A hybrid DINOv2 and attention LSTM model predicts which embryos will form blastocysts from limited daily images at 96.4 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that DINOv2 extracts useful spatial features from embryo images and an LSTM equipped with multi-head attention then models their temporal progression to predict blastocyst formation, reaching 96.4 percent accuracy on a dataset of 704 videos while remaining robust to missing frames.
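The claimed robustness to missing daily frames is typically achieved by padding incomplete sequences to a fixed length and masking the absent days. A minimal sketch of that idea, assuming zero-padding and a boolean validity mask (an illustration, not the paper's implementation):

```python
# Illustrative sketch: pad missing daily feature vectors to a fixed
# sequence length and return a mask marking which days were imaged.
import numpy as np

def pad_and_mask(seq, max_days=5, feat_dim=4):
    """Zero-pad missing daily feature vectors and return a validity mask."""
    out = np.zeros((max_days, feat_dim))
    mask = np.zeros(max_days, dtype=bool)
    for day, vec in seq.items():        # seq maps day index -> feature vector
        out[day] = vec
        mask[day] = True
    return out, mask

# Only days 0, 2 and 4 were imaged for this embryo.
seq = {d: np.ones(4) * d for d in (0, 2, 4)}
padded, mask = pad_and_mask(seq)
print(mask.tolist())  # [True, False, True, False, True]
```

Downstream, the mask lets an attention layer ignore the padded positions so predictions are driven only by the frames that actually exist.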
What carries the argument
The hybrid pipeline in which DINOv2 supplies per-image feature vectors that are then processed by a multi-head attention LSTM to capture developmental dynamics over time.
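The pipeline described above can be sketched in a few lines of PyTorch. The dimensions here (384-d features matching DINOv2 ViT-S, five daily frames, 128 hidden units, 4 heads) are illustrative assumptions, not the authors' configuration:

```python
# Hypothetical sketch (not the authors' code): per-frame vectors stand in
# for DINOv2 embeddings; an LSTM plus multi-head attention reduces the
# sequence to one blastocyst-formation probability per embryo.
import torch
import torch.nn as nn

class AttentionLSTMHead(nn.Module):
    def __init__(self, feat_dim=384, hidden=128, heads=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, days, feat_dim)
        h, _ = self.lstm(x)            # temporal encoding per day
        a, _ = self.attn(h, h, h)      # let days attend to each other
        return torch.sigmoid(self.classifier(a.mean(dim=1)))  # (batch, 1)

# Two embryos, five daily frames each, 384-dim per-frame features.
feats = torch.randn(2, 5, 384)
probs = AttentionLSTMHead()(feats)
print(probs.shape)  # torch.Size([2, 1])
```

The key design point the review highlights is the division of labor: the vision transformer handles within-frame appearance, while the recurrent and attention layers carry the between-day dynamics.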
Load-bearing premise
The 704 embryo videos used for training and testing represent the range of imaging conditions and patient demographics encountered in other IVF laboratories.
What would settle it
Accuracy falling below 85 percent when the trained model is applied to embryo images collected at a different clinic with different time-lapse cameras or patient populations.
Original abstract
The selection of the optimal embryo for transfer is a critical yet challenging step in in vitro fertilization (IVF), primarily due to its reliance on the manual inspection of extensive time-lapse imaging data. A key obstacle in this process is predicting blastocyst formation from the limited number of daily images available. Many clinics also lack complete time-lapse systems, so full videos are often unavailable. In this study, we aimed to predict which embryos will develop into blastocysts using limited daily images from time-lapse recordings. We propose a novel hybrid model that combines DINOv2, a transformer-based vision model, with an enhanced long short-term memory (LSTM) network featuring a multi-head attention layer. DINOv2 extracts meaningful features from embryo images, and the LSTM model then uses these features to analyze embryo development over time and generate final predictions. We tested our model on a real dataset of 704 embryo videos. The model achieved 96.4% accuracy, surpassing existing methods. It also performs well with missing frames, making it valuable for many IVF laboratories with limited imaging systems. Our approach can assist embryologists in selecting better embryos more efficiently and with greater confidence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hybrid model that uses DINOv2 to extract features from time-lapse embryo images and feeds them into an attention-augmented LSTM to predict blastocyst formation. It evaluates the approach on a dataset of 704 embryo videos, reports 96.4% accuracy (surpassing prior methods), and claims robustness when frames are missing.
Significance. If the accuracy claim survives proper patient-level cross-validation and external testing, the work would offer a practical aid for embryo selection in IVF clinics that lack complete time-lapse systems. The choice of a pre-trained vision transformer plus temporal attention is a reasonable modern adaptation, and explicit handling of incomplete sequences addresses a genuine clinical constraint.
Major comments (2)
- [Results] The headline 96.4% accuracy on 704 videos is presented without any information on train-test split ratios, patient- or embryo-level stratification, k-fold cross-validation, class balance, or statistical testing. In time-series embryo data, failure to keep images from the same IVF cycle on one side of the split risks leakage and renders the performance claim uninterpretable.
- [Methods] No description is given of how the 704 videos were acquired (number of patients, embryos per patient, imaging protocol, or exact daily sampling), nor of the baseline methods, their hyper-parameters, or the statistical tests used to assert superiority. These omissions make it impossible to assess whether the reported gains are reproducible or clinically meaningful.
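The leakage concern in the first major comment can be made concrete: all videos from a given patient must land in the same fold. A minimal sketch of a patient-level split, with hypothetical field names not drawn from the paper:

```python
# Illustrative sketch of the referee's point: assign whole patients to the
# test fold so frames from one IVF cycle never appear in both train and
# test. The "patient_id" / "video" field names are hypothetical.
def patient_level_split(videos, test_patients):
    """Split a list of video records by patient id, never by individual video."""
    train, test = [], []
    for v in videos:
        (test if v["patient_id"] in test_patients else train).append(v)
    return train, test

videos = [{"patient_id": p, "video": f"emb_{i}"}
          for i, p in enumerate(["A", "A", "B", "C", "C", "C"])]
train, test = patient_level_split(videos, test_patients={"C"})

# No patient contributes videos to both folds.
assert not {v["patient_id"] for v in train} & {v["patient_id"] for v in test}
print(len(train), len(test))  # 3 3
```

A video-level random split on the same data could instead place two frames of the same embryo in different folds, inflating accuracy.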
Minor comments (1)
- [Abstract] The abstract would benefit from a single sentence on validation strategy to allow readers to gauge the 96.4% figure immediately.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important omissions in our description of the experimental protocol. We agree that these details are necessary for assessing the validity of our results and will revise the manuscript accordingly to enhance transparency and reproducibility.
Point-by-point responses
Referee: [Results] The headline 96.4% accuracy on 704 videos is presented without any information on train-test split ratios, patient- or embryo-level stratification, k-fold cross-validation, class balance, or statistical testing. In time-series embryo data, failure to keep images from the same IVF cycle on one side of the split risks leakage and renders the performance claim uninterpretable.
Authors: We agree that the original manuscript omitted these critical details of the evaluation protocol, which is a valid concern given the risk of data leakage in time-series embryo imaging. In the revised version, we will add a dedicated subsection detailing the train-test split ratios, patient-level stratification, k-fold cross-validation procedure, class balance, and the statistical tests used to compare against baselines. This will directly address the potential for leakage and make the 96.4% accuracy claim fully interpretable.
Revision: yes
Referee: [Methods] No description is given of how the 704 videos were acquired (number of patients, embryos per patient, imaging protocol, or exact daily sampling), nor of the baseline methods, their hyper-parameters, or the statistical tests used to assert superiority. These omissions make it impossible to assess whether the reported gains are reproducible or clinically meaningful.
Authors: We acknowledge that the Methods section was insufficiently detailed regarding dataset acquisition and the implementation of baselines. We will expand this section in the revision to describe the acquisition process (including patient and embryo counts, imaging protocol, and daily sampling), provide full descriptions of the baseline methods along with their hyper-parameters, and specify the statistical tests employed. These additions will support reproducibility and allow readers to better evaluate the clinical relevance of the reported improvements.
Revision: yes
Circularity Check
Standard supervised ML pipeline with no circular derivation
Full rationale
The paper describes a conventional supervised learning setup: DINOv2 extracts image features from time-lapse embryo frames, these features are fed into an LSTM with multi-head attention for temporal modeling, the network is trained on labeled videos, and accuracy is measured on held-out test data. No load-bearing step reduces by construction to its own inputs, no fitted parameter is relabeled as a prediction, and no self-citation chain is invoked to justify the architecture or results. The reported 96.4% accuracy is an empirical evaluation metric, not a tautological consequence of the model definition.
Reference graph
Works this paper leans on
- [1] A. Eugster, A. J. Vingerhoets, Psychological aspects of in vitro fertilization: a review, Social Science & Medicine 48 (5) (1999) 575–589
- [2] D. A. Blake, M. Proctor, N. Johnson, D. Olive, C. M. Farquhar, Q. Lamberts, Cleavage stage versus blastocyst stage embryo transfer in assisted conception, Cochrane Database of Systematic Reviews (4) (2005)
- [3] H. M. Lukassen, D. D. Braat, A. M. Wetzels, G. A. Zielhuis, E. M. Adang, E. Scheenjes, J. A. Kremer, Two cycles with single embryo transfer versus one cycle with double embryo transfer: a randomized controlled trial, Human Reproduction 20 (3) (2005) 702–708
- [4] J. E. Swain, Decisions for the IVF laboratory: comparative analysis of embryo culture incubators, Reproductive BioMedicine Online 28 (5) (2014) 535–547
- [5] C. Wong, A. Chen, B. Behr, S. Shen, Time-lapse microscopy and image analysis in basic and clinical embryo development research, Reproductive BioMedicine Online 26 (2) (2013) 120–129
- [6] Q. Liao, Q. Zhang, X. Feng, H. Huang, H. Xu, B. Tian, J. Liu, Q. Yu, N. Guo, Q. Liu, et al., Development of deep learning algorithms for predicting blastocyst formation and quality by time-lapse monitoring, Communications Biology 4 (1) (2021) 415
- [7] R. Machtinger, C. Racowsky, Morphological systems of human embryo assessment and clinical evidence, Reproductive BioMedicine Online 26 (3) (2013) 210–221
- [8] Y. Motato, M. J. de los Santos, M. J. Escriba, B. A. Ruiz, J. Remohí, M. Meseguer, Morphokinetic analysis and embryonic prediction for blastocyst formation through an integrated time-lapse system, Fertility and Sterility 105 (2) (2016) 376–384
- [9] Z. A. Varzaneh, A. Orooji, L. Erfannia, M. Shanbehzadeh, A new COVID-19 intubation prediction strategy using an intelligent feature selection and k-NN method, Informatics in Medicine Unlocked 28 (2022) 100825
- [10] M. Jamali, P. Davidsson, R. Khoshkangini, M. G. Ljungqvist, R.-C. Mihailescu, Context in object detection: a systematic literature review, Artificial Intelligence Review 58 (6) (2025) 1–89
- [11] Z. A. Varzaneh, S. M. Mousavi, R. Khoshkangini, S. M. Moosavi Khaliji, An ensemble model based on transfer learning for the early detection of Alzheimer's disease, Scientific Reports 15 (1) (2025) 34634
- [12] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis, Annual Review of Biomedical Engineering 19 (1) (2017) 221–248
- [13] M. I. Razzak, S. Naz, A. Zaib, Deep learning for medical image processing: overview, challenges and the future, Classification in BioApps: Automation of Decision Making (2017) 323–350
- [14] E. I. Fernandez, A. S. Ferreira, M. H. M. Cecílio, D. S. Chéles, R. C. M. de Souza, M. F. G. Nogueira, J. C. Rocha, Artificial intelligence in the IVF laboratory: overview through the application of different types of algorithms for the classification of reproductive data, Journal of Assisted Reproduction and Genetics 37 (10) (2020) 2359–2376
- [15] T.-M.-T. Luong, N. Q. K. Le, Artificial intelligence in time-lapse system: advances, applications, and future perspectives in reproductive medicine, Journal of Assisted Reproduction and Genetics 41 (2) (2024) 239–252
- [16] M. Abbasi, P. Saeedi, J. Au, J. Havelock, Time series classification for modality-converted videos: a case study on predicting human embryo implantation from time-lapse images, in: 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP), IEEE, 2023, pp. 1–6
- [17] A. Sharma, A. Dorobantiu, S. Ali, M. Iliceto, M. H. Stensen, E. Delbarre, M. A. Riegler, H. L. Hammer, Deep learning methods to forecasting human embryo development in time-lapse videos, bioRxiv (2024)
- [18] K. Kalyani, P. S. Deshpande, A deep learning model for predicting blastocyst formation from cleavage-stage human embryos using time-lapse images, Scientific Reports 14 (1) (2024) 28019
- [19] T. Gomez, M. Feyeux, J. Boulant, N. Normand, L. David, P. Paul-Gilloteaux, T. Fréour, H. Mouchère, A time-lapse embryo dataset for morphokinetic parameter prediction, Data in Brief 42 (2022) 108258
- [20] Y. A. Mohamed, U. K. Yusof, I. S. Isa, M. M. Zain, An automated blastocyst grading system using convolutional neural network and transfer learning, in: 2023 IEEE 13th International Conference on Control System, Computing and Engineering (ICCSCE), IEEE, 2023, pp. 202–207
- [21] A. A. Mazroa, M. Maashi, Y. Said, M. Maray, A. A. Alzahrani, A. Alkharashi, A. M. Al-Sharafi, Anomaly detection in embryo development and morphology using medical computer vision-aided Swin transformer with boosted dipper-throated optimization algorithm, Bioengineering 11 (10) (2024) 1044
- [22] J. Kim, Z. Shi, D. Jeong, J. Knittel, H. Y. Yang, Y. Song, W. Li, Y. Li, D. Ben-Yosef, D. Needleman, et al., Multimodal learning for embryo viability prediction in clinical IVF, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2024, pp. 542–552
- [23] X. Xie, P. Yan, F.-Y. Cheng, F. Gao, Q. Mai, G. Li, Early prediction of blastocyst development via time-lapse video analysis, in: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), IEEE, 2022, pp. 1–5
- [24] K. Garg, A. Dev, P. Bansal, H. Mittal, An efficient deep learning model for embryo classification, in: 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence), IEEE, 2024, pp. 358–363
- [25] Z. A. Varzaneh, N. Wölner-Hanssen, R. Khoshkangini, A lightweight transformer approach for predicting blastocyst formation on limited embryo images, in: 2025 International Conference on Visual Communications and Image Processing (VCIP), IEEE, 2025, pp. 1–5
- [26] Practice Committee of the American Society for Reproductive Medicine, Practice Committee of the Society for Assisted Reproductive Technology, et al., Blastocyst culture and transfer in clinically assisted reproduction: a committee opinion, Fertility and Sterility 110 (7) (2018) 1246–1252
- [27] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al., DINOv2: Learning robust visual features without supervision, arXiv preprint arXiv:2304.07193 (2023)
- [28] M. Hashemi, Enlarging smaller images before inputting into convolutional neural network: zero-padding vs. interpolation, Journal of Big Data 6 (1) (2019) 1–13
- [29] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 (7) (2019) 1235–1270
- [30] D. Neil, M. Pfeiffer, S.-C. Liu, Phased LSTM: accelerating recurrent network training for long or event-based sequences, Advances in Neural Information Processing Systems 29 (2016)
- [31] S. M. Al-Selwi, M. F. Hassan, S. J. Abdulkadir, A. Muneer, E. H. Sumiea, A. Alqushaibi, M. G. Ragab, RNN-LSTM: from applications to modeling techniques and beyond—systematic review, Journal of King Saud University-Computer and Information Sciences (2024) 102068
- [32] J.-B. Cordonnier, A. Loukas, M. Jaggi, Multi-head attention: collaborate instead of concatenate, arXiv preprint arXiv:2006.16362 (2020)
- [33]
- [34] G. Naidu, T. Zuva, E. M. Sibanda, A review of evaluation metrics in machine learning algorithms, in: Computer Science On-line Conference, Springer, 2023, pp. 15–25
- [35] Ž. Vujović, et al., Classification model evaluation metrics, International Journal of Advanced Computer Science and Applications 12 (6) (2021) 599–606
- [36] Modlee, Car (2024). URL https://www.kaggle.com/datasets/modlee/time-series-classification-data/data
- [37] Ebrahimi, Financial (2017). URL https://www.kaggle.com/datasets/shebrahimi/financial-distress?select=Financial+Distress.csv
- [38] L. Candanedo, Occupancy (2016). URL https://archive.ics.uci.edu/dataset/357/occupancy+detection
- [39] O. Roesler, Eeg (2016). URL https://archive.ics.uci.edu/dataset/264/eeg+eye+state