arxiv: 2604.26869 · v1 · submitted 2026-04-29 · 💻 cs.LG · cs.CV


KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment

Attila Pintér, Javier Rico, Attila Répai, Jalal Al-Afandi, Adrienn Éva Borsy, András Kozma, Hajnalka Andrikovics, György Cserey


Pith reviewed 2026-05-07 11:40 UTC · model grok-4.3


keywords: karyotyping · microservice architecture · AI-assisted cytogenetics · chromosome segmentation · cloud and on-premise deployment · EfficientNet · Mask R-CNN · clinical cytogenetics


The pith

KAYRA packages a multi-model AI pipeline for chromosome analysis as a microservice whose container images run unchanged in the cloud or on local servers, so laboratories that cannot move patient data off-site can still use it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KAYRA as an end-to-end system for karyotyping that combines three neural networks—segmentation, instance detection, and classification—inside a containerized microservice architecture. The design uses cascaded region-of-interest narrowing so each model works only on the relevant chromosome areas, and the same containers run either as a cloud service or as a fully on-premise installation. In a pilot test on 459 chromosomes from ten metaphase spreads, the system reached 98.91 percent segmentation accuracy and 89.1 percent classification accuracy, outperforming an older density-thresholding commercial tool on all measured axes and a modern AI-supported tool on segmentation. The authors argue that this architecture meets the operational constraints of clinical cytogenetic labs, including mandatory human expert review and strict data-privacy rules.

Core claim

KAYRA is a containerized microservice pipeline that orchestrates an EfficientNet-B5 plus U-Net semantic segmenter, a Mask R-CNN instance detector, and a ResNet-18 classifier through cascaded ROI narrowing. The same images run as either a cloud-hosted service or an on-premise installation and deliver 98.91 percent segmentation accuracy, 89.1 percent classification accuracy, and 89.76 percent rotation accuracy on a pilot set of 459 chromosomes from ten metaphase spreads. KAYRA improves on the older reference on all three metrics; statistical significance (p < 0.0001) is reported for segmentation and classification only.

What carries the argument

The containerized microservice pipeline with cascaded ROI-narrowing that routes only the chromosome-bearing regions to each successive neural network while supporting identical deployment in cloud or on-premise environments.
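The ROI-narrowing idea can be illustrated with a minimal sketch, assuming the semantic segmenter yields a binary foreground mask and each later stage receives only a padded crop around that foreground. The names `mask_bbox` and `crop`, the `margin` parameter, and the toy 6x8 image are hypothetical; the paper's actual interfaces are not described on this page.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def mask_bbox(mask: Grid, margin: int = 1) -> Optional[Box]:
    """Padded bounding box of all non-zero mask pixels, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return None
    top = max(rows[0] - margin, 0)
    left = max(cols[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, len(mask))
    right = min(cols[-1] + 1 + margin, len(mask[0]))
    return top, left, bottom, right


def crop(image: Grid, box: Box) -> Grid:
    """Cut the region described by `box` out of a row-major image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy data (hypothetical): a 6x8 "image" and a mask marking three foreground pixels.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
mask = [[0] * 8 for _ in range(6)]
mask[2][3] = mask[3][3] = mask[3][4] = 1

box = mask_bbox(mask, margin=1)   # region the next model would see
roi = crop(image, box)            # narrowed input for the downstream stage
```

In a cascade, the crop produced by one stage becomes the input of the next, so each model processes fewer background pixels than its predecessor.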

If this is right

Clinical laboratories with data-egress restrictions can run the full AI workflow locally while still receiving model updates through container images.
The human-in-the-loop review step remains unchanged, allowing cytogeneticists to correct or override AI outputs before final karyotype reporting.
Segmentation gains feed directly into downstream classification and rotation steps, potentially reducing the total number of manual corrections required per spread.
The architecture isolates each model so that one component can be retrained or replaced without redeploying the entire pipeline.
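The component-isolation point in the list above can be sketched as a chain of interchangeable stages. This is an illustrative assumption, not the paper's code: plain Python functions stand in for what would be separate containerized services, and the names `make_stage`, `run_pipeline`, and the `classify_v1`/`classify_v2` stubs are hypothetical.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


def make_stage(name: str) -> Stage:
    """Build a stub stage that only records its name in the payload trace."""
    def stage(payload: Payload) -> Payload:
        return {**payload, "trace": payload.get("trace", []) + [name]}
    return stage


def run_pipeline(stages: List[Stage], payload: Payload) -> Payload:
    """Pass the payload through each stage in order."""
    for stage in stages:
        payload = stage(payload)
    return payload


segment = make_stage("segment")
detect = make_stage("detect")
classify_v1 = make_stage("classify_v1")
classify_v2 = make_stage("classify_v2")  # stands in for a retrained classifier

before = run_pipeline([segment, detect, classify_v1], {"spread_id": "demo"})
# Swapping the last stage leaves the first two untouched.
after = run_pipeline([segment, detect, classify_v2], {"spread_id": "demo"})
```

Because every stage shares one input and output shape, replacing the classifier changes a single list entry; in a containerized deployment the analogous change would be a single image tag.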

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same microservice pattern could be reused for other high-resolution medical imaging tasks that must stay inside institutional firewalls.
If the observed classification gap versus the modern reference (89.1 versus 86.9 percent, p = 0.34) persists on a larger test set, it might reach statistical significance in a bigger study.
On-premise deployment removes the need for continuous network connectivity, which may matter for labs in regions with unreliable internet.

Load-bearing premise

That accuracy measured on 459 chromosomes from ten metaphase spreads will generalize to the full variety of clinical samples and that the two commercial reference systems are fair and up-to-date benchmarks.

What would settle it

A prospective study on several hundred additional metaphase spreads drawn from diverse patient populations, scored by the same three metrics and tested with the identical Fisher exact test, would confirm or refute whether the reported accuracy advantages hold outside the pilot set.

Figures

Figures reproduced from arXiv: 2604.26869 by Adrienn Éva Borsy, András Kozma, Attila Pintér, Attila Répai, György Cserey, Hajnalka Andrikovics, Jalal Al-Afandi, Javier Rico.

**Figure 1.** Cooperation between instance detection and segmentation on a cluster of crossing chromosomes. Panel (a) shows Mask R-CNN’s per-chromosome bounding-box proposals, including overlapping, ambiguous boxes around the crossing region. Panel (b) shows the corresponding per-instance segmentation masks: the U-Net-refined ROI feeds Mask R-CNN’s mask head, which separates the crossing chromosomes (highlighted con…

**Figure 2.** Aggregate accuracy on 459 chromosomes from 10 metaphase spreads, for KAYRA versus the two commercial reference systems. KAYRA improves over the older density-thresholding reference on all three axes (p < 0.0001 for segmentation and classification, by Fisher’s exact test on chromosome-level counts) and over the modern AI-supported reference on segmentation (p < 0.0001); the classification gap to the modern …

Original abstract

We present KAYRA, an end-to-end karyotyping system that operates inside the operational constraints of a clinical cytogenetic laboratory. KAYRA is architected as a containerized microservice pipeline whose ML stack combines an EfficientNet-B5 + U-Net semantic segmenter, a Mask R-CNN (ResNet-50 + FPN) instance detector, and a ResNet-18 classifier, orchestrated through a cascaded ROI-narrowing strategy that focuses each downstream model on the chromosome-bearing region. The same container images are deployed both as a cloud service and as an on-premise installation, supporting clinical environments where patient-data egress is not permitted as well as those where it is. A pilot clinical evaluation against two commercial reference karyotyping systems on 459 chromosomes from 10 metaphase spreads shows segmentation accuracy of 98.91 % (vs. 78.21 % / 40.52 %), classification accuracy of 89.1 % (vs. 86.9 % / 54.5 %), and rotation accuracy of 89.76 % (vs. 94.55 % / 78.43 %). KAYRA improves over the older density-thresholding reference on all three axes (p < 0.0001 for segmentation and classification by Fisher's exact test on chromosome-level counts), and on segmentation also against the modern AI-supported reference (p < 0.0001); on classification the difference vs. the modern AI reference is not statistically significant at the present test-set size (p = 0.34). The system reaches TRL 6 maturity and integrates the human-in-the-loop expert-review workflow that diagnostic cytogenetic practice requires. The thesis of this paper is that a multi-model cytogenetic AI service can be packaged as a microservice architecture supporting flexible deployment - cloud-hosted or on-premise - while delivering strong empirical performance on a pilot clinical evaluation.
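The chromosome-level comparisons in the abstract can be approximately re-derived with a short script. The sketch below implements a two-sided Fisher's exact test in pure Python. The counts are reconstructed from the rounded percentages under the assumption that all 459 chromosomes were scored for every system, and the pairing of each percentage with a reference system is inferred from the abstract's wording, so the resulting p-values may differ slightly from the reported ones.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))  # small tolerance for float ties
    return min(1.0, total)


N = 459  # chromosomes in the pilot set


def counts(pct: float) -> tuple:
    """Reconstruct (correct, incorrect) from a rounded percentage (assumption)."""
    correct = round(N * pct / 100)
    return correct, N - correct


# Segmentation: KAYRA 98.91 % vs. the reference reported at 78.21 %.
p_seg = fisher_exact_two_sided(*counts(98.91), *counts(78.21))
# Classification: KAYRA 89.1 % vs. the reference reported at 86.9 %.
p_cls = fisher_exact_two_sided(*counts(89.1), *counts(86.9))

print(f"segmentation p = {p_seg:.3g}, classification p = {p_cls:.3g}")
```

Like the test in the paper, this treats each chromosome as an independent observation.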

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

KAYRA is a practical containerized pipeline for karyotyping that handles cloud and on-prem constraints, but the pilot evaluation on 10 spreads has clear statistical and scale problems.


KAYRA packages standard vision models into a microservice that runs the same code either in the cloud or locally, the latter without sending patient data out. That dual-deployment design is the main concrete thing the paper adds for cytogenetic labs facing data-egress rules. The pipeline chains an EfficientNet-B5 plus U-Net segmenter, Mask R-CNN detector, and ResNet-18 classifier through cascaded ROI narrowing, then keeps the human expert review step that real diagnostic work requires.

The abstract reports 98.91 percent segmentation accuracy on 459 chromosomes from 10 spreads, beating the older reference on all three metrics and the modern AI reference on segmentation, with p < 0.0001 from Fisher's exact test on some comparisons. The system reaches TRL 6 and can be installed on-premise. That combination of known models with a working deployment story is the useful engineering piece.

The evaluation has two clear soft spots. Ten metaphase spreads is too small a test set to support claims about routine clinical performance, and chromosomes within one spread share staining and imaging conditions, so they are not independent samples. Running Fisher's exact test on the 459 individual chromosome counts therefore inflates the significance and understates uncertainty. Training data details, error bars, and fuller descriptions of the commercial baselines are also missing, and rotation accuracy came in below one of the references.

This paper is for groups building or evaluating deployable medical imaging tools rather than for readers chasing new algorithms. A cytogenetic lab or medical AI engineer could extract the microservice pattern and the cascaded workflow as a starting point. It deserves a serious referee because it ships an implemented system with real pilot numbers instead of pure theory, even though the statistics and sample size will need tightening before publication.
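The point about non-independent chromosomes can be made concrete with the standard design-effect formula for clustered samples, DEFF = 1 + (m - 1) * ICC, where m is the mean cluster size. The sketch below applies it to 459 chromosomes in 10 spreads. The intraclass correlation of 0.05 is a purely hypothetical value chosen for illustration; the paper does not report one.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    mean_cluster_size = n / n_clusters
    return n / design_effect(mean_cluster_size, icc)


# 459 chromosomes in 10 spreads; ICC = 0.05 is a hypothetical assumption.
n_eff = effective_sample_size(459, 10, icc=0.05)
print(f"effective sample size = {n_eff:.1f}")
```

Even a modest within-spread correlation shrinks the effective sample size well below 459, which is why chromosome-level p-values overstate the evidence.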

Referee Report

4 major / 2 minor

Summary. The manuscript presents KAYRA, a containerized microservice architecture for AI-assisted karyotyping that combines an EfficientNet-B5 + U-Net segmenter, Mask R-CNN detector, and ResNet-18 classifier in a cascaded ROI-narrowing pipeline. The system supports both cloud and on-premise deployment to address clinical data-privacy constraints, reaches TRL 6, and incorporates human-in-the-loop review. A pilot evaluation on 459 chromosomes from 10 metaphase spreads reports segmentation accuracy of 98.91% (vs. 78.21%/40.52%), classification accuracy of 89.1% (vs. 86.9%/54.5%), and rotation accuracy of 89.76% (vs. 94.55%/78.43%). Fisher's exact tests are used to claim statistically significant gains over the older commercial reference on segmentation and classification, and over the modern reference on segmentation only.

Significance. If the reported performance gains hold after correcting for within-spread dependencies, the work would demonstrate a practically deployable, privacy-aware AI karyotyping service that integrates into existing clinical workflows. The microservice packaging, dual-deployment support, and explicit human-in-the-loop design are concrete strengths that address real operational constraints in cytogenetic laboratories.

major comments (4)

[Pilot clinical evaluation] Fisher's exact test is applied to 459 individual chromosome counts to support p < 0.0001 claims for segmentation and classification improvements. Chromosomes within each of the 10 metaphase spreads share staining, preparation, and imaging conditions and are therefore dependent; treating them as independent units violates the test assumption and inflates significance. A clustered analysis (e.g., treating spreads as the unit of replication or using mixed-effects models) is required to substantiate the headline statistical claims.
[Methods] No details are supplied on the size, source, or composition of the training data for the EfficientNet-B5 + U-Net, Mask R-CNN, or ResNet-18 models, nor on hyperparameter selection, training/validation splits, or regularization. Without this information the risk of overfitting to the small pilot test set cannot be assessed.
[Results] The test set comprises only 10 metaphase spreads. While chromosome-level counts are large, no per-spread accuracy breakdowns, confidence intervals, or variability analysis across spreads are reported, limiting claims about generalization to routine clinical workloads with heterogeneous preparation and imaging conditions.
[Comparison to baselines] The two commercial reference systems are not described with respect to their underlying algorithms, versions, training regimes, or update status. This absence makes it difficult to judge whether the reported performance differences constitute fair, contemporary comparisons.
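One way to treat spreads as the unit of replication, as the first major comment asks, is a cluster bootstrap that resamples whole spreads rather than individual chromosomes. The sketch below is a minimal illustration. The per-spread counts are invented placeholders that merely sum to 409 correct out of 459; they are NOT data from the paper, which reports no per-spread breakdown.

```python
import random
from typing import List, Tuple


def cluster_bootstrap_ci(
    spreads: List[Tuple[int, int]],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for pooled accuracy, resampling whole spreads with replacement.

    Each spread is a (correct, total) pair of chromosome counts.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(spreads) for _ in spreads]
        correct = sum(c for c, _ in sample)
        total = sum(t for _, t in sample)
        stats.append(correct / total)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# HYPOTHETICAL per-spread (correct, total) counts, for illustration only.
placeholder_spreads = [
    (44, 46), (40, 46), (45, 46), (38, 45), (43, 46),
    (41, 46), (44, 46), (39, 46), (42, 46), (33, 46),
]

ci_low, ci_high = cluster_bootstrap_ci(placeholder_spreads)
print(f"95% cluster-bootstrap CI: {ci_low:.3f} to {ci_high:.3f}")
```

With only 10 clusters a percentile bootstrap is itself rough, so a mixed-effects model, as the referee also suggests, may be the more defensible choice in a revision.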

minor comments (2)

[Abstract] KAYRA's rotation accuracy (89.76%) is lower than one commercial reference (94.55%), yet the significance statements emphasize only the positive comparisons; a balanced presentation of all three metrics would improve clarity.
[Abstract] The phrase 'the thesis of this paper is that...' in the final sentence of the abstract is unconventional for a research article and could be replaced with a standard summary statement.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important statistical, methodological, and reporting issues that we will address in the revision. We respond point by point below.


Referee: Pilot clinical evaluation: Fisher's exact test is applied to 459 individual chromosome counts... treating them as independent units violates the test assumption and inflates significance. A clustered analysis is required.

Authors: We agree that the independence assumption is violated, as chromosomes within each metaphase spread share staining, preparation, and imaging conditions. The chromosome-level Fisher's exact tests were used for an initial pilot assessment, but this is a valid concern. In the revised manuscript we will add a clustered analysis (e.g., mixed-effects logistic regression with spread as random effect) or report per-spread accuracies with appropriate variability measures. We will also include a clear caveat on the pilot nature of the evaluation and the limitations of the current statistical approach.

Revision: yes

Referee: Methods: No details are supplied on the size, source, or composition of the training data for the EfficientNet-B5 + U-Net, Mask R-CNN, or ResNet-18 models, nor on hyperparameter selection, training/validation splits, or regularization.

Authors: We acknowledge the omission. The revised Methods section will be expanded to include, for each model: training set sizes (images and chromosomes), data sources (public datasets and/or de-identified clinical collections), class balance and composition, train/validation/test splits, hyperparameter tuning procedure, and regularization methods (dropout, augmentation, weight decay). This will allow readers to assess overfitting risk relative to the pilot test set.

Revision: yes

Referee: Results: The test set comprises only 10 metaphase spreads. While chromosome-level counts are large, no per-spread accuracy breakdowns, confidence intervals, or variability analysis across spreads are reported.

Authors: We recognize that the small number of spreads limits strong generalization claims. The revised Results and supplementary material will include per-spread accuracy tables, standard deviations across the 10 spreads, and binomial or bootstrap confidence intervals. We will also strengthen the Discussion to emphasize the pilot scale and the need for larger multi-center validation.

Revision: yes

Referee: Comparison to baselines: The two commercial reference systems are not described with respect to their underlying algorithms, versions, training regimes, or update status.

Authors: We will revise the comparison section to provide additional available details on the two commercial systems, including their algorithmic basis (one density-thresholding, one AI-supported), reported versions at the time of testing, and any public information on training data or updates. As these are proprietary products, complete internal training regimes cannot be disclosed, but we will clarify the comparison protocol and limitations to the best of our knowledge.

Revision: partial
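The binomial confidence intervals mentioned in the responses above could be computed with, for example, a Wilson score interval. The sketch below applies it to the classification result, using 409 correct out of 459, a count reconstructed from the rounded 89.1 percent figure (an assumption). The function name `wilson_interval` is illustrative.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95 %)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 409 / 459 is reconstructed from the reported 89.1 % (assumption).
low, high = wilson_interval(409, 459)
print(f"classification accuracy 95% CI: {low:.3f} to {high:.3f}")
```

This interval still treats chromosomes as independent, so it is best read as a lower bound on the true uncertainty; an interval that respects the spread-level clustering would be wider.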

Circularity Check

0 steps flagged

No circularity: all claims are direct empirical comparisons on held-out pilot data


The paper describes a containerized microservice pipeline (EfficientNet-B5 + U-Net, Mask R-CNN, ResNet-18) and reports segmentation, classification, and rotation accuracies measured on 459 chromosomes from 10 metaphase spreads against two external commercial systems. Performance differences are assessed via Fisher's exact test on chromosome-level counts. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear as load-bearing steps. The central thesis rests on external benchmarking rather than internal construction or self-referential justification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied engineering paper describing a software system and its empirical evaluation. No free parameters, mathematical axioms, or new invented entities are introduced; performance claims rest on the pilot dataset and model training.


