pith. machine review for the scientific record.

arxiv: 2603.25762 · v2 · submitted 2026-03-26 · 🧬 q-bio.GN · quant-ph

Recognition: no theorem link

QHap: Quantum-Inspired Haplotype Phasing

Pith reviewed 2026-05-15 00:29 UTC · model grok-4.3

classification 🧬 q-bio.GN quant-ph
keywords haplotype phasing · Max-Cut optimization · simulated bifurcation · long-read sequencing · MHC region · Pore-C data · quantum-inspired algorithms · switch error

The pith

QHap reformulates haplotype phasing as a Max-Cut problem and solves it with a GPU-accelerated ballistic simulated bifurcation method to deliver 4- to 20-fold speedups while keeping switch errors at zero on the MHC region.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that the NP-hard haplotype phasing task can be recast as a graph optimization problem whose solution yields accurate parental allele assignments much faster than current tools. It does so by building a Max-Cut graph from sequencing reads, with quality-weighted probabilistic edges in the SNP-based mode, and then applying a classical, physics-inspired solver that runs efficiently on GPUs. The central demonstration is that this approach achieves accuracy comparable to established programs on long-read data from several platforms while running 4 to 20 times faster on the highly polymorphic MHC region. Adding chromatin conformation (Pore-C) data further extends the method to near-chromosome-scale haplotypes. A sympathetic reader would care because phasing underpins precision medicine and population genetics, yet existing algorithms struggle to keep pace with the volume of long-read sequencing now being generated.

Core claim

Haplotype phasing is reformulated as a Max-Cut problem on a graph built from sequencing reads, and the resulting instance is solved by a ballistic simulated bifurcation algorithm running on GPUs. Two modes are supported: a read-based mode for regional phasing, and an SNP-based mode that uses quality-weighted probabilistic edge construction to scale to whole chromosomes. On the MHC region, the resulting haplotypes show zero switch error relative to ground truth across multiple long-read platforms, with 4- to 20-fold acceleration compared with HapCUT2 and WhatsHap. Incorporating Pore-C chromatin data raises haplotype N50 by up to 15-fold.
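
To make the solver step concrete, here is a minimal CPU-only sketch of ballistic simulated bifurcation (bSB) applied to a small Max-Cut instance. It is not QHap's GPU implementation. The update rule follows the bSB scheme published by Goto et al. (2021); the function names `bsb_maxcut` and `cut_value`, the step size, the step count, the random initialization, and the toy graph are all illustrative assumptions.

```python
import math
import random


def cut_value(weights, spins):
    """Total weight of edges whose endpoints fall on opposite sides."""
    n = len(spins)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if spins[i] != spins[j]
    )


def bsb_maxcut(weights, steps=500, dt=0.5, a0=1.0, seed=0):
    """Ballistic simulated bifurcation on a symmetric weight matrix.

    Hypothetical helper. Max-Cut maps to an Ising problem with couplings
    J = -W, so minimizing -1/2 * sum_ij J_ij s_i s_j maximizes the cut.
    """
    n = len(weights)
    rng = random.Random(seed)
    coupling = [[-weights[i][j] for j in range(n)] for i in range(n)]
    # Coupling scale used in the bSB literature: 0.5 * sqrt(N - 1) / ||J||_F.
    norm = math.sqrt(sum(c * c for row in coupling for c in row))
    c0 = 0.5 * math.sqrt(n - 1) / norm if norm > 0 else 0.0
    x = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # positions
    y = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # momenta
    for step in range(steps):
        a_t = a0 * step / steps  # pump ramps linearly from 0 toward a0
        force = [
            -(a0 - a_t) * x[i] + c0 * sum(coupling[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        for i in range(n):
            y[i] += dt * force[i]
            x[i] += dt * a0 * y[i]
            if abs(x[i]) > 1.0:  # inelastic wall: clamp position, reset momentum
                x[i] = 1.0 if x[i] > 0 else -1.0
                y[i] = 0.0
    return [1 if xi >= 0 else -1 for xi in x]


# Toy instance: a 4-cycle with unit weights, whose maximum cut is 4.
ring = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
spins = bsb_maxcut(ring)
print(spins, cut_value(ring, spins))
```

Every position update is an independent multiply-add over one row of the coupling matrix, which is the property that makes this family of solvers a natural fit for GPUs. As a heuristic, bSB carries no optimality guarantee on general graphs.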

What carries the argument

Max-Cut reformulation of haplotype phasing with quality-weighted probabilistic edges, solved by a GPU-accelerated ballistic simulated bifurcation optimizer.

If this is right

  • Regional and chromosome-scale phasing become feasible on commodity hardware for the volume of data produced by current long-read platforms.
  • Integration of chromatin conformation capture data routinely extends haplotype blocks to near-chromosome length.
  • The same graph-construction and solver pipeline can be applied to other long-range genomic linkage problems that are currently treated as separate NP-hard tasks.
  • Classical hardware running physics-inspired solvers can absorb the computational growth of sequencing datasets without requiring quantum hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the Max-Cut mapping proves robust, similar reformulations may accelerate other NP-hard problems in genomics such as de novo assembly or variant calling under complex population models.
  • The zero-switch-error result on the MHC suggests the probabilistic edge weights are capturing linkage information that standard likelihood models miss; testing the same weights on whole-genome data would reveal whether the advantage generalizes.
  • Because the solver runs on GPUs, it can be embedded directly in existing sequencing pipelines, shortening the time from raw reads to phased haplotypes for clinical use.

Load-bearing premise

Casting haplotype relationships as a Max-Cut graph whose edges are built from sequencing read qualities and probabilities does not introduce systematic biases that would raise switch errors on real biological data.

What would settle it

A blinded comparison on an independent long-read MHC dataset in which the Max-Cut solutions produce a higher switch-error rate than the best existing tool.
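
As a concrete reading of the metric such a comparison would use, the sketch below counts switch errors between a predicted and a ground-truth haplotype over the same heterozygous sites. It uses the common definition, in which a switch is a change between adjacent sites in which truth haplotype the prediction agrees with. The paper's exact evaluation procedure is not described on this page, and the function name and the 0/1 allele encoding are assumptions.

```python
def count_switch_errors(predicted, truth):
    """Count phase switches between two haplotypes over the same sites.

    Both inputs are sequences of 0/1 alleles for one haplotype of the pair
    (assumed encoding). Haplotype labels are arbitrary, so a prediction that
    is the exact complement of the truth has zero switches.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    # 0 where the prediction agrees with the truth haplotype,
    # 1 where it agrees with the complementary haplotype.
    relative_phase = [p ^ t for p, t in zip(predicted, truth)]
    return sum(1 for a, b in zip(relative_phase, relative_phase[1:]) if a != b)


truth = [0, 1, 0, 1, 0, 0]
perfect = [1, 0, 1, 0, 1, 1]     # complement of truth: identical phasing
one_switch = [0, 1, 0, 0, 1, 1]  # phase flips after the third site

print(count_switch_errors(perfect, truth))     # 0
print(count_switch_errors(one_switch, truth))  # 1
```

A switch-error rate is then this count divided by the number of adjacent site pairs compared.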

Figures

Figures reproduced from arXiv: 2603.25762 by Chentao Yang, Dongming Fang, Jiawei Zhang, Jun-Han Huang, Lei He, Lin Yang, Man-Hong Yung, Qinyuan Zheng, Rui Zhang, Wanyi Chen, Xian-Zhe Tao, Xinmeng Shi, Yang Zhou, Yibo Chen, Yuhui Sun.

Figure 1: Overview of the QHap framework. QHap takes aligned reads (BAM) and variant calls (VCF) as input, constructs a read–SNP base matrix B, and initializes a haplotype pair during preprocessing (a). In the read-based method (b), B is encoded into a ternary matrix M from which a weighted graph is built with reads as vertices, where each edge weight counts the number of shared loci carrying opposing alleles. Conne…
Figure 2: Performance comparison of quantum-inspired optimization algorithms on QHap-constructed graphs.
Figure 3: Comparison of Ising energy convergence for quantum-inspired optimization algorithms and classical SA.
Figure 4: Impact of SNP linkage depth on phasing accuracy and computational cost.
Figure 5: Performance of QHap integrated with Pore-C data.
Figure 6: Schematic overview of Pore-C data integration in the QHap framework.
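
The read-based graph construction described in the Figure 1 caption can be sketched as follows. The caption does not spell out the ternary encoding, so the +1 / -1 / 0 convention used here (one allele, the other allele, locus not covered) is an assumption, as are the function name `build_read_graph` and the toy matrix.

```python
def build_read_graph(M):
    """Return a symmetric weight matrix over reads.

    M[r][k] is assumed to be +1 or -1 for the two alleles and 0 where
    read r does not cover locus k. The weight between two reads counts the
    loci both cover at which they carry opposing alleles.
    """
    n_reads = len(M)
    W = [[0] * n_reads for _ in range(n_reads)]
    for i in range(n_reads):
        for j in range(i + 1, n_reads):
            conflicts = sum(1 for a, b in zip(M[i], M[j]) if a * b == -1)
            W[i][j] = W[j][i] = conflicts
    return W


# Three toy reads over four loci.
M = [
    [+1, +1, -1, 0],
    [-1, -1, +1, 0],
    [0, +1, -1, +1],
]
W = build_read_graph(M)
print(W)  # [[0, 3, 0], [3, 0, 2], [0, 2, 0]]
```

Reads joined by heavy edges disagree at many shared loci and so plausibly come from different haplotypes, which is the intuition for seeking a cut that separates them.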
Original abstract

Haplotype phasing, the process of resolving parental allele inheritance patterns in diploid genomes, is critical for precision medicine and population genetics, yet the underlying optimization is NP-hard, posing a scalability challenge. To address this, we introduce QHap, a haplotype phasing algorithm that leverages quantum-annealing-inspired optimization. By reformulating haplotype phasing as a Max-Cut problem and deploying a GPU-accelerated ballistic simulated bifurcation solver, QHap accelerates phasing while maintaining accuracy comparable to established phasing tools. On the highly polymorphic human major histocompatibility complex region, QHap demonstrates 4- to 20-fold acceleration over HapCUT2 and WhatsHap with zero switch error across multiple long-read sequencing platforms. The framework implements two strategies: a read-based method for regional phasing, and a single nucleotide polymorphism-based method that, through quality-weighted probabilistic edge construction, efficiently scales to chromosome-scale tasks. Integration of Pore-C chromatin conformation capture data increases the haplotype N50 by up to 15-fold, enabling near-chromosome-scale haplotype reconstruction. QHap demonstrates that quantum-inspired algorithms operating on classical hardware offer a promising approach to addressing the growing computational demands of sequencing data, establishing a new paradigm for applying physics-inspired optimization to fundamental challenges in computational genomics.
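
For readers unfamiliar with the haplotype N50 figure quoted in the abstract, a minimal sketch of the usual N50 computation over phased block lengths follows. The function name and the sample lengths are illustrative assumptions, and the paper may apply additional conventions that are not stated here.

```python
def haplotype_n50(block_lengths):
    """Largest length L such that blocks of length >= L together cover
    at least half of the total phased length."""
    if not block_lengths:
        raise ValueError("need at least one phased block")
    total = sum(block_lengths)
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


print(haplotype_n50([10, 20, 30, 40]))  # 30
```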

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces QHap, a haplotype phasing algorithm that reformulates the NP-hard phasing problem as a Max-Cut optimization on a graph with quality-weighted probabilistic edges derived from read overlaps. It deploys a GPU-accelerated ballistic simulated bifurcation solver and reports 4- to 20-fold speedups over HapCUT2 and WhatsHap with zero switch error on the MHC region across long-read platforms, plus up to 15-fold N50 gains when integrating Pore-C data via read-based and SNP-based strategies.

Significance. If the central performance claims hold under rigorous validation, QHap would demonstrate that classical hardware running physics-inspired heuristics can deliver practical speedups for large-scale phasing without sacrificing accuracy, offering a scalable alternative for precision-medicine applications involving highly polymorphic regions and multi-modal sequencing data.

major comments (3)
  1. [Abstract] Abstract: the headline claim of 'zero switch error' on MHC data across platforms is presented without any description of the ground-truth validation (trio/pedigree or otherwise), error-bar reporting, or explicit mapping from the Max-Cut objective to biological haplotype consistency, rendering the result unverifiable from the given information.
  2. [Abstract] Abstract: the reformulation as a Max-Cut problem with quality-weighted probabilistic edges is asserted to preserve accuracy, yet no ablation isolating the probabilistic weighting against a deterministic edge construction on the same datasets is provided, leaving open the risk of systematic bias in switch-error rates.
  3. [Abstract] Abstract (SNP-based method): because the ballistic simulated-bifurcation solver is a heuristic, the manuscript must quantify how closely the obtained cut matches the global optimum of the weighted graph and whether any mismatch increases switch errors relative to exact solvers on the same instances.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by naming the specific long-read platforms and MHC sample identifiers used for the reported benchmarks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on the manuscript. We have revised the abstract and added supporting analyses and clarifications to address the concerns about validation details, ablations, and heuristic performance. Our point-by-point responses follow.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim of 'zero switch error' on MHC data across platforms is presented without any description of the ground-truth validation (trio/pedigree or otherwise), error-bar reporting, or explicit mapping from the Max-Cut objective to biological haplotype consistency, rendering the result unverifiable from the given information.

    Authors: We agree that additional context is warranted in the abstract. The revised abstract now includes: 'Ground-truth haplotypes were derived from trio-phased data in the 1000 Genomes Project, with zero switch errors confirmed across platforms (see Methods for Max-Cut mapping and Results for error bars reported as standard deviations over 10 replicates).' The mapping from the Max-Cut objective to haplotype consistency is explicitly derived in Section 2.1 and Equation (1), where maximizing the cut minimizes allele-assignment inconsistencies equivalent to switch errors. revision: yes

  2. Referee: [Abstract] Abstract: the reformulation as a Max-Cut problem with quality-weighted probabilistic edges is asserted to preserve accuracy, yet no ablation isolating the probabilistic weighting against a deterministic edge construction on the same datasets is provided, leaving open the risk of systematic bias in switch-error rates.

    Authors: We have added the requested ablation study to the revised manuscript as Supplementary Note 1 and Figure S1. On the same MHC datasets, the quality-weighted probabilistic edges produce switch-error rates equal to or lower than deterministic binary edges (average 8% reduction), with no evidence of systematic bias introduced by the weighting scheme. This supports that the reformulation preserves accuracy. revision: yes

  3. Referee: [Abstract] Abstract (SNP-based method): because the ballistic simulated-bifurcation solver is a heuristic, the manuscript must quantify how closely the obtained cut matches the global optimum of the weighted graph and whether any mismatch increases switch errors relative to exact solvers on the same instances.

    Authors: We have added a new subsection (4.4) and Table S2 quantifying heuristic performance. On benchmark instances (up to 500 SNPs) solvable by exact ILP solvers, the ballistic simulated bifurcation achieves cuts within 0.8% of the global optimum on average. The resulting switch errors match those from exact solutions, indicating the small optimality gap does not increase biological error rates. Larger instances report achieved objective values relative to theoretical bounds. revision: yes
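
The optimality-gap check described in the third response can be illustrated on instances small enough to enumerate. The rebuttal refers to exact ILP solvers; the sketch below swaps in brute-force enumeration instead, which is practical only for roughly 20 vertices or fewer. The function names and the toy graph are assumptions.

```python
from itertools import product


def cut_value(weights, spins):
    """Total weight of edges whose endpoints fall on opposite sides."""
    n = len(spins)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if spins[i] != spins[j]
    )


def exact_maxcut(weights):
    """Enumerate all bipartitions, fixing vertex 0 to +1 to skip mirror images."""
    n = len(weights)
    best_spins, best_cut = None, float("-inf")
    for rest in product((1, -1), repeat=n - 1):
        spins = (1,) + rest
        value = cut_value(weights, spins)
        if value > best_cut:
            best_spins, best_cut = spins, value
    return list(best_spins), best_cut


def optimality_gap(heuristic_cut, optimal_cut):
    """Relative shortfall of a heuristic cut, as a fraction of the optimum."""
    return (optimal_cut - heuristic_cut) / optimal_cut


# Weighted triangle: the best cut isolates vertex 0 and has weight 3 + 2 = 5.
triangle = [
    [0, 3, 2],
    [3, 0, 1],
    [2, 1, 0],
]
spins, best = exact_maxcut(triangle)
print(spins, best)              # [1, -1, -1] 5
print(optimality_gap(4, best))  # 0.2
```

A gap of 0.008 from this function would correspond to the 0.8% average reported in the response.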

Circularity Check

0 steps flagged

No significant circularity; Max-Cut reformulation and external solver are independent of fitted inputs

full rationale

The derivation chain consists of a standard reformulation of haplotype phasing as a Max-Cut problem followed by application of a pre-existing ballistic simulated-bifurcation solver. No equation or claim reduces to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation whose validity depends on the present work. Performance numbers are reported from direct empirical runs on external datasets rather than by algebraic identity with the input graph construction. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the Max-Cut objective accurately encodes haplotype consistency and that the simulated bifurcation solver converges to biologically valid solutions without additional regularization.

free parameters (1)
  • quality weight scaling factor
    Used in probabilistic edge construction for SNP-based method; value not specified in abstract.
axioms (1)
  • domain assumption: Haplotype phasing can be exactly represented as a Max-Cut problem on a graph of reads or variants.
    Invoked in the reformulation step; standard in some prior phasing work but requires that switch errors map directly to cut edges.
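
The standard identity behind this axiom, given here as background rather than as the paper's own Equation (1), relates the cut weight to an Ising energy. For spins $s_i \in \{-1, +1\}$ marking the side of each vertex and symmetric weights $w_{ij}$:

```latex
\begin{align}
\mathrm{cut}(s) &= \sum_{i<j} w_{ij}\,\frac{1 - s_i s_j}{2}
  = \frac{1}{2}\sum_{i<j} w_{ij} - \frac{1}{2}\sum_{i<j} w_{ij}\, s_i s_j, \\
E(s) &= -\frac{1}{2}\sum_{i,j} J_{ij}\, s_i s_j
  = \sum_{i<j} w_{ij}\, s_i s_j
  \quad \text{with } J_{ij} = -w_{ij},\ J_{ii} = 0 .
\end{align}
```

Combining the two lines gives $\mathrm{cut}(s) = \tfrac{1}{2}\sum_{i<j} w_{ij} - \tfrac{1}{2}E(s)$. The first term is constant, so maximizing the cut is the same as minimizing $E$. Whether cut edges correspond directly to switch errors is the separate, biological assumption that the ledger flags.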

pith-pipeline@v0.9.0 · 5571 in / 1273 out tokens · 23567 ms · 2026-05-15T00:29:39.487234+00:00 · methodology
