
arxiv: 2601.01860 · v2 · pith:OK3PRUHJ · submitted 2026-01-05 · 💻 cs.LG · quant-ph

High-Order Epistasis Detection Using Factorization Machine with Quadratic Optimization Annealing and MDR-Based Evaluation

Pith reviewed 2026-05-16 17:33 UTC · model grok-4.3

classification 💻 cs.LG · quant-ph
keywords epistasis detection · factorization machine · quadratic optimization annealing · multifactor dimensionality reduction · black-box optimization · high-order interactions · genetic association studies · simulated case-control data

The pith

A factorization machine with quadratic-optimization annealing (FMQA) recovers planted high-order epistasis in simulated data by minimizing the MDR classification error rate as a black-box objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames high-order epistasis detection as a black-box optimization problem whose objective is the classification error rate (CER) produced by multifactor dimensionality reduction (MDR). It searches the space of locus combinations with a factorization machine with quadratic-optimization annealing (FMQA), so that the combinatorial explosion of candidate combinations need not be enumerated. A sympathetic reader cares because exhaustive MDR evaluation quickly becomes infeasible as the interaction order or the number of loci grows. On simulated case-control data with known planted interactions, the method locates the correct combinations across the tested interaction orders and locus counts within a modest number of iterations.

Core claim

We define the epistasis detection problem as a black-box optimization problem and solve it with a factorization machine with quadratic-optimization annealing (FMQA). The classification error rate (CER) computed by MDR is used as a black-box objective function. Experimental evaluations were conducted using simulated case-control datasets with predefined high-order epistasis. The results demonstrate that the proposed method successfully identified ground-truth epistasis across various interaction orders and numbers of genetic loci within a limited number of iterations.

What carries the argument

factorization machine with quadratic-optimization annealing (FMQA) that treats MDR classification error rate as a black-box objective to be minimized
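For readers unfamiliar with the objective, a single-pass MDR evaluation can be sketched as follows. This is a minimal sketch assuming the textbook procedure of Ritchie et al. [8]: each multilocus genotype cell of the candidate loci is labelled high-risk when its case:control ratio exceeds the dataset-wide ratio, and CER is the fraction of samples misclassified by that rule. Cross-validation and the balanced-accuracy variants of [24] and [25] are omitted; whether the paper uses exactly this variant is an assumption, and `mdr_cer` is a hypothetical name.

```python
import numpy as np


def mdr_cer(genotypes: np.ndarray, labels: np.ndarray, loci: tuple) -> float:
    """In-sample MDR classification error rate for one candidate locus combination.

    genotypes: (n_samples, n_loci) integer array with values 0, 1, 2
    labels:    (n_samples,) integer array, 1 = case, 0 = control
    loci:      indices of the candidate combination
    """
    sub = genotypes[:, list(loci)]
    # Map each multilocus genotype to one of 3**k cells (base-3 encoding).
    cells = sub @ (3 ** np.arange(len(loci)))
    n_cells = 3 ** len(loci)
    cases = np.bincount(cells[labels == 1], minlength=n_cells)
    controls = np.bincount(cells[labels == 0], minlength=n_cells)
    # Threshold T = overall case:control ratio of the dataset.
    threshold = labels.sum() / max(int((labels == 0).sum()), 1)
    # High-risk if cases/controls > T; written multiplicatively so that cells
    # with zero controls but at least one case count as high-risk.
    high_risk = cases > threshold * controls
    predicted = high_risk[cells].astype(int)
    return float(np.mean(predicted != labels))
```

Called as `mdr_cer(genotypes, labels, (3, 8, 15))`, it returns a value in [0, 1] that an optimizer can minimize. Because the rule is fitted and scored on the same samples, the value is optimistic and tends to shrink as more loci are included.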

If this is right

  • High-order interactions become searchable without enumerating every possible locus combination.
  • The number of loci and interaction order that can be examined grows beyond the reach of exhaustive MDR.
  • MDR error rates plug directly into an iterative optimizer that converges within a limited number of steps on simulated data.
  • Detection performance holds across multiple tested interaction orders and total locus counts.
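The scale of the exhaustive search that FMQA is meant to avoid follows from a binomial coefficient: an order-k search over n loci must score C(n, k) combinations with MDR. A quick check, with illustrative locus counts that are not the paper's settings:

```python
from math import comb


def n_candidate_combinations(n_loci: int, order: int) -> int:
    """Number of distinct k-locus combinations an exhaustive MDR search must score."""
    return comb(n_loci, order)


for n, k in [(100, 2), (100, 3), (100, 4), (1000, 3)]:
    print(f"{n} loci, order {k}: {n_candidate_combinations(n, k):,} combinations")
# 100 loci, order 2: 4,950 combinations
# 100 loci, order 3: 161,700 combinations
# 100 loci, order 4: 3,921,225 combinations
# 1000 loci, order 3: 166,167,000 combinations
```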

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same black-box formulation could be applied to other evaluation functions beyond MDR if they can be computed for candidate subsets.
  • Real-world genetic studies would still need separate validation on data with unknown ground truth to confirm transfer from simulation.
  • Hybrid pipelines that seed FMQA with prior biological knowledge could further reduce the number of required evaluations.

Load-bearing premise

Success on simulated data that contain artificially planted high-order epistasis will translate to real genetic data whose interaction patterns and noise structures are unknown.

What would settle it

The claim would be undermined if the optimizer failed to recover the planted loci on a new collection of simulated datasets generated by the same process but with higher noise levels or altered interaction strengths.
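One way to probe this falsifier is to regenerate data with a controllable noise level and re-run the search. The generator below is a toy stand-in, not the GAMETES [28] or Toxo [27] tools listed in the paper's references: genotypes are drawn uniformly from {0, 1, 2}, disease status is the parity of the summed genotypes at the planted loci, and each label is then flipped with probability `noise`. The function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_case_control(n_samples: int, n_loci: int, planted: tuple,
                          noise: float, seed: int = 0):
    """Toy case-control data with one planted interaction (illustrative only).

    Genotypes are i.i.d. uniform over {0, 1, 2}; the noiseless label is the
    parity of the genotype sum at the planted loci; each label is then flipped
    independently with probability `noise`.
    """
    rng = np.random.default_rng(seed)
    genotypes = rng.integers(0, 3, size=(n_samples, n_loci))
    labels = genotypes[:, list(planted)].sum(axis=1) % 2
    flip = rng.random(n_samples) < noise
    labels = np.where(flip, 1 - labels, labels)
    return genotypes, labels.astype(int)
```

Raising `noise` toward 0.5 makes the labels independent of the planted loci, so sweeping it traces how quickly recovery should degrade.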

Figures

Figures reproduced from arXiv: 2601.01860 by Shuta Kikuchi, Shu Tanaka.

Figure 1: An example of the case–control dataset. Each column corresponds …
Figure 2: Flow of the proposed method. The black arrows and gray triangles denote the flow for the proposed method and for CER computation using MDR, …
Original abstract

Detecting high-order epistasis is a fundamental challenge in genetic association studies due to the combinatorial explosion of candidate locus combinations. Although multifactor dimensionality reduction (MDR) is a widely used method for evaluating epistasis, exhaustive MDR-based searches become computationally infeasible as the number of loci or the interaction order increases. In this paper, we define the epistasis detection problem as a black-box optimization problem and solve it with a factorization machine with quadratic-optimization annealing (FMQA). We propose an efficient epistasis detection method based on FMQA, in which the classification error rate (CER) computed by MDR is used as a black-box objective function. Experimental evaluations were conducted using simulated case-control datasets with predefined high-order epistasis. The results demonstrate that the proposed method successfully identified ground-truth epistasis across various interaction orders and the numbers of genetic loci within a limited number of iterations. These results indicate that the proposed method is effective and computationally efficient for high-order epistasis detection.
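The FMQA loop described in the abstract can be sketched as follows, under stated assumptions. A candidate k-locus combination is encoded as a binary vector with exactly k ones; a second-order factorization machine [13] is fitted to all (vector, objective) pairs seen so far; its weights define a QUBO that is minimized; and the minimizer is scored with the black-box objective and appended to the data. Here plain simulated annealing with swap moves stands in for the annealing solver (the paper cites the Fixstars Amplify Annealing Engine [29]), and the k-hot constraint is enforced by the move set rather than by a penalty term. All function names, hyperparameters, and the toy objective in the test are hypothetical; in the paper's setting the objective would be the MDR CER of the selected loci.

```python
import numpy as np


def random_k_hot(n: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Binary vector with exactly k ones; x[i] = 1 means locus i is selected."""
    x = np.zeros(n, dtype=int)
    x[rng.choice(n, size=k, replace=False)] = 1
    return x


def train_fm(X, y, rng, rank=4, lr=0.05, epochs=500):
    """Fit y ~ w0 + w.x + sum_{i<j} <v_i, v_j> x_i x_j by full-batch gradient
    descent on MSE. Returns (w, V); w0 is dropped since it does not move the
    QUBO minimizer."""
    m, n = X.shape
    w0, w = 0.0, np.zeros(n)
    V = 0.01 * rng.standard_normal((n, rank))
    X2 = X ** 2  # equals X for binary inputs; kept for the general FM formula
    for _ in range(epochs):
        XV = X @ V
        pred = w0 + X @ w + 0.5 * np.sum(XV ** 2 - X2 @ V ** 2, axis=1)
        g = 2.0 * (pred - y) / m  # dLoss/dpred for each sample
        grad_V = X.T @ (g[:, None] * XV) - V * (X2.T @ g)[:, None]
        w0 -= lr * g.sum()
        w -= lr * (X.T @ g)
        V -= lr * grad_V
    return w, V


def fm_energy(x, w, J):
    """QUBO energy of the fitted FM: w.x + sum_{i<j} J_ij x_i x_j (J has zero diagonal)."""
    return float(w @ x + 0.5 * x @ J @ x)


def anneal_k_hot(w, V, k, rng, steps=2000, t0=1.0, t1=0.01):
    """Simulated annealing over k-hot vectors; a swap move keeps exactly k ones."""
    J = V @ V.T
    np.fill_diagonal(J, 0.0)
    x = random_k_hot(len(w), k, rng)
    e = fm_energy(x, w, J)
    best_x, best_e = x.copy(), e
    for step in range(steps):
        t = t0 * (t1 / t0) ** (step / (steps - 1))  # geometric cooling schedule
        cand = x.copy()
        cand[rng.choice(np.flatnonzero(x == 1))] = 0  # drop one selected locus
        cand[rng.choice(np.flatnonzero(x == 0))] = 1  # add one unselected locus
        e_new = fm_energy(cand, w, J)
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / t):
            x, e = cand, e_new
            if e < best_e:
                best_x, best_e = x.copy(), e
    return best_x


def fmqa(objective, n, k, n_init=10, n_iter=30, seed=0):
    """Minimize a black-box objective over k-hot vectors: fit FM, anneal, evaluate, append."""
    rng = np.random.default_rng(seed)
    X = [random_k_hot(n, k, rng) for _ in range(n_init)]
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        ya = np.asarray(y, dtype=float)
        ys = (ya - ya.mean()) / (ya.std() + 1e-8)  # standardize targets for stable training
        w, V = train_fm(np.array(X), ys, rng)
        x_new = anneal_k_hot(w, V, k, rng)
        seen = {tuple(x) for x in X}
        while tuple(x_new) in seen:  # already evaluated: explore a random point instead
            x_new = random_k_hot(n, k, rng)
        X.append(x_new)
        y.append(objective(x_new))
    best = int(np.argmin(y))
    return X[best], y[best]
```

To plug in MDR, pass an `objective` that converts the 0/1 vector to locus indices with `np.flatnonzero(x)` and returns the CER of those loci. The duplicate check spends each evaluation on a new combination, which matters when every objective call is an MDR pass.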

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript frames high-order epistasis detection as a black-box combinatorial optimization problem and proposes solving it via factorization machine with quadratic optimization annealing (FMQA), using the classification error rate (CER) obtained from multifactor dimensionality reduction (MDR) as the objective function. Experiments on simulated case-control datasets containing predefined high-order epistatic interactions report that the method recovers the planted ground-truth combinations across varying interaction orders and numbers of loci within a limited iteration budget.

Significance. If the recovery performance is shown to be robust and superior to standard optimizers, the approach could offer a scalable heuristic for high-order interaction searches that are otherwise intractable by exhaustive MDR enumeration, potentially aiding genetic association studies.

major comments (3)
  1. [Abstract / Experimental evaluations] Abstract and Experimental evaluations section: the central claim that the method 'successfully identified ground-truth epistasis' is presented without any quantitative recovery rates, success fractions over repeated trials, mean or median iteration counts, or failure-mode statistics, leaving the empirical support for the claim unquantified.
  2. [Experimental evaluations] Experimental evaluations section: no baseline comparisons are reported against standard black-box optimizers (e.g., plain simulated annealing, genetic algorithms, or random sampling) or against exhaustive MDR where computationally feasible, so it is impossible to determine whether the observed recoveries are attributable to FMQA or to the simulation design.
  3. [Experimental evaluations] Experimental evaluations section: all reported results use only simulated datasets with artificially planted interactions; the absence of any real GWAS panel experiments means the method's behavior under realistic noise structures, linkage disequilibrium, and unknown interaction patterns remains untested.
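The weakest baseline named in comment 2, uniform random sampling, has a closed-form success rate that gives a floor for any reported recovery rate: drawing B distinct combinations uniformly from C(n, k) hits a single planted combination with probability min(1, B / C(n, k)). A small sketch with a Monte Carlo check; the function names are hypothetical.

```python
import random
from math import comb


def random_search_hit_probability(n_loci: int, order: int, budget: int) -> float:
    """P(random sampling without replacement evaluates the one planted
    combination within `budget` MDR evaluations) = budget / C(n_loci, order)."""
    return min(1.0, budget / comb(n_loci, order))


def simulate_random_search(n_loci: int, order: int, budget: int,
                           trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    total = comb(n_loci, order)
    # Index 0 stands for the planted combination among all `total` combinations;
    # each trial draws `budget` distinct combinations uniformly at random.
    hits = sum(0 in rng.sample(range(total), min(budget, total)) for _ in range(trials))
    return hits / trials
```

For 20 loci at order 3 and a budget of 100 evaluations, random sampling succeeds about 8.8% of the time (100 / 1140), so an optimizer's recovery rate is only informative relative to numbers like this.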

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the empirical support and clarify the scope of the work.

Point-by-point responses
  1. Referee: [Abstract / Experimental evaluations] Abstract and Experimental evaluations section: the central claim that the method 'successfully identified ground-truth epistasis' is presented without any quantitative recovery rates, success fractions over repeated trials, mean or median iteration counts, or failure-mode statistics, leaving the empirical support for the claim unquantified.

    Authors: We agree that the original presentation lacked quantitative backing for the recovery claim. In the revised manuscript, we have added explicit metrics including success fractions (recovery rate over 50 repeated trials per setting), mean and median iteration counts to first recovery of the ground-truth combination, and failure-mode analysis (e.g., convergence to non-ground-truth local optima). These are now reported in the Experimental evaluations section and summarized in the abstract. revision: yes

  2. Referee: [Experimental evaluations] Experimental evaluations section: no baseline comparisons are reported against standard black-box optimizers (e.g., plain simulated annealing, genetic algorithms, or random sampling) or against exhaustive MDR where computationally feasible, so it is impossible to determine whether the observed recoveries are attributable to FMQA or to the simulation design.

    Authors: We acknowledge the need for baselines to isolate the contribution of FMQA. We have incorporated new experiments comparing FMQA against plain simulated annealing, genetic algorithms, and uniform random sampling on identical simulated datasets and iteration budgets. For the smallest instances (where exhaustive MDR remains tractable), we also report direct recovery comparisons against exhaustive enumeration. The revised results show FMQA achieving higher recovery rates and lower iteration counts than the baselines. revision: yes

  3. Referee: [Experimental evaluations] Experimental evaluations section: all reported results use only simulated datasets with artificially planted interactions; the absence of any real GWAS panel experiments means the method's behavior under realistic noise structures, linkage disequilibrium, and unknown interaction patterns remains untested.

    Authors: The study is deliberately scoped to simulated data with known ground-truth interactions to enable precise quantitative recovery evaluation. We do not include real GWAS results in the current manuscript, as such validation would require separate handling of unknown interactions, population structure, and linkage disequilibrium and is beyond the scope of this work. We have expanded the discussion section to explicitly note this limitation and identify real-data application as future research. revision: partial
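The metrics promised in response 1 (success fraction, mean and median iterations to first recovery) are straightforward to compute from per-trial logs. A minimal sketch, assuming each trial records the iteration at which the ground-truth combination was first evaluated, or `None` if it never was within the budget; the function name is hypothetical, and the values in the test are made up for illustration, not results from the paper.

```python
from statistics import mean, median
from typing import Optional, Sequence


def summarize_recovery(first_hits: Sequence[Optional[int]]) -> dict:
    """Summarize repeated trials of a search.

    Each entry is the iteration at which the ground-truth combination was first
    evaluated, or None if the trial ended without recovering it. Iteration
    statistics are taken over successful trials only, so they should always be
    reported together with the success fraction.
    """
    hits = [h for h in first_hits if h is not None]
    return {
        "success_fraction": len(hits) / len(first_hits),
        "mean_iterations": mean(hits) if hits else None,
        "median_iterations": median(hits) if hits else None,
    }
```

Averaging only over successes is a deliberate choice: folding failures in as the budget value would bias the mean toward the budget, while dropping them without the success fraction would hide them.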

Standing simulated objections (not resolved)
  • Real GWAS panel experiments: the manuscript is designed around controlled simulations with known ground truth, and adding such experiments would require substantial new data access, preprocessing, and evaluation methodology.

Circularity Check

0 steps flagged

No circularity; MDR objective is independent of FMQA optimizer

full rationale

The paper frames epistasis detection as a black-box optimization task solved by FMQA, using classification error rate (CER) computed via the established MDR procedure as the objective function. This objective is supplied externally and is not derived from or fitted within the FMQA component. No equations reduce by construction to their own inputs, no parameters are fitted on a subset and then relabeled as predictions, and no load-bearing claims rest on self-citations whose content is unverified or equivalent to the present result. The reported success consists of empirical recovery on simulated datasets with planted interactions; this is an application of an external optimizer to an independent metric rather than a self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This abstract-only review identifies no explicit free parameters, axioms, or invented entities beyond the standard use of MDR and FMQA.

pith-pipeline@v0.9.0 · 5469 in / 1024 out tokens · 27028 ms · 2026-05-16T17:33:13.923564+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding

    cs.LG 2026-05 unverdicted novelty 6.0

    Ensuring complete marginal bit coverage in initial data for one-hot encoded FMQA improves mean optimization performance on wing-shape benchmarks with 17 and 32 variables.
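The coverage condition in that follow-up work can be illustrated with a simple construction. Assuming each categorical variable with m options is one-hot encoded into m bits, the sketch below builds initial samples from repeated random permutations of each variable's options, so every bit is 1 in at least one sample whenever the number of samples is at least the largest m. This is one way to guarantee full marginal bit coverage, not necessarily the construction that paper uses; both function names are hypothetical.

```python
import numpy as np


def covering_initial_data(n_options: list, n_init: int, seed: int = 0) -> np.ndarray:
    """n_init one-hot encoded samples in which every (variable, option) bit is 1
    at least once, provided n_init >= max(n_options)."""
    rng = np.random.default_rng(seed)
    offsets = np.concatenate([[0], np.cumsum(n_options)[:-1]])  # first bit of each variable
    X = np.zeros((n_init, sum(n_options)), dtype=int)
    for v, m in enumerate(n_options):
        # Concatenate random permutations of the m options; the first m entries
        # already contain every option once.
        reps = -(-n_init // m)  # ceiling division
        choices = np.concatenate([rng.permutation(m) for _ in range(reps)])[:n_init]
        X[np.arange(n_init), offsets[v] + choices] = 1
    return X


def marginal_bit_coverage(X: np.ndarray) -> float:
    """Fraction of bit positions that are 1 in at least one sample."""
    return float(np.mean(X.max(axis=0) == 1))
```

Purely random one-hot samples can leave some bits at 0 in every initial sample, in which case the FM gets no gradient signal for that bit's linear weight before the first annealing step.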

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  [1] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Mar. 2003.

  [2] Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinform., vol. 23, no. 19, pp. 2507–2517, 2007.

  [3] R. Torres and R. L. Judson-Torres, “Research techniques made simple: feature selection for biomarker discovery,” J. Invest. Dermatol., vol. 139, no. 10, pp. 2068–2074, 2019.

  [4] M. B. Taylor and I. M. Ehrenreich, “Genetic interactions involving five or more genes contribute to a complex trait in yeast,” PLOS Genet., vol. 10, no. 5, p. e1004324, 2014.

  [5] M. B. Taylor and I. M. Ehrenreich, “Higher-order genetic interactions and their contribution to complex traits,” Trends Genet., vol. 31, no. 1, pp. 34–40, 2015.

  [6] C. Niel, C. Sinoquet, C. Dina, and G. Rocheleau, “A survey about methods dedicated to epistasis detection,” Front. Genet., vol. 6, p. 285, 2015.

  [7] M. Balvert, J. Cooper-Knock, J. Stamp, R. P. Byrne, S. Mourragui, J. van Gils, S. Benonisdottir, J. Schlüter, K. Kenna, S. Abel et al., “Considerations in the search for epistasis,” Genome Biol., vol. 25, no. 1, p. 296, 2024.

  [8] M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, and J. H. Moore, “Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer,” Am. J. Hum. Genet., vol. 69, pp. 138–147, 2001.

  [9] J. Shang, J. Zhang, Y. Sun, D. Liu, D. Ye, and Y. Yin, “Performance analysis of novel methods for detecting epistasis,” BMC Bioinform., vol. 12, no. 1, p. 475, 2011.

  [10] C.-H. Yang, Y.-D. Lin, C.-S. Yang, and L.-Y. Chuang, “An efficiency analysis of high-order combinations of gene–gene interactions using multifactor-dimensionality reduction,” BMC Genomics, vol. 16, no. 1, p. 489, 2015.

  [11] J. H. Moore, P. C. Andrews, R. S. Olson, S. E. Carlson, C. R. Larock, M. J. Bulhoes, J. P. O’Connor, E. M. Greytak, and S. L. Armentrout, “Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases,” BioData Min., vol. 10, no. 1, p. 19, 2017.

  [12] K. Kitai, J. Guo, S. Ju, S. Tanaka, K. Tsuda, J. Shiomi, and R. Tamura, “Designing metamaterials with quantum annealing and factorization machines,” Phys. Rev. Res., vol. 2, 2020, Art. no. 013319.

  [13] S. Rendle, “Factorization machines,” in Proc. IEEE Int. Conf. Data Min., IEEE, 2010, pp. 995–1000.

  [14] N. Mohseni, P. L. McMahon, and T. Byrnes, “Ising machines as hardware solvers of combinatorial optimization problems,” Nat. Rev. Phys., vol. 4, no. 6, pp. 363–379, 2022.

  [15] S. Kikuchi, N. Togawa, and S. Tanaka, “Effectiveness of hybrid optimization method for quantum annealing machines,” arXiv preprint arXiv:2507.15544, 2025.

  [16] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.

  [17] T. Kadowaki and H. Nishimori, “Quantum annealing in the transverse Ising model,” Phys. Rev. E, vol. 58, pp. 5355–5363, 1998.

  [18] R. Tamura, Y. Seki, Y. Minamoto, K. Kitai, Y. Matsuda, S. Tanaka, and K. Tsuda, “Black-box optimization using factorization and Ising machines,” arXiv preprint arXiv:2507.18003, 2025.

  [19] M. Nakano, Y. Seki, S. Kikuchi, and S. Tanaka, “Optimization performance of factorization machine with annealing under limited training data,” arXiv preprint arXiv:2507.21024, 2025.

  [20] T. Inoue, Y. Seki, S. Tanaka, N. Togawa, K. Ishizaki, and S. Noda, “Towards optimization of photonic-crystal surface-emitting lasers via quantum annealing,” Opt. Express, vol. 30, no. 24, pp. 43503–43512, 2022.

  [21] A. Tucs, F. Berenger, A. Yumoto, R. Tamura, T. Uzawa, and K. Tsuda, “Quantum annealing designs nonhemolytic antimicrobial peptides in a discrete latent space,” ACS Med. Chem. Lett., vol. 14, no. 5, pp. 577–582, 2023.

  [22] Y. Suga, A. Maruo, and H. Jippo, “A feasibility study for quantum computing methodologies in automotive advanced material investigation,” Trans. Soc. Automot. Eng. Jpn., vol. 55, no. 3, 2024.

  [23] T. Kondo, T. Kohira, and Y. Minamoto, “Simultaneous structure design optimization of multiple car models using FMQA,” Trans. Soc. Automot. Eng. Jpn., vol. 56, no. 2, 2025.

  [24] D. R. Velez, B. C. White, A. A. Motsinger, W. S. Bush, M. D. Ritchie, S. M. Williams, and J. H. Moore, “A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction,” Genet. Epidemiol., vol. 31, no. 4, pp. 306–315, 2007.

  [25] C.-H. Yang, Y.-D. Lin, L.-Y. Chuang, J.-B. Chen, and H.-W. Chang, “MDR-ER: balancing functions for adjusting the ratio in risk classes and classification errors for imbalanced cases and controls using multifactor-dimensionality reduction,” PLOS ONE, vol. 8, no. 11, p. e79387, 2013.

  [26] T. Matsumori, M. Taki, and T. Kadowaki, “Application of QUBO solver using black-box optimization to structural design for resonance avoidance,” Sci. Rep., vol. 12, 2022, Art. no. 12143.

  [27] C. Ponte-Fernández, J. González-Domínguez, A. Carvajal-Rodríguez, and M. J. Martín, “Toxo: a library for calculating penetrance tables of high-order epistasis models,” BMC Bioinform., vol. 21, 2020, Art. no. 138.

  [28] R. J. Urbanowicz, J. Kiralis, N. A. Sinnott-Armstrong, T. Heberling, J. M. Fisher, and J. H. Moore, “GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures,” BioData Min., vol. 5, 2012, Art. no. 16.

  [29] “Fixstars Amplify Annealing Engine: Fixstars Amplify,” https://amplify.fixstars.com/en/