SpliceCombo: A Hybrid Technique efficiently use for Principal Component Analysis of Splice Site Prediction

Soumen Kanrar; Srabanti Maji

arxiv: 1907.09401 · v1 · pith:CBKL23QUnew · submitted 2019-07-19 · 🧬 q-bio.GN

Pith reviewed 2026-05-24 18:58 UTC · model grok-4.3

classification 🧬 q-bio.GN

keywords splice site prediction · principal component analysis · case-based reasoning · support vector machine · gene prediction · hybrid model · donor site · acceptor site

The pith

A three-stage hybrid pipeline of PCA feature extraction, case-based reasoning selection, and polynomial SVM classification predicts donor splice sites at 97.25 percent sensitivity and 97.46 percent specificity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SpliceCombo as a method to raise the accuracy of splice site detection, the key step that separates exons from introns in eukaryotic gene sequences. It processes input DNA through principal component analysis to reduce and extract features, applies case-based reasoning to select the most useful ones, and finishes with a support vector machine that uses a polynomial kernel for the final donor or acceptor classification. The authors state that this combination produces higher prediction accuracies than earlier models. A sympathetic reader would care because reliable splice site calls directly improve the reconstruction of gene structures from raw genomic data.
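
Before any PCA step can run, a DNA window has to become a numeric vector. The abstract does not say which encoding the authors use, so the following is only a minimal sketch under an assumed one-hot encoding; the names `ONE_HOT` and `encode_window` and the example window are hypothetical.

```python
# Assumed encoding (not stated in the paper): 4 binary features per nucleotide.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def encode_window(seq):
    """Flatten a DNA window into a list of 0/1 features, 4 per position."""
    vec = []
    for base in seq.upper():
        # Ambiguous bases such as N map to an all-zero block.
        vec.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vec

window = "AAGGTAAGT"  # hypothetical 9-nt window containing a donor GT
features = encode_window(window)
print(len(features))  # 9 positions x 4 features per position
```

A vector like `features` is the kind of input a feature-extraction stage such as PCA could then reduce.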

Core claim

SpliceCombo improves splice site prediction by combining PCA-based feature extraction, case-based reasoning for feature selection, and polynomial-kernel SVM classification, achieving 97.25 percent sensitivity and 97.46 percent specificity for donor sites and 96.51 percent sensitivity and 94.48 percent specificity for acceptor sites.
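
For readers less familiar with the two metrics in the claim, here is a short sketch of how sensitivity and specificity are computed from confusion counts. The counts are made up for illustration and are not taken from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of true splice sites that the model predicts as sites."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-sites (decoys) that the model correctly rejects."""
    return tn / (tn + fp)

# Illustrative counts only; the paper reports percentages, not raw counts.
tp, fn, tn, fp = 90, 10, 80, 20
sn = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"sensitivity = {sn:.2%}, specificity = {sp:.2%}")
```

Because both metrics are ratios, they say nothing about the size or class balance of the test set, which is one reason the missing dataset description matters.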

What carries the argument

The three-stage SpliceCombo pipeline that extracts features via principal component analysis, selects them via case-based reasoning, and classifies via polynomial-kernel support vector machine.

Load-bearing premise

The claim that the pipeline outperforms other methods rests on the assumption that the chosen training and test data, baseline comparisons, and validation procedure do not systematically favor the new combination.

What would settle it

Running the identical three-stage pipeline on an independent public splice-site benchmark and obtaining sensitivities below 90 percent would show that the reported gains do not hold.

Original abstract

The primary step in search of the gene prediction is an identification of the coding region from genomic DNA sequence. Gene structure in the case of a eukaryotic organism is composed of promoter, intron, start codon, exons, stop codon, etc. Splice site prediction, which separates the junction between exon and intron, though the sequence beside. The splice sites have huge preservation, however, the precision of the tool exhibits less than 90%. The main objective of this work to exhibits a hybrid technique that efficiently improves the existing gene recognition technique. Therefore to enhance the identification of splice sites, the respective algorithm needs to be improved. Over the last decade, the researcher paid more attention to improve the accuracy of a predicted model in this domain. Our proposed method, SpliceCombo involves three stages. At initial stage, which considers the principal Component Analysis, based on the feature extracted. In the intermediate stage, i.e.,, the second stage Case- Based Reasoning is done, i.e., feature selection. The third stage uses support vector machine based along with polynomial kernel function for final classification. In comparison with other methods, the proposed SpliceCombo model outperforms other prediction models with respect to prediction accuracies. Particularly for donor splice site the methodology exhibits sensitivity is 97.25% accurate and specificity is 97.46% accurate. For acceptor Splice Site the sensitivity is 96.51% and Specificity is 94.48% correct.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

SpliceCombo claims splice-site accuracies in the 97% range from a PCA-CBR-polynomial SVM stack but supplies no dataset, splits, baselines, or CV protocol to support the numbers.

The main thing to know is that this paper states strong numerical gains for donor and acceptor splice site prediction but gives no experimental details that would let anyone check those gains. The central claim therefore cannot be evaluated from what is written.

The method itself is a three-stage pipeline: PCA on sequence features, case-based reasoning for selection, and a polynomial SVM for classification. That combination is the only concrete contribution. It applies standard tools in sequence to the splice site task, which is a narrow but established sub-problem in gene finding. No new mathematics or first-principles derivation appears. The paper does at least target a real bottleneck where older tools often sit below 90% accuracy, and the authors cite prior work on hybrids in the area.

The soft spots are large and load-bearing. The abstract and results give no corpus name or size, no train-test partition sizes or seed, no list of comparator algorithms with their scores on the same data, and no description of how the number of principal components or the SVM degree and C were chosen. Without those controls the reported sensitivities and specificities cannot be attributed to the pipeline rather than to data selection. The free parameters are left without an independent tuning protocol.

A reader already working on computational splice site tools might skim the pipeline description for ideas, but the missing controls make the performance claims unusable for citation or extension. I would not send this to peer review. The experimental section needs concrete data, splits, tables, and a reproducible protocol before any referee could assess whether the hybrid actually improves on existing methods.

Referee Report

2 major / 2 minor

Summary. The paper proposes SpliceCombo, a three-stage hybrid pipeline for splice-site prediction consisting of PCA-based feature extraction, case-based reasoning for feature selection, and polynomial-kernel SVM classification. It claims that this method outperforms prior approaches, reporting donor-site sensitivity 97.25 % and specificity 97.46 %, and acceptor-site sensitivity 96.51 % and specificity 94.48 %.

Significance. A rigorously validated hybrid method that demonstrably improves splice-site accuracy on standard corpora would be useful for eukaryotic gene annotation pipelines. The manuscript, however, supplies none of the experimental controls required to substantiate the numerical claims, so no assessment of significance is possible at present.

major comments (2)

[Abstract] The headline performance figures (97.25 % / 97.46 % donor; 96.51 % / 94.48 % acceptor) are presented without any description of the underlying splice-site corpus, its size, the train/test partition, the cross-validation protocol, or the exact list of comparator algorithms together with their scores on the identical partition. These omissions render the superiority claim unverifiable.

[Abstract] The pipeline contains multiple free parameters (number of retained principal components, CBR case-base size, SVM regularization C and polynomial degree) whose selection procedure is not described. Without evidence that these choices were made independently of the reported test numbers, the accuracy margins cannot be attributed to the method rather than to data-dependent tuning.

minor comments (2)

[Abstract] The sentence 'i.e.,, the second stage' contains a duplicated comma.

[Abstract] The phrasing 'exhibits sensitivity is 97.25% accurate' is grammatically awkward and should be reworded for clarity (e.g., 'achieves a sensitivity of 97.25 %').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate planned changes to the manuscript.

Referee: [Abstract] The headline performance figures (97.25 % / 97.46 % donor; 96.51 % / 94.48 % acceptor) are presented without any description of the underlying splice-site corpus, its size, the train/test partition, the cross-validation protocol, or the exact list of comparator algorithms together with their scores on the identical partition. These omissions render the superiority claim unverifiable.

Authors: We agree that these details are required for verification. In the revised manuscript we will expand the abstract and add a methods subsection specifying the splice-site corpus (source, size, and composition), the train/test partition, the cross-validation protocol, and a table of comparator algorithms evaluated on the identical partition. revision: yes

Referee: [Abstract] The pipeline contains multiple free parameters (number of retained principal components, CBR case-base size, SVM regularization C and polynomial degree) whose selection procedure is not described. Without evidence that these choices were made independently of the reported test numbers, the accuracy margins cannot be attributed to the method rather than to data-dependent tuning.

Authors: We accept the point. The revised manuscript will describe the parameter selection procedure, including the use of inner cross-validation on the training set to choose the number of principal components, CBR case-base size, SVM C, and polynomial degree, thereby separating tuning from final test evaluation. revision: yes
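
The separation of tuning from test evaluation that this response describes is usually called nested cross-validation. The sketch below shows only the index bookkeeping, under assumed fold counts; it is not the authors' protocol (none is reported), and the helper names `k_fold_indices` and `nested_cv_splits` are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv_splits(n, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so hyperparameter tuning never sees the outer test fold.
    """
    outer_folds = k_fold_indices(n, outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [j for f, fold in enumerate(outer_folds)
                       if f != i for j in fold]
        # Inner folds are positions within outer_train, mapped back to sample ids.
        inner_folds = k_fold_indices(len(outer_train), inner_k, seed + 1 + i)
        inner_splits = []
        for v, val_pos in enumerate(inner_folds):
            inner_val = [outer_train[p] for p in val_pos]
            inner_train = [outer_train[p]
                           for w, fold in enumerate(inner_folds) if w != v
                           for p in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

splits = list(nested_cv_splits(n=20, outer_k=5, inner_k=3))
print(len(splits), "outer folds")
```

In use, the number of principal components, the CBR case-base size, and the SVM settings would be chosen on the inner splits and scored once on each outer test fold. This sketch does not stratify by class, which a real splice-site experiment with rare positives would likely need.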

Circularity Check

0 steps flagged

No circularity: empirical performance claims are not derivations

The paper presents an empirical three-stage pipeline (PCA feature extraction, case-based reasoning for feature selection, polynomial SVM classification) and reports measured accuracies (e.g., 97.25% sensitivity for donor sites). These numbers are outcomes of applying the method to data, not inputs, self-definitions, or quantities forced by construction. No equations, uniqueness theorems, or ansatzes are described that reduce to their own inputs. No self-citations are invoked as load-bearing justification for the central claim. The performance figures are standard empirical results whose validity depends on unreported experimental details (data, splits, baselines), but that is a reproducibility issue, not circularity in the derivation chain.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central performance claim rests on standard machine-learning assumptions (SVM margin maximization works for sequence classification, PCA preserves relevant variance, case-based reasoning selects informative neighbors) plus the unstated premise that the chosen data split and hyperparameter search do not favor the proposed pipeline. No new entities are postulated.

free parameters (2)

number of retained principal components
Chosen to reduce feature space before case-based reasoning; value not stated in abstract.
SVM regularization parameter C and polynomial degree
Standard kernel hyperparameters whose selection affects the reported sensitivity and specificity.
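
To make the role of the polynomial degree concrete, here is a small sketch of the polynomial kernel in its common parameterization, K(x, z) = (gamma * <x, z> + coef0) ** degree. The parameter names `gamma` and `coef0` follow a widely used convention and are an assumption here; the paper does not report its kernel settings, and the vectors are made up.

```python
def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

# Made-up feature vectors, for illustration only.
x = [1.0, 0.0, 2.0]
z = [0.5, 1.0, 1.0]
k2 = polynomial_kernel(x, z, degree=2)
k3 = polynomial_kernel(x, z, degree=3)
print(k2, k3)
```

The regularization parameter C does not appear in the kernel itself; it weights margin violations in the SVM training objective, so degree and C have to be tuned jointly.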

axioms (2)

domain assumption: Principal component analysis on sequence-derived features yields a lower-dimensional representation that retains splice-site discriminative information.
Invoked in the first stage of the pipeline.
domain assumption: Case-based reasoning can reliably rank and retain the most predictive features from the PCA output.
Invoked in the second stage.
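
The first axiom ties directly to the "number of retained principal components" parameter listed above. One common way to pick that number, sketched here as an assumption rather than the authors' procedure, is to keep the smallest set of components whose eigenvalues reach a target share of total variance. The function name `components_for_variance` and the eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest k whose top-k eigenvalues explain at least `target` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return k
    return len(ordered)

# Made-up covariance eigenvalues, for illustration only.
eigs = [4.0, 2.0, 1.0, 0.5, 0.5]
k = components_for_variance(eigs, target=0.90)
print(k)
```

Explained variance is an unsupervised criterion, so a high share does not by itself guarantee that the retained components carry the splice-site signal. That gap is exactly what the axiom assumes away.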

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

[1] Collins F, Lander E, Rogers J, Waterston R, Conso I (2004) Finishing the euchromatic sequence of the human genome. Nature 431(7011):931-945
[2] Maji S, Garg D (2012) Gene Finding Using Hidden Markov Model. Journal of Applied Sciences 12(15):1518
[3] Maji S, Garg D (2013) Progress in gene prediction: principles and challenges. Current Bioinformatics 8(2):226-243
[4] Maji S, Garg D (2013) Hidden Markov model for splicing junction sites identification in DNA sequences. Current Bioinformatics 8(3):369-379
[5] Burset M, Seledtsov I, Solovyev V (2000) Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Research 28(21):4364-4375
[6] Burge CB, Tuschl T, Sharp PA (1999) Splicing of precursors to mRNAs by the spliceosomes. Cold Spring Harbor Monograph Series 37:525-560
[7] Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268(1):78-94
[8] Maji S, Garg D (2014) Hybrid approach using SVM and MM2 in splice site junction identification. Current Bioinformatics 9(1):76-85
[9] Reese MG (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Computers & Chemistry 26(1):51-56
[10] Reese MG, Eeckman FH, Kulp D, Haussler D (1997) Improved splice site detection in Genie. Journal of Computational Biology 4(3):311-323
[11] Salzberg SL (1997) A method for identifying splice sites and translational start sites in eukaryotic mRNA. Computer Applications in the Biosciences: CABIOS 13(4):365-376
[12] Degroeve S, Saeys Y, De Baets B, Rouzé P, Van de Peer Y (2005) SpliceMachine: predicting splice sites from high-dimensional local context representations. Bioinformatics 21(8):1332-1338
[13] Sun Y-F, Fan X-D, Li Y-D (2003) Identifying splicing sites in eukaryotic RNA: support vector machine approach. Computers in Biology and Medicine 33(1):17-29
[14] Zhang XH, Heller KA, Hefter I, Leslie CS, Chasin LA (2003) Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Research 13(12):2637-2650
[15] Pertea M, Lin X, Salzberg SL (2001) GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Research 29(5):1185-1190
[16] Rajapakse JC, Ho LS (2005) Markov encoding for detecting signals in genomic sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 2(2):131-142
[17] Arita M, Tsuda K, Asai K (2002) Modeling splicing sites with pairwise correlations. Bioinformatics 18(suppl 2):S27-S34
[18] Zhang M, Gish W (2006) Improved spliced alignment from an information theoretic approach. Bioinformatics 22(1):13-20
[19] Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouzé P, Brunak S (1996) Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Research 24(17):3439-3452
[20] Haussler DKD, Eeckman MGRFH (1996) A generalized hidden Markov model for the recognition of human genes in DNA. In: Proc Int Conf on Intelligent Systems for Molecular Biology, St Louis: 134-142
[21] Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2(4):433-459
[22] Chen T-M, Lu C-C, Li W-H (2005) Prediction of splice sites with dependency graphs and their expanded Bayesian networks. Bioinformatics 21(4):471-482
[23] Reese MG, Hartzell G, Harris NL, Ohler U, Abril JF, Lewis SE (2000) Genome annotation assessment in Drosophila melanogaster. Genome Research 10(4):483-501
[24] Reese MG, Kulp D, Tammana H, Haussler D (2000) Genie—gene finding in Drosophila melanogaster. Genome Research 10(4):529-538
[25] Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6):673
[26] Jolliffe I (2002) Principal component analysis. Wiley Online Library
[27] Noy K, Fasulo D (2007) Improved model-based, platform-independent feature extraction for mass spectrometry. Bioinformatics 23(19):2528-2535
[28] Hibbs MA, Dirksen NC, Li K, Troyanskaya OG (2005) Visualization methods for statistical analysis of microarray clusters. BMC Bioinformatics 6(1):115
[29] Hogg RV, McKean J, Craig AT (2005) Introduction to mathematical statistics. Pearson Education
[30] Neumann J, Schnörr C, Steidl G (2005) Combined SVM-based feature selection and classification. Machine Learning 61(1):129-150
[31] Aamodt A, Plaza E (1994) Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications 7(1):39-59
[32] Cunningham P, Delany SJ (2007) Featureless Similarity
[33] Drucker H, Wu D, Vapnik VN (1999) Support vector machines for spam categorization. IEEE Transactions on Neural Networks 10(5):1048-1054
[34] Sain SR (1996) The nature of statistical learning theory. Taylor & Francis
[35] Ben-Hur A, Ong CS, Sonnenburg S, Schölkopf B, Rätsch G (2008) Support vector machines and kernels for computational biology. PLoS Comput Biol 4(10):e1000173
[36] Zhang Y, Wang D, Li T (2010) LIBGS: A MATLAB software package for gene selection. International Journal of Data Mining and Bioinformatics 4(3):348-355
[37] Yeo G, Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. Journal of Computational Biology 11(2-3):377-394
[38] Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statistics Surveys 4:40-79
[39] Bernau C, Riester M, Boulesteix A-L, Parmigiani G, Huttenhower C, Waldron L, Trippa L (2014) Cross-study validation for the assessment of prediction algorithms. Bioinformatics 30(12):i105-i112
