A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data

Zhuphua Cao

arxiv: 2606.07128 · v1 · pith:BINVTSZMnew · submitted 2026-06-05 · 💻 cs.LG

Pith reviewed 2026-06-27 22:24 UTC · model grok-4.3

classification 💻 cs.LG

keywords digit randomness · data integrity screening · machine learning · numerical data analysis · fabrication detection · statistical tests · risk scoring · semi-supervised learning

The pith

A statistical and machine learning framework detects non-random digit patterns in raw numerical research data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the Fabrication-risk Digit Randomness Screening model to check decimal digit distributions in numerical datasets for irregularities that might indicate fabrication. It combines single-digit and joint-digit frequency tests, association measures, entropy and divergence calculations, digit preference scores, progressive subsampling, and machine learning to produce an overall risk grade. On a clean instrument dataset and a blinded simulated irregular dataset the model separated the two with high accuracy and assigned distinct grades. Real external datasets from questioned papers received higher grades than clean ones. The approach supplies an auxiliary screening step that can flag datasets for closer inspection.
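The single-digit step can be illustrated with a minimal sketch. It assumes values arrive as strings, that the digit of interest is a fixed decimal place, and that the reference distribution is uniform over 0-9; the function names are hypothetical and the paper's actual implementation is not shown in the source.

```python
from collections import Counter
from typing import Optional

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CHI2_CRIT_DF9_05 = 16.919


def decimal_digit(value: str, position: int) -> Optional[int]:
    """Return the digit at a decimal place (1 = tenths), or None if it is absent.

    Values are read as strings so that trailing zeros recorded by an
    instrument ("0.4120") are not lost to float formatting. Scientific
    notation is not handled in this sketch.
    """
    _, _, frac = value.strip().partition(".")
    if len(frac) < position or not frac[:position].isdecimal():
        return None
    return int(frac[position - 1])


def chi_square_uniform(digits: list) -> float:
    """Pearson chi-square statistic against equal expected counts for 0-9."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


values = ["0.4127", "0.3981", "0.4120", "0.40"]  # illustrative strings only
third = [d for d in (decimal_digit(v, 3) for v in values) if d is not None]
print(third)  # [2, 8, 2]
```

A statistic above `CHI2_CRIT_DF9_05` would indicate a deviation from uniformity at the 5% level, provided the sample is large enough for the chi-square approximation to hold.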

Core claim

FDRS integrates single- and joint-decimal-digit tests, Cramer's V, entropy metrics, Kullback-Leibler divergence, digit-preference indices, progressive subsampling, and semi-supervised risk scoring. Evaluated on an enzymatic absorbance dataset and a manually simulated irregular dataset, Elastic-net Logistic Regression reached an AUC of 0.98395 while the irregular set received a markedly higher ensemble risk score and Grade 3 versus Grade 0 for the clean set. External real-world benchmarks aligned with the graded stratification.
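As a rough illustration of the joint-digit measures named here, the sketch below computes Cramér's V, normalized entropy, and KL divergence for pairs of (third, fourth) decimal digits. Two choices are assumptions rather than details from the paper: entropy is normalized by log(100), the number of possible pairs, and KL divergence is taken against the uniform distribution over those pairs. The function name is hypothetical.

```python
import math
from collections import Counter


def joint_digit_metrics(pairs):
    """Association and dispersion measures for a list of (digit, digit) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)

    # Pearson chi-square for independence over the observed rows and columns.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    cramers_v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

    probs = [count / n for count in joint.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Assumed reference: uniform over the 100 possible digit pairs.
    kl_uniform = sum(p * math.log(p * 100) for p in probs)
    return {
        "cramers_v": cramers_v,
        "normalized_entropy": entropy / math.log(100),
        "kl_vs_uniform": kl_uniform,
    }


independent = [(a, b) for a in range(10) for b in range(10)]
copied = [(d, d) for d in range(10)] * 10  # fourth digit always repeats the third
print(joint_digit_metrics(independent))
print(joint_digit_metrics(copied))
```

Under these assumptions the two synthetic inputs sit at opposite ends: the independent pairs give V near 0 and normalized entropy near 1, while the fully dependent pairs give V of 1 and normalized entropy of 0.5. Note that with a uniform reference, KL divergence equals log(100) minus the entropy, so the two measures carry the same information in this sketch.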

What carries the argument

The Fabrication-risk Digit Randomness Screening (FDRS) model, which fuses multiple digit-randomness statistical tests with machine-learning classifiers to generate ensemble risk scores and grades.

If this is right

Clean datasets receive low ensemble risk scores and Grade 0 while irregular datasets receive higher scores and Grade 3.
Elastic-net Logistic Regression yields the highest AUC and lowest Brier score among the classifiers tested.
The framework can prioritize raw numerical datasets for further review as an auxiliary tool.
Datasets from articles with public post-publication concerns receive Grade 2 or 3 while clean datasets receive Grade 0 or 1.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the separation generalizes, data repositories could apply the screening automatically on upload.
The progressive subsampling step might be adapted to monitor digit patterns during ongoing data collection.
Pairing digit-structure screening with existing checks on summary statistics could produce layered integrity pipelines.
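One way such monitoring might look is sketched below: a uniformity statistic is re-evaluated on growing prefixes of the digit stream, and the share of checkpoints that exceed a critical value serves as a persistence measure. The prefix scheme, the step sizes, and the function names are assumptions for illustration; the paper's own subsampling procedure is not described in the source.

```python
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CHI2_CRIT_DF9_05 = 16.919


def chi_square_uniform(digits):
    """Pearson chi-square statistic against equal expected counts for 0-9."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def progressive_deviation(digits, start=50, step=25):
    """Evaluate the statistic on prefixes of size start, start+step, ..."""
    trace = []
    for size in range(start, len(digits) + 1, step):
        stat = chi_square_uniform(digits[:size])
        trace.append((size, stat, stat > CHI2_CRIT_DF9_05))
    return trace


def persistence(trace):
    """Fraction of checkpoints at which the deviation was flagged."""
    return sum(flag for _, _, flag in trace) / len(trace) if trace else 0.0


balanced = [i % 10 for i in range(200)]  # synthetic, evenly spread digits
skewed = [7] * 200                       # synthetic, one digit only
print(persistence(progressive_deviation(balanced)))
print(persistence(progressive_deviation(skewed)))
```

Repeated checks on overlapping prefixes are not independent tests, so the flagged fraction is better read as a descriptive signal than as a p-value.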

Load-bearing premise

The manually simulated ErrData with introduced irregularities accurately represents the digit-pattern signatures that would appear in real cases of data fabrication or manipulation.

What would settle it

Running FDRS on a larger collection of published datasets where fabrication or manipulation has been independently confirmed by other means and checking whether the risk scores and grades separate them from matched clean datasets.

Figures

Figures reproduced from arXiv: 2606.07128 by Zhuphua Cao.

**Figure 1.** Overall workflow of the Fabrication-risk Digit Randomness Screening model. The FDRS framework was designed to screen non-random digit-pattern irregularities in raw numerical research data. Raw numerical values arranged in a single-column input file were first processed for decimal digit extraction. Single-decimal-digit distributions and multi-decimal joint digit combinations were then evaluated using multi…

**Figure 2.** Statistical and progressive evaluation of digit …

**Figure 3.** Semi-supervised machine-learning modeling and digit-pattern irregularity risk prediction. (A) Receiver operating characteristic curves of the machine-learning models in the internal validation set. AUC values are shown for Random Forest, Elastic-net Logistic Regression, SVM radial, Isolation Forest, and the integrated ensemble model. (B) Calibration performance of the machine-learning models, evaluated usi…

Abstract

Raw numerical datasets remain less systematically examined in integrity screening than images, plagiarism, or summary-statistic inconsistencies. We developed the Fabrication-risk Digit Randomness Screening model (FDRS), a statistical and machine-learning framework for detecting non-random digit-pattern irregularities in numerical research data. FDRS integrates single- and joint-decimal-digit tests, Cramer's V, entropy metrics, Kullback-Leibler divergence, digit-preference indices, progressive subsampling, and semi-supervised risk scoring. It was evaluated using an instrument-derived enzymatic absorbance dataset (RawData, n=253) and a blinded manually simulated irregular dataset (ErrData, n=255). RawData showed no significant deviation in single third-decimal-digit analysis, whereas ErrData showed a significant deviation. In joint third-fourth decimal digit analysis, ErrData showed higher Cramer's V, lower normalized entropy, higher KL divergence, and a more persistent progressive-subsampling deviation signal. In internal validation, Elastic-net Logistic Regression achieved the highest AUC (0.98395) and lowest Brier score (0.048439), while Random Forest achieved the highest accuracy (0.926667) and balanced accuracy (0.935). RawData received a low ensemble risk score of 0.124627 and was classified as Grade 0; ErrData received a score of 0.740760 and was classified as Grade 3. External real-world benchmarks supported graded risk stratification: three datasets without identified public post-publication concerns were classified as Grade 0 or 1, whereas two datasets from publicly questioned or institutionally handled articles were classified as Grade 2 or 3. FDRS can prioritize raw numerical datasets for further review by integrating interpretable statistical and machine-learning features. It is an auxiliary digit-structure screening tool, not standalone evidence of fabrication or misconduct.
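For readers less familiar with the reported metrics, the sketch below shows how AUC, Brier score, and balanced accuracy are conventionally computed from labels and predicted probabilities. The labels and probabilities are toy values, not data from the paper, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def balanced_accuracy(labels, probs, threshold=0.5):
    """Mean of sensitivity and specificity at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, h in zip(labels, preds) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(labels, preds) if y == 0 and h == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


labels = [0, 0, 1, 1]             # toy labels: 1 = irregular, 0 = clean
probs = [0.1, 0.4, 0.35, 0.8]     # toy predicted probabilities
print(roc_auc(labels, probs), brier_score(labels, probs), balanced_accuracy(labels, probs))
```

AUC depends only on how the scores rank the samples, while the Brier score also penalizes poorly calibrated probabilities, which is why one model can lead on AUC and Brier score while another leads on accuracy.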

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FDRS combines standard digit tests with ML scoring and gets clean separation on a hand-made simulation, but that simulation's match to real fabrication patterns is the key untested assumption.

The main thing here is that FDRS shows good separation between clean instrument data and a hand-crafted irregular set using a mix of digit stats and ML, but whether that separation would appear in real cases of data issues is the open question.

The paper combines single- and joint-decimal tests with Cramer's V, entropy, KL, and progressive subsampling, then uses those as features for models like elastic net logistic regression. It gets high AUC on the internal contrast and grades external sets in a way that matches public concerns. That's useful as an auxiliary screen.

It does well at packaging existing tools into a graded risk score with some interpretability. The external application is a plus.

The soft spot is the reliance on the simulated ErrData. The irregularities were introduced manually, so the feature profile might be specific to how it was made rather than typical of actual manipulation. Five external cases help but aren't enough to confirm generalization. No mention of code or full protocol details in the abstract.

This paper is for researchers building screening tools for numerical data integrity. It deserves peer review to sort out the simulation validity and see if more real-world validation can be added.

Referee Report

2 major / 2 minor

Summary. The paper presents the Fabrication-risk Digit Randomness Screening (FDRS) framework, which integrates single- and joint-decimal-digit frequency tests, Cramer's V, normalized entropy, Kullback-Leibler divergence, digit-preference indices, progressive subsampling, and semi-supervised ensemble risk scoring to flag non-random patterns in raw numerical research data. It evaluates the approach on an instrument-derived enzymatic absorbance dataset (RawData, n=253) versus a blinded manually simulated irregular dataset (ErrData, n=255), reports Elastic-net logistic regression achieving AUC 0.98395 on internal validation, assigns low risk (Grade 0) to RawData and high risk (Grade 3) to ErrData, and shows graded stratification on five external real-world benchmarks.

Significance. If the digit-pattern signatures engineered into the ErrData simulation prove representative of actual data manipulation and the ML evaluation avoids circularity, FDRS could provide a useful auxiliary, interpretable screening tool for prioritizing raw datasets for further integrity review. The combination of multiple statistical features with progressive subsampling and external benchmark results offers a concrete starting point for data-forensics methods in the numerical domain.

major comments (2)

[Abstract / evaluation] Abstract and evaluation section: the central performance claim (Elastic-net LR AUC 0.98395, ErrData risk score 0.740760 Grade 3 vs RawData 0.124627 Grade 0) rests on a single manually simulated ErrData set (n=255) whose construction details—specifically how the joint third-fourth decimal deviations, Cramer's V elevation, entropy drop, KL increase, and progressive-subsampling persistence were introduced—are not described, so it is impossible to assess whether these engineered signatures match patterns in documented real-world fabrication cases.
[Methods / internal validation] Methods / ML validation paragraph: the feature vector for the classifiers includes statistics (Cramer's V, entropy, KL divergence, digit-preference indices) computed directly on the same raw data being scored; without an explicit out-of-sample protocol, cross-validation scheme, or shipped code, the reported AUC and Brier score may reflect in-sample fitting rather than genuine detection of fabrication signatures.

minor comments (2)

[Abstract] The abstract states 'internal validation' but supplies no sample sizes for the train/test split, no error bars on AUC, and no exclusion criteria for the progressive subsampling; these details belong in the main text.
[Results / external benchmarks] External benchmark results are summarized only as 'three datasets ... Grade 0 or 1' and 'two datasets ... Grade 2 or 3'; listing the actual risk scores and grades for each of the five named datasets would strengthen the stratification claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the FDRS framework. Below we respond point-by-point to the major comments, indicating planned revisions where appropriate.

Referee: [Abstract / evaluation] Abstract and evaluation section: the central performance claim (Elastic-net LR AUC 0.98395, ErrData risk score 0.740760 Grade 3 vs RawData 0.124627 Grade 0) rests on a single manually simulated ErrData set (n=255) whose construction details—specifically how the joint third-fourth decimal deviations, Cramer's V elevation, entropy drop, KL increase, and progressive-subsampling persistence were introduced—are not described, so it is impossible to assess whether these engineered signatures match patterns in documented real-world fabrication cases.

Authors: We agree that additional detail on the ErrData simulation protocol is required for readers to judge how closely the introduced signatures align with documented fabrication patterns. In the revised manuscript we will expand the Methods section with a step-by-step description of the simulation procedure, including the specific manipulations used to generate the observed joint-digit deviations, elevated Cramer's V, reduced entropy, increased KL divergence, and persistent progressive-subsampling signal, while preserving the blinded character of the exercise. revision: yes
Referee: [Methods / internal validation] Methods / ML validation paragraph: the feature vector for the classifiers includes statistics (Cramer's V, entropy, KL divergence, digit-preference indices) computed directly on the same raw data being scored; without an explicit out-of-sample protocol, cross-validation scheme, or shipped code, the reported AUC and Brier score may reflect in-sample fitting rather than genuine detection of fabrication signatures.

Authors: The reported AUC was obtained via cross-validation, but the original text did not sufficiently document the protocol. We will revise the Methods section to specify the cross-validation design (including fold count and the manner in which features were recomputed within each training fold to prevent leakage), the separation between feature extraction and model evaluation, and the Brier-score calculation. We will also indicate that the analysis code will be made available upon acceptance. revision: yes
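A leakage-aware fold design of the kind described in this response could be sketched as follows. It assumes that training samples are feature vectors computed on subsample windows of raw values, and it partitions the raw values before any window is drawn. The function name, window size, and fold count are hypothetical; the manuscript's actual protocol is not given in the source.

```python
import random


def leakage_free_folds(n_values, k=5, window=40, windows_per_side=20, seed=0):
    """Yield (train_windows, test_windows) lists of index windows for k folds.

    Raw-value indices are partitioned first; subsample windows are then drawn
    separately inside the training pool and the held-out pool, so no raw value
    contributes to features on both sides of a fold.
    """
    rng = random.Random(seed)
    indices = list(range(n_values))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_pool = folds[held_out]
        train_pool = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        train = [rng.sample(train_pool, min(window, len(train_pool)))
                 for _ in range(windows_per_side)]
        test = [rng.sample(test_pool, min(window, len(test_pool)))
                for _ in range(windows_per_side)]
        yield train, test


# 253 matches the RawData sample size quoted in the abstract.
for train_windows, test_windows in leakage_free_folds(253):
    train_ids = {i for w in train_windows for i in w}
    test_ids = {i for w in test_windows for i in w}
    print(len(train_ids & test_ids))  # 0 for every fold
```

Digit features would then be computed per window, and any scaling or model fitting would use the training windows only. Shuffling discards the original ordering, which is acceptable only if the features do not depend on measurement order.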

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper computes a fixed set of statistical features (single/joint digit tests, Cramer's V, entropy, KL divergence, digit-preference indices, progressive subsampling) on two input collections (instrument RawData and manually simulated ErrData), trains standard classifiers on those features, reports internal-validation AUC, and applies the fitted ensemble to produce risk scores on held-out external datasets. No equation or procedure reduces a claimed output to an input by definition, renames a fitted parameter as a prediction, or rests on a self-citation chain; the external benchmarks are independent of the training split and the simulation is presented as an evaluation construct rather than a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; full manuscript unavailable for audit.


[1] [1]

Definition of Research Misconduct

Integrity OoR. Definition of Research Misconduct. Rockville, MD: U.S. Department of Health and Human Services; Accessed 30 May 2026

2026

[2] [2]

Federal Research Misconduct Policy

Integrity OoR. Federal Research Misconduct Policy. Rockville, MD: U.S. Department of Health and Human Services; 2000 Accessed 30 May 2026

2000

[3] [3]

How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data

Fanelli D. How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One. 2009;4(5):e5738. Epub 20090529. doi: 10.1371/journal.pone.0005738. PubMed PMID: 19478950; PubMed Central PMCID: PMCPMC2685008

work page doi:10.1371/journal.pone.0005738 2009

[4] [4]

Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis

Xie Y , Wang K, Kong Y . Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis. Sci Eng Ethics. 2021;27(4):41. Epub 20210629. doi: 10.1007/s11948-021-00314-9. PubMed PMID: 34189653

work page doi:10.1007/s11948-021-00314-9 2021

[5] [5]

COPE Flowcharts and Infographics: Fabricated Data in a Submitted Manuscript

Council C. COPE Flowcharts and Infographics: Fabricated Data in a Submitted Manuscript. Version 1, April 2023 ed. London, UK: Committee on Publication Ethics; 2023 Accessed 30 May 2026

2023

[6] [6]

COPE Flowcharts and Infographics: Fabricated Data in a Published Article

Council C. COPE Flowcharts and Infographics: Fabricated Data in a Published Article. Version 1, April 2023 ed. London, UK: Committee on Publication Ethics; 2023 Accessed 30 May 2026

2023

[7] [7]

The prevalence of statistical reporting errors in psychology (1985-2013)

Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods. 2016;48(4):1205-26. doi: 10.3758/s13428-015-0664-2

work page doi:10.3758/s13428-015-0664-2 1985

[8] [8]

The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology

Brown NJL, Heathers JAJ. The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science. 2017;8(4):363-9. doi: 10.1177/1948550616673876

work page doi:10.1177/1948550616673876 2017

[9] [9]

statcheck: Extract Statistics from Articles and Recompute P Values

Epskamp S, Nuijten MB. statcheck: Extract Statistics from Articles and Recompute P Values. Vienna, Austria: Comprehensive R Archive Network

[10] [10]

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

Baggerly KA, Coombes KR. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. The Annals of Applied Statistics. 2009;3(4):1309-34. doi: 10.1214/09-aoas291

work page doi:10.1214/09-aoas291 2009

[11] [11]

Investigating and preventing scientific misconduct using Benford's Law

Eckhartt GM, Ruxton GD. Investigating and preventing scientific misconduct using Benford's Law. Res Integr Peer Rev. 2023;8(1):1. Epub 20230411. doi: 10.1186/s41073-022-00126-w. PubMed PMID: 37041616; PubMed Central PMCID: PMC10088595

work page doi:10.1186/s41073-022-00126-w 2023

[12] [12]

Detecting fabrication in large-scale molecular omics data

Bradshaw MS, Payne SH. Detecting fabrication in large-scale molecular omics data. PLoS One. 2021;16(11):e0260395. Epub 20211130. doi: 10.1371/journal.pone.0260395. PubMed PMID: 34847169; PubMed Central PMCID: PMC8631639

work page doi:10.1371/journal.pone.0260395 2021

[13] [13]

Data fabrication: Can people generate random digits?

Mosimann JE, Wiseman CV, Edelman RE. Data fabrication: Can people generate random digits? Accountability in Research. 1995;4(1):31-55. doi: 10.1080/08989629508573866

work page doi:10.1080/08989629508573866 1995

[14] [14]

Terminal Digits and the Examination of Questioned Data

Mosimann J, Dahlberg J, Davidian N, Krueger J. Terminal Digits and the Examination of Questioned Data. Accountability in Research. 2002;9(2):75-92. doi: 10.1080/08989620212969

work page doi:10.1080/08989620212969 2002

[15] [15]

Statistical Forensics: Check Rightmost Digits for Uniform Distribution

U.S. Office of Research Integrity. Statistical Forensics: Check Rightmost Digits for Uniform Distribution. Rockville, MD: U.S. Department of Health and Human Services. Accessed 30 May 2026

2026

[16] [16]

Statistical Forensics

U.S. Office of Research Integrity. Statistical Forensics. Rockville, MD: U.S. Department of Health and Human Services. Accessed 30 May 2026

2026

[17] [17]

Are these data real? Statistical methods for the detection of data fabrication in clinical trials

Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-70. doi: 10.1136/bmj.331.7511.267. PubMed PMID: 16052019; PubMed Central PMCID: PMC1181267

work page doi:10.1136/bmj.331.7511.267 2005

[18] [18]

Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals

Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-52. Epub 20170604. doi: 10.1111/anae.13938. PubMed PMID: 28580651

work page doi:10.1111/anae.13938 2017

[19] [19]

Methods to assess research misconduct in health-related research: A scoping review

Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, et al. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. doi: 10.1016/j.jclinepi.2021.05.012

work page doi:10.1016/j.jclinepi.2021.05.012 2021

[20] [20]

Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance

Zheng H, Liu J, Cheng Q, Zhang Q, Zhang Y, Jiang L, et al. Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance. Nature Cancer. 2024;5(4):572-89. doi: 10.1038/s43018-023-00715-8

work page doi:10.1038/s43018-023-00715-8 2024

[21] [22]

Author Correction: Human HDAC6 senses valine abundancy to regulate DNA damage

Jin J, Meng T, Yu Y, Wu S, Jiao C-C, Song S, et al. Author Correction: Human HDAC6 senses valine abundancy to regulate DNA damage. Nature. 2025;644(8076):E34-E. doi: 10.1038/s41586-025-09409-w

work page doi:10.1038/s41586-025-09409-w 2025

[22] [23]

Human HDAC6 senses valine abundancy to regulate DNA damage

PubPeer Foundation. PubPeer comments: Human HDAC6 senses valine abundancy to regulate DNA damage. PubPeer; 2026 [PubPeer comments page for the article "Human HDAC6 senses valine abundancy to regulate DNA damage"; Nature; DOI: 10.1038/s41586-024-08248-5; PubMed ID: 39567688]. 2026-06-01. Available from: https://pubpeer.com/publications/429F23C68462E5C1A09175C3CD8B07

work page doi:10.1038/s41586-024-08248-5 2026

[23] [24]

Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance

PubPeer Foundation. PubPeer comments: Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance. PubPeer; 2026 [PubPeer comments page for the article "Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance"; Nature Cancer; DOI: 10.1038/s43018-023-00715-8...

work page doi:10.1038/s43018-023-00715-8 2026

[24] [25]

Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance

Nankai University. Situation report. Nankai University; 2026 [updated 2026-05-30; official institutional notice concerning data-integrity issues related to the article "Targeted activation of ferroptosis in colorectal cancer via LGR4 targeting overcomes acquired drug resistance"]. 2026-06-01. Available from: https://www.nankai.edu.cn/2026/0530/c17471a596704/page.htm

2026

[25] [26]

Human HDAC6 senses valine abundancy to regulate DNA damage

Tongji University. Situation report. Tongji University News Center; 2026 [updated 2026-05-06; official institutional notice concerning data-integrity issues related to the article "Human HDAC6 senses valine abundancy to regulate DNA damage"]. 2026-06-01. Available from: https://news.tongji.edu.cn/info/1008/94355.htm

2026

[26] [27]

Interferon restores replication fork stability and cell viability in BRCA-defective cells via ISG15

Moro RN, Biswas U, Kharat SS, Duzanic FD, Das P, Stavrou M, et al. Interferon restores replication fork stability and cell viability in BRCA-defective cells via ISG15. Nature Communications. 2023;14(1):6140. doi: 10.1038/s41467-023-41801-w

work page doi:10.1038/s41467-023-41801-w 2023

[27] [28]

Clonal barcoding with qPCR detection enables live cell functional analyses for cancer research

Guo Q, Spasic M, Maynard AG, Goreczny GJ, Bizuayehu A, Olive JF, et al. Clonal barcoding with qPCR detection enables live cell functional analyses for cancer research. Nature Communications. 2022;13(1):3837. doi: 10.1038/s41467-022-31536-5

work page doi:10.1038/s41467-022-31536-5 2022

[28] [29]

DKC1 promotes colorectal cancer progression and therapy resistance by dysregulating sphingolipid biosynthesis

Khan UK, Goel A, Nigam S, Chaudhary N, Praveen A, Roy A, et al. DKC1 promotes colorectal cancer progression and therapy resistance by dysregulating sphingolipid biosynthesis. Nature Communications. 2026;17(1):4406. doi: 10.1038/s41467-026-72800-2

work page doi:10.1038/s41467-026-72800-2 2026

[29] [30]

Pearson K. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157-75. doi: 10.1080/14786440009463897

work page doi:10.1080/14786440009463897 1900

[30] [31]

Mathematical Methods of Statistics

Cramér H. Mathematical Methods of Statistics. Princeton, NJ: Princeton University Press; 1946

1946

[31] [32]

A bias-correction for Cramér's V and Tschuprow's T

Bergsma W. A bias-correction for Cramér's V and Tschuprow's T. Journal of the Korean Statistical Society. 2013;42(3):323-8. doi: 10.1016/j.jkss.2012.10.002

work page doi:10.1016/j.jkss.2012.10.002 2013

[32] [33]

A mathematical theory of communication

Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 1948;27(3):379-423. doi: 10.1002/j.1538-7305.1948.tb01338.x

work page doi:10.1002/j.1538-7305.1948.tb01338.x 1948

[33] [34]

The measurement of diversity in different types of biological collections

Pielou EC. The measurement of diversity in different types of biological collections. Journal of Theoretical Biology. 1966;13:131-44. doi: 10.1016/0022-5193(66)90013-0

work page doi:10.1016/0022-5193(66)90013-0 1966

[34] [35]

On information and sufficiency

Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22(1):79-86. doi: 10.1214/aoms/1177729694

work page doi:10.1214/aoms/1177729694 1951

[35] [36]

Random forests

Breiman L. Random forests. Machine Learning. 2001;45(1):5-32. doi: 10.1023/a:1010933404324

work page doi:10.1023/a:1010933404324 2001

[36] [37]

Regularization and variable selection via the elastic net

Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005;67(2):301-20. doi: 10.1111/j.1467-9868.2005.00503.x

work page doi:10.1111/j.1467-9868.2005.00503.x 2005

[37] [38]

Regularization paths for generalized linear models via coordinate descent

Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010;33(1):1-22. doi: 10.18637/jss.v033.i01

work page doi:10.18637/jss.v033.i01 2010

[38] [39]

Support-vector networks

Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273-97. doi: 10.1007/bf00994018

work page doi:10.1007/bf00994018 1995

[39] [40]

Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods

Platt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ, Bartlett P, Schölkopf B, Schuurmans D, editors. Advances in Large Margin Classifiers. Cambridge, MA: MIT Press; 1999. p. 61-74

1999

[40] [41]

Isolation Forest

Liu FT, Ting KM, Zhou Z-H. Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining; Pisa, Italy: IEEE; 2008. p. 413-22

2008

[41] [42]

Stacked generalization

Wolpert DH. Stacked generalization. Neural Networks. 1992;5(2):241-59. doi: 10.1016/s0893-6080(05)80023-1

work page doi:10.1016/s0893-6080(05)80023-1 1992

[42] [43]

A study of cross-validation and bootstrap for accuracy estimation and model selection

Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence; Montreal, Canada: Morgan Kaufmann; 1995. p. 1137-43

1995

[43] [44]

An introduction to ROC analysis

Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861-74. doi: 10.1016/j.patrec.2005.10.010

work page doi:10.1016/j.patrec.2005.10.010 2006

[45] [46]

Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation

Powers DMW. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies. 2011;2(1):37-63

2011

[46] [47]

Verification of forecasts expressed in terms of probability

Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950;78(1):1-3. doi: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2

work page doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2 1950

Figures 1-3 and Figures S1-S2 (panel labels only). Table columns: Interpretation, Risk grade, Integrated risk score (content truncated).