The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization

Katharina Morik; Nico Piatkowski; Sibylle Hess

arxiv: 1907.00697 · v1 · pith:2VFJRJX6new · submitted 2019-07-01 · 💻 cs.LG · stat.ML

The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization

Sibylle Hess , Nico Piatkowski , Katharina Morik This is my paper

Pith reviewed 2026-05-25 12:20 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords Boolean matrix factorizationfalse discovery ratepattern miningnoise detectionunsupervised learningprobabilistic boundsapproximation error minimization

0 comments

The pith

Boolean matrix factorization can bound the probability that its patterns arise from random noise by using properties of the error-minimizing factorization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Boolean matrix factorization extracts patterns from binary data as the disjunction of rank-1 matrices, yet offers no guarantee that any pattern is real rather than noise. The paper proves two bounds on the probability that a returned pattern consists entirely of independent Bernoulli noise. Both bounds rest on distinct properties that hold only for factorizations that minimize the Boolean approximation error. This supplies a statistical control on the false discovery rate in the unsupervised setting. The result replaces heuristic rank selection with a principled method for deciding how many patterns to keep.

Core claim

The paper establishes two probabilistic bounds on the chance that a pattern found by Boolean matrix factorization is generated by pure random Bernoulli noise. Each bound exploits a different property that is true precisely when the factorization is an exact minimizer of the Boolean approximation error, thereby giving new insight into the structure of such minimizers and enabling FDR control without labeled data.

What carries the argument

Two bounds on the probability that a pattern is pure noise, each derived from a distinct optimality property of the Boolean approximation error minimizer.

If this is right

Rank selection in BMF can be performed by retaining only patterns whose noise probability falls below a chosen FDR threshold.
Different candidate factorizations can be compared by the tightness of the noise bounds they achieve.
BMF algorithms can be redesigned to search directly for factorizations whose patterns satisfy the derived probability bounds.
The same optimality properties used for the bounds may yield further analytic results on the geometry of Boolean matrix factorizations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounds could be adapted to other discrete matrix factorization problems that minimize an approximation error.
In streaming or online settings the bounds might serve as a stopping criterion once additional patterns no longer improve the FDR-controlled quality.
Empirical checks on synthetic noise data would directly test whether the minimizer property holds tightly enough for the bounds to be useful in practice.

Load-bearing premise

Observed data entries are independent Bernoulli random variables under the null model of pure noise, and the returned factorization is an exact minimizer of the Boolean approximation error.

What would settle it

Generate purely random Bernoulli data, run the BMF algorithm to its claimed minimum error, and check whether the fraction of patterns declared significant exceeds the bound-predicted rate.

Figures

Figures reproduced from arXiv: 1907.00697 by Katharina Morik, Nico Piatkowski, Sibylle Hess.

**Figure 2.** Figure 2: Minimum relative size |Y·s|/m, depending on |X·s|/n, for which the P(Zs = 1) ≤ 0.01, based on density (blue) and coherence (green). against the relative size |X·s|/n such that the probability P(Zs = 1) ≤ 0.01. The blue curve displays the minimum tile size, assessing the false discovery probability by Corollary 3.1, while green refers to Corollary 3.2. Thereby, we assume that the tile is 1/2-dense in N+ an… view at source ↗

**Figure 3.** Figure 3: Variation of uniform noise for 1600 × 500 and 1000×800 dimensional data. Comparison of F-measures (the higher the better) and the estimated rank of the calculated tiling (the closer to 25 the better) for varying levels of noise indicated on the x-axis. planted factorizations are [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Variation of density and overlap influencing [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Boolean matrix factorization (BMF) is a popular and powerful technique for inferring knowledge from data. The mining result is the Boolean product of two matrices, approximating the input dataset. The Boolean product is a disjunction of rank-1 binary matrices, each describing a feature-relation, called pattern, for a group of samples. Yet, there are no guarantees that any of the returned patterns do not actually arise from noise, i.e., are false discoveries. In this paper, we propose and discuss the usage of the false discovery rate in the unsupervised BMF setting. We prove two bounds on the probability that a found pattern is constituted of random Bernoulli-distributed noise. Each bound exploits a specific property of the factorization which minimizes the approximation error---yielding new insights on the minimizers of Boolean matrix factorization. This leads to improved BMF algorithms by replacing heuristic rank selection techniques with a theoretically well-based approach. Our empirical demonstration shows that both bounds deliver excellent results in various practical settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The two FDR bounds apply only to exact error-minimizing factorizations, but since exact BMF is NP-hard the bounds do not transfer to any practical algorithm without additional work.

read the letter

The paper derives two probability bounds showing that patterns returned by an error-minimizing BMF are unlikely to be pure Bernoulli noise. The bounds follow from the minimization property itself and the external noise model, which is a clean way to get statistical control in an unsupervised setting where rank choice has usually been ad hoc. That is the actual novelty: prior BMF work did not supply explicit false-discovery guarantees tied to the objective. The empirical section reports that the bounds perform well across several datasets, which at least shows the idea is not immediately vacuous. The derivations also yield some side observations about what properties a good minimizer must satisfy. Those are useful even if one never uses the FDR procedure. The central limitation is the one flagged in the stress test. The proofs require a globally optimal factorization, yet every feasible BMF routine is a heuristic. Nothing in the abstract or the reported experiments indicates that the bounds remain valid or conservative once the factorization is only locally optimal. Without that link the procedure cannot replace heuristic rank selection in real use. The independent-Bernoulli null is also strong; many binary datasets have dependence structure that would invalidate the probability statements. A reader working on theoretical extensions of BMF or on statistical pattern mining would find the bounds and the new view of minimizers worth reading. Someone needing a drop-in method for noisy data would get less immediate value until the approximation gap is closed. The work shows clear engagement with the literature and the math is internally consistent on its own terms, so it merits a full referee report to check the proofs and to see whether the experiments actually test the bounds under realistic algorithmic conditions.

Referee Report

1 major / 0 minor

Summary. The paper claims to prove two bounds on the probability that a pattern discovered by Boolean matrix factorization (BMF) arises purely from independent Bernoulli noise. Each bound exploits a property of the factorization that exactly minimizes the Boolean approximation error. These bounds are proposed as a theoretically grounded replacement for heuristic rank selection in BMF, enabling control of the false discovery rate, and are supported by empirical demonstrations across practical settings.

Significance. If the bounds apply to the factorizations actually returned by algorithms, the work supplies a principled, non-heuristic method for assessing pattern reliability in unsupervised BMF and yields new characterizations of error-minimizing factorizations. This would strengthen trustworthiness of BMF results in data-mining applications.

major comments (1)

[Abstract and proof sections] The two probability bounds are derived by exploiting properties that hold specifically for a factorization that exactly minimizes the Boolean approximation error under the independent Bernoulli null model (see abstract and the sections presenting the proofs). Exact BMF is NP-hard, yet the paper does not supply a guarantee that any practical algorithm returns a globally optimal factorization, nor does it demonstrate that the bounds continue to hold (even approximately) for the heuristic outputs used in the empirical section. This directly affects the central claim that the bounds enable improved, theoretically well-based BMF algorithms.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for highlighting the important distinction between exact and heuristic factorizations. We address the major comment below.

read point-by-point responses

Referee: [Abstract and proof sections] The two probability bounds are derived by exploiting properties that hold specifically for a factorization that exactly minimizes the Boolean approximation error under the independent Bernoulli null model (see abstract and the sections presenting the proofs). Exact BMF is NP-hard, yet the paper does not supply a guarantee that any practical algorithm returns a globally optimal factorization, nor does it demonstrate that the bounds continue to hold (even approximately) for the heuristic outputs used in the empirical section. This directly affects the central claim that the bounds enable improved, theoretically well-based BMF algorithms.

Authors: We agree that the two bounds are derived under the assumption of an exact minimizer of the Boolean approximation error, as stated in the abstract and proof sections. The manuscript does not provide any guarantee that practical (heuristic) algorithms return globally optimal factorizations, nor does it include a formal or empirical demonstration that the bounds hold approximately for heuristic outputs. This is a substantive limitation for the claim that the bounds directly enable improved, theoretically grounded BMF algorithms in practice. In the revision we will add an explicit discussion of this scope limitation in the introduction and conclusions, clarifying that the bounds apply to exact minimizers and may be used conservatively with near-optimal heuristics; we will also report any relevant observations from the existing experiments on how close the employed heuristics came to optimality. revision: partial

Circularity Check

0 steps flagged

No circularity; bounds derived from external Bernoulli model and minimizer properties

full rationale

The paper states it proves two bounds on false-discovery probability by exploiting properties of an exact error-minimizing factorization under an independent Bernoulli noise model. No equations reduce a claimed prediction to a fitted input by construction, no self-citations are load-bearing for the central claim, and no ansatz or uniqueness result is smuggled in via prior work. The derivation is therefore self-contained against the stated external statistical assumptions and does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that data entries behave as independent Bernoulli trials under the null and that the factorization exactly minimizes Boolean approximation error; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Data matrix entries are independent Bernoulli random variables under the null hypothesis of pure noise.
Invoked to derive the probability that a pattern arises from noise.
domain assumption The returned factorization is an exact minimizer of the Boolean approximation error.
Used to obtain the two specific bounds on false-pattern probability.

pith-pipeline@v0.9.0 · 5706 in / 1237 out tokens · 22547 ms · 2026-05-25T12:20:25.115858+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

Benjamini and Y

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological) , pages 289–300, 1995

work page 1995
[2]

Bolte, S

J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming , 146(1-2):459–494, 2014

work page 2014
[3]

Duivesteijn and A

W. Duivesteijn and A. Knobbe. Exploiting false discoveries — statistical validation of patterns and quality measures in subgroup discovery. In Proc. ICDM, pages 151–160, 2011

work page 2011
[4]

Foucart and H

S. Foucart and H. Rauhut. A mathematical intro- duction to compressive sensing , volume 1. Birkh¨ auser Basel, 2013

work page 2013
[5]

S. Hess, K. Morik, and N. Piatkowski. The PRIMPING routine - tiling through proximal alternating linearized minimization. Data Min. Knowl. Discov. , 31(4):1090– 1131, 2017

work page 2017
[6]

Komiyama, M

J. Komiyama, M. Ishihata, H. Arimura, T. Nishibayashi, and S. Minato. Statistical emerging pattern mining with multiple testing correction. In Proc. KDD, pages 897–906, 2017

work page 2017
[7]

Llinares-L´ opez, M

F. Llinares-L´ opez, M. Sugiyama, L. Papaxanthos, and K. M. Borgwardt. Fast and memory-eﬃcient signiﬁ- cant pattern mining via permutation testing. In Proc. KDD, pages 725–734, 2015

work page 2015
[8]

Miettinen, T

P. Miettinen, T. Mielikainen, A. Gionis, G. Das, and H. Mannila. The discrete basis problem. TKDE, 20(10):1348–1362, 2008

work page 2008
[9]

Miettinen and J

P. Miettinen and J. Vreeken. Model order selection for boolean matrix factorization. In Proc. KDD, pages 51–59. ACM, 2011

work page 2011
[10]

M. P. Paciﬁco, C. Genovese, I. Verdinelli, and L. Wasserman. Scan clustering: A false discovery ap- proach. Journal of Multivariate Analysis , 98(7):1441 – 1469, 2007

work page 2007
[11]

Wang and Y.-J

Y.-X. Wang and Y.-J. Zhang. Nonnegative ma- trix factorization: A comprehensive review. TKDE, 25(6):1336–1353, 2013

work page 2013
[12]

G. I. Webb. Discovering signiﬁcant patterns. Mach. Learn., 68(1):1–33, 2007

work page 2007
[13]

Zhang, C

Z. Zhang, C. Ding, T. Li, and X. Zhang. Binary matrix factorization with applications. In Proc. ICDM, pages 391–400, 2007. Copyright c⃝ 2018 by SIAM Unauthorized reproduction of this article is prohibited

work page 2007

[1] [1]

Benjamini and Y

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological) , pages 289–300, 1995

work page 1995

[2] [2]

Bolte, S

J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming , 146(1-2):459–494, 2014

work page 2014

[3] [3]

Duivesteijn and A

W. Duivesteijn and A. Knobbe. Exploiting false discoveries — statistical validation of patterns and quality measures in subgroup discovery. In Proc. ICDM, pages 151–160, 2011

work page 2011

[4] [4]

Foucart and H

S. Foucart and H. Rauhut. A mathematical intro- duction to compressive sensing , volume 1. Birkh¨ auser Basel, 2013

work page 2013

[5] [5]

S. Hess, K. Morik, and N. Piatkowski. The PRIMPING routine - tiling through proximal alternating linearized minimization. Data Min. Knowl. Discov. , 31(4):1090– 1131, 2017

work page 2017

[6] [6]

Komiyama, M

J. Komiyama, M. Ishihata, H. Arimura, T. Nishibayashi, and S. Minato. Statistical emerging pattern mining with multiple testing correction. In Proc. KDD, pages 897–906, 2017

work page 2017

[7] [7]

Llinares-L´ opez, M

F. Llinares-L´ opez, M. Sugiyama, L. Papaxanthos, and K. M. Borgwardt. Fast and memory-eﬃcient signiﬁ- cant pattern mining via permutation testing. In Proc. KDD, pages 725–734, 2015

work page 2015

[8] [8]

Miettinen, T

P. Miettinen, T. Mielikainen, A. Gionis, G. Das, and H. Mannila. The discrete basis problem. TKDE, 20(10):1348–1362, 2008

work page 2008

[9] [9]

Miettinen and J

P. Miettinen and J. Vreeken. Model order selection for boolean matrix factorization. In Proc. KDD, pages 51–59. ACM, 2011

work page 2011

[10] [10]

M. P. Paciﬁco, C. Genovese, I. Verdinelli, and L. Wasserman. Scan clustering: A false discovery ap- proach. Journal of Multivariate Analysis , 98(7):1441 – 1469, 2007

work page 2007

[11] [11]

Wang and Y.-J

Y.-X. Wang and Y.-J. Zhang. Nonnegative ma- trix factorization: A comprehensive review. TKDE, 25(6):1336–1353, 2013

work page 2013

[12] [12]

G. I. Webb. Discovering signiﬁcant patterns. Mach. Learn., 68(1):1–33, 2007

work page 2007

[13] [13]

Zhang, C

Z. Zhang, C. Ding, T. Li, and X. Zhang. Binary matrix factorization with applications. In Proc. ICDM, pages 391–400, 2007. Copyright c⃝ 2018 by SIAM Unauthorized reproduction of this article is prohibited

work page 2007