arxiv: 2604.00763 · v2 · submitted 2026-04-01 · stat.ME · q-bio.GN · stat.AP

Non-ignorable fuzziness in granular counts: the case of RNA-seq data

Antonio Calcagnì, Arianna Consiglio, Przemyslaw Grzegorzewski, Corrado Mencar

Pith reviewed 2026-05-13 22:15 UTC · model grok-4.3

classification: stat.ME · q-bio.GN · stat.AP

keywords: RNA-seq · granular counts · fuzzy data · coarsening not at random · hierarchical model · alignment ambiguity · ignorability

The pith

When RNA-seq reporting uses graded membership for alignment ambiguity, standard ignorability fails and produces coarsening-not-at-random data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

RNA-seq count data frequently contain alignment ambiguity that turns exact counts into fuzzy-valued granular observations. The paper shows that whenever the reporting process relies on graded membership rather than crisp assignment, the usual ignorability assumption breaks down generically and the observed data follow a coarsening-not-at-random structure. A hierarchical model is introduced to represent both the latent true counts and the fuzzy reporting mechanism in a single tractable framework. This model is then fitted to real RNA-seq datasets to demonstrate how the non-ignorable fuzziness can be accounted for in practice. A reader should care because downstream tasks such as differential expression analysis implicitly treat the coarsening as ignorable (the coarsened-data analogue of missing at random); violating that assumption can systematically distort biological conclusions.

Core claim

When the reporting of latent discrete counts exploits graded membership, ignorability fails generically, leading to a coarsening-not-at-random structure. A hierarchical model supplies a tractable instance of this construction for RNA-seq data.
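For orientation, the ignorability condition at stake can be written down in its standard textbook form. The block below is a sketch in notation chosen here ($X$ for the latent count, $Y$ for the report, $\mu_{\tilde{y}}$ for a membership function); the paper's own symbols and its exact fuzzy generalisation may differ.

```latex
% Latent count X with pmf p_\theta(x); crisp coarsened report A, a set containing X.
% Coarsening at random (CAR): the reporting probability is flat over the reported set.
\Pr(Y = A \mid X = x) \;=\; \Pr(Y = A \mid X = x')
  \qquad \text{for all } x, x' \in A .

% Under CAR the likelihood for \theta can ignore the reporting mechanism:
L(\theta; A) \;\propto\; \sum_{x \in A} p_\theta(x) .

% A fuzzy report \tilde{y} replaces the set A by a membership function
% \mu_{\tilde{y}}(x) \in [0, 1]. Zadeh's probability of a fuzzy event is
\Pr\nolimits_\theta(\tilde{y}) \;=\; \sum_{x} \mu_{\tilde{y}}(x)\, p_\theta(x) .
```

If the chance of issuing a report depends on the latent count through grades that are not constant on the report's support, the first display cannot hold, which is the intuition behind the claim above.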

What carries the argument

The fuzzy-reporting mechanism that maps latent integer counts to granular (fuzzy-valued) observations and thereby induces a coarsening-not-at-random structure.
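A toy check may make the mechanism concrete. The sketch below assumes, purely for illustration, that a granule centred at `c` is reported with probability proportional to its membership grade at the latent count; the functions `triangular`, `crisp`, `reporting_kernel`, and `satisfies_car` are hypothetical names, not the paper's construction. It tests the coarsening-at-random condition, namely that the reporting probability is constant over every latent value in the granule's support.

```python
from fractions import Fraction


def triangular(c, x, half_width=2):
    """Graded membership of latent count x in the granule centred at c."""
    return max(Fraction(0), 1 - Fraction(abs(x - c), half_width))


def crisp(c, x, half_width=2):
    """Crisp 0/1 membership with the same support as `triangular`."""
    return Fraction(1) if abs(x - c) < half_width else Fraction(0)


def reporting_kernel(membership, centres, x):
    """P(report = c | latent = x), ASSUMED proportional to the membership grade."""
    weights = {c: membership(c, x) for c in centres}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def satisfies_car(membership, latent_values, centres):
    """True if P(report c | x) is constant over all x in the support of c."""
    for c in centres:
        probs = {
            reporting_kernel(membership, centres, x)[c]
            for x in latent_values
            if membership(c, x) > 0
        }
        if len(probs) > 1:
            return False
    return True


latent_values = range(0, 11)   # latent counts 0..10
centres = range(-1, 12)        # one extra centre on each side avoids edge effects

car_with_crisp = satisfies_car(crisp, latent_values, centres)
car_with_graded = satisfies_car(triangular, latent_values, centres)
print("CAR holds with crisp membership: ", car_with_crisp)
print("CAR holds with graded membership:", car_with_graded)
```

Exact rational arithmetic via `Fraction` keeps the equality test free of floating-point noise. Under this assumed kernel the crisp granules pass the check and the triangular ones do not, which mirrors the qualitative claim rather than proving the paper's general result.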

If this is right

Standard count models that assume ignorable coarsening will be biased when graded membership governs the reporting step.
Joint estimation of latent counts and reporting parameters becomes necessary to recover unbiased inferences.
The hierarchical construction supplies a concrete way to propagate uncertainty from alignment ambiguity into downstream analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same non-ignorability pattern may appear in any counting process whose observation mechanism uses continuous membership grades, such as species-abundance surveys or sensor-event tallies.
Extensions could replace the current hierarchical specification with nonparametric membership functions while retaining the coarsening-not-at-random logic.
Simulation studies that generate data from known graded mechanisms would provide a direct check on whether the model recovers the latent counts accurately.
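The simulation idea in the last point can be sketched with a deliberately simple, known mechanism. Everything here is an assumption made for illustration and is not the paper's hierarchical model: latent counts are Poisson, and each count `x` is reported as centre `x` or `x + 1` with probabilities 2/3 and 1/3 (an asymmetric membership with grades 1 and 1/2, normalised). The naive fit treats the reported centre as the exact count; the mechanism-aware fit marginalises over the latent count.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def report_centre(x, rng, p_shift):
    """Hypothetical graded mechanism: report x + 1 with prob. p_shift, else x."""
    return x + 1 if rng.random() < p_shift else x


def poisson_pmf(x, lam):
    """Poisson probability mass, zero for negative arguments."""
    if x < 0:
        return 0.0
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def log_likelihood(lam, centre_counts, p_shift):
    """Observed-data log-likelihood, summing over the two possible latent counts."""
    total = 0.0
    for c, n in centre_counts.items():
        marginal = (1 - p_shift) * poisson_pmf(c, lam) + p_shift * poisson_pmf(c - 1, lam)
        total += n * math.log(marginal)
    return total


def fit_lambda(centre_counts, p_shift, grid):
    """Grid-search maximum-likelihood estimate of the Poisson rate."""
    return max(grid, key=lambda lam: log_likelihood(lam, centre_counts, p_shift))


rng = random.Random(0)          # fixed seed for reproducibility
TRUE_LAMBDA = 5.0
P_SHIFT = 1 / 3

latent = [sample_poisson(TRUE_LAMBDA, rng) for _ in range(5000)]
centres = [report_centre(x, rng, P_SHIFT) for x in latent]

# Naive Poisson MLE: the sample mean of the reported centres.
naive_estimate = sum(centres) / len(centres)

# Mechanism-aware MLE over a grid of candidate rates from 3.00 to 8.00.
grid = [3.0 + 0.01 * i for i in range(501)]
aware_estimate = fit_lambda(Counter(centres), P_SHIFT, grid)

print(f"true rate {TRUE_LAMBDA}, naive {naive_estimate:.3f}, aware {aware_estimate:.2f}")
```

In this toy setting the naive estimate is pulled upward by roughly the mean shift of the reporting step, while the mechanism-aware fit stays near the generating rate. A realistic study would replace the two-point mechanism with alignment-driven membership functions and would also need to estimate the reporting parameters rather than assume them known.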

Load-bearing premise

The fuzzy-reporting mechanism in RNA-seq can be adequately captured by a tractable hierarchical model without introducing new untestable biases in the latent count distribution.

What would settle it

If estimates obtained from the hierarchical model on real RNA-seq data consistently coincide with those from a standard Poisson or negative-binomial model that ignores the fuzziness, the practical relevance of the generic failure of ignorability would be called into question.

Figures

Figures reproduced from arXiv: 2604.00763 by Antonio Calcagnì, Arianna Consiglio, Corrado Mencar, Przemyslaw Grzegorzewski.

Original abstract

RNA-seq count data are often affected by read-to-gene alignment ambiguity, especially in high-dimensional transcriptomics. This type of ambiguity can be conveniently expressed through granular counts, namely fuzzy-valued observations of latent discrete quantities. We study a class of fuzzy-reporting mechanisms and show that, when reporting exploits graded membership, ignorability fails generically, leading to a coarsening-not-at-random structure. A hierarchical model is then introduced as a tractable instance of this construction and illustrated using RNA-seq data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags that graded fuzzy counts in RNA-seq generically break ignorability and sketches a hierarchical model to handle it, but the details stay thin.

The core point is that when RNA-seq reports use graded membership for ambiguous alignments, standard ignorability assumptions fail and the data follow a coarsening-not-at-random (CNAR) pattern. The authors then build a hierarchical model as a workable version of that structure and show it on some RNA-seq examples. That link between fuzzy reporting and CNAR mechanisms is the main new piece; it takes existing ideas from missing-data work and fuzzy sets and applies them directly to granular transcriptomic counts.

The practical angle is useful because alignment ambiguity is common in high-dimensional data, and treating the counts as fuzzy rather than forcing crisp assignments is a reasonable move. The illustration with real data helps show the setup is meant to be usable rather than purely theoretical.

The soft spots are mostly around verification. The abstract gives no equations, no derivation of the hierarchy, and no simulation or fit checks, so it is hard to see whether the model keeps the latent count distribution free of new bias or whether the conditional independence assumptions actually match how sequence similarity and read length create the fuzziness in practice. The stress-test concern about the unverified form of the membership function lands because RNA-seq ambiguity is driven by biology and technology, not by an abstract graded mechanism. Without seeing the full derivations or any sensitivity checks, it is unclear whether the hierarchy solves the original problem or just relocates the untestable parts.

This is for statisticians and bioinformaticians who already work on uncertain count data and missing-data methods. A reader looking for a concrete fix in transcriptomics might find the framing helpful, but anyone needing reproducible code or validated performance would have to wait for the full manuscript. The idea is coherent enough on its own terms to deserve referee time; the math and the data application need checking, but the question it raises is worth the effort.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that read-to-gene alignment ambiguity in RNA-seq produces granular (fuzzy-valued) counts, and that reporting mechanisms exploiting graded membership generically violate ignorability, inducing a coarsening-not-at-random (CNAR) structure. A hierarchical model is introduced as a tractable instance of this class and is illustrated on RNA-seq data.

Significance. If the generic CNAR result holds under the stated membership functions and the hierarchical model recovers unbiased latent counts without injecting new selection bias, the work would supply a principled modeling route for a pervasive source of ambiguity in transcriptomics. The explicit linkage between graded membership and non-ignorable coarsening is a useful conceptual contribution for count-data analysis more broadly.

major comments (2)

[Theoretical development / hierarchical model] The central claim that graded-membership reporting produces CNAR 'generically' (abstract and theoretical section) requires an explicit derivation showing that the membership function violates the missing-at-random condition for every non-degenerate latent count distribution; the provided abstract and skeptic note give no such derivation or counter-example check, leaving the scope of the result unclear.
[Hierarchical model section] The hierarchical model (introduced after the generic claim) imposes conditional independence assumptions between the latent counts and the observed granular reports; these may be violated by sequence-similarity and read-length effects that drive alignment ambiguity in RNA-seq, potentially reintroducing untestable bias into the inferred count distribution. A simulation study or sensitivity analysis under realistic alignment scenarios is needed to confirm the model remains unbiased.

minor comments (1)

[Abstract] The abstract would be strengthened by a one-sentence description of the hierarchical model's key structure (e.g., the form of the membership function or the latent hierarchy) so readers can immediately gauge the modeling assumptions.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and robustness of our results. We address each major point below and will incorporate revisions to strengthen the manuscript.

Referee: [Theoretical development / hierarchical model] The central claim that graded-membership reporting produces CNAR 'generically' (abstract and theoretical section) requires an explicit derivation showing that the membership function violates the missing-at-random condition for every non-degenerate latent count distribution; the provided abstract and skeptic note give no such derivation or counter-example check, leaving the scope of the result unclear.

Authors: We agree that an explicit derivation is needed to establish the generic nature of the CNAR result. In the revised manuscript we will add a formal subsection deriving that, for the stated class of graded membership functions, the reporting mechanism violates the MAR condition for any non-degenerate distribution on the latent counts. The derivation will proceed by showing that the conditional probability of the observed granular report given the latent count cannot factor in a manner independent of the latent value except in degenerate cases; we will also include a brief counter-example check for the boundary (degenerate) distributions to delineate the result's scope. revision: yes
Referee: [Hierarchical model section] The hierarchical model (introduced after the generic claim) imposes conditional independence assumptions between the latent counts and the observed granular reports; these may be violated by sequence-similarity and read-length effects that drive alignment ambiguity in RNA-seq, potentially reintroducing untestable bias into the inferred count distribution. A simulation study or sensitivity analysis under realistic alignment scenarios is needed to confirm the model remains unbiased.

Authors: The conditional independence between latent counts and granular reports is an explicit modeling choice made for tractability within the hierarchical construction; we do not claim it holds universally. Sequence-similarity and read-length effects can indeed induce additional dependence. To address this concern we will add a simulation study in the revision that generates synthetic alignment ambiguity under realistic sequence-similarity profiles (drawn from typical RNA-seq read-length and homology distributions) and evaluates whether the hierarchical model recovers the latent count distribution without introducing detectable bias relative to the true generating process. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper claims to derive that graded-membership fuzzy reporting generically produces a coarsening-not-at-random (CNAR) structure from the definition of the reporting mechanism, then introduces a hierarchical model as a tractable instance. No equations or steps reduce a prediction to a fitted parameter by construction, no self-citation is load-bearing for the central claim, and the hierarchical structure is presented as an independent modeling choice rather than a renaming or ansatz smuggled from prior self-work. The derivation chain is self-contained against external benchmarks and does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that fuzzy reporting in RNA-seq follows a graded membership mechanism that can be hierarchically modeled; no free parameters or invented entities are identifiable from the abstract alone.

axioms (2)

domain assumption: Alignment ambiguity in RNA-seq produces fuzzy-valued observations of latent discrete counts.
Stated as the starting point for the granular-counts framework.
domain assumption: Graded-membership reporting creates a coarsening-not-at-random structure.
Core theoretical result claimed for the class of fuzzy-reporting mechanisms.
