arxiv: 2604.22088 · v1 · submitted 2026-04-23 · 📊 stat.ME · stat.AP

Zero-inflated modeling with smoothing on counting tensors

Pith reviewed 2026-05-09 20:32 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords: zero-inflated Poisson · tensor decomposition · single-cell Hi-C · low-rank CP · structural zeros · identifiability · consistency rates · genomic smoothness

The pith

A zero-inflated Poisson tensor model with low-rank CP structure and smoothness separates structural from technical zeros in sparse count data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a probabilistic model for three-way count tensors with many zeros, such as those from single-cell Hi-C experiments that measure interactions between genomic loci across cells. It combines a zero-inflated Poisson distribution with a low-rank canonical polyadic (CP) decomposition, latent cluster embeddings for cells, and smoothing constraints along the ordered genomic positions. This setup yields a Bayes-optimal rule for separating biologically meaningful structural zeros from technical zeros caused by measurement limits, along with uncertainty measures and consistent parameter estimates in high dimensions. In simulations and on real data, the approach is reported to improve zero detection and latent-pattern recovery over alternatives, supporting tasks such as cell clustering and 3D chromatin structure inference.

Core claim

We introduce a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. We develop a Bayes-optimal procedure for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. We establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime.
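The Bayes-optimal zero classification named in the core claim can be illustrated with the textbook zero-inflated Poisson posterior. The sketch below is not the paper's implementation: the log link, the factor names `A`, `B`, `C`, the structural-zero probability `pi`, and the 0.5 threshold (the Bayes rule under 0-1 loss) are all assumptions made here for illustration, since this page does not show the paper's parameterization.

```python
import numpy as np


def cp_mean(A, B, C):
    """Poisson mean tensor from CP factors under an ASSUMED log link.

    lam[i, j, k] = exp(sum_r A[i, r] * B[j, r] * C[k, r])
    """
    return np.exp(np.einsum("ir,jr,kr->ijk", A, B, C))


def structural_zero_posterior(Y, lam, pi):
    """P(entry is a structural zero | observed count) under a generic ZIP.

    For an observed zero this is pi / (pi + (1 - pi) * exp(-lam));
    a positive count can never be a structural zero, so it gets 0.
    """
    post = pi / (pi + (1.0 - pi) * np.exp(-lam))
    return np.where(Y == 0, post, 0.0)


def classify_structural(Y, lam, pi, threshold=0.5):
    """Flag entries whose posterior exceeds the threshold (0.5 assumes 0-1 loss)."""
    return structural_zero_posterior(Y, lam, pi) > threshold


# Hypothetical usage: random rank-2 factors for 6 loci x 6 loci x 4 cells.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.5, size=(6, 2))
C = rng.normal(scale=0.5, size=(4, 2))
lam_demo = cp_mean(A, A, C)           # sharing A across both loci modes is an assumption
Y_demo = rng.poisson(lam_demo)
flags = classify_structural(Y_demo, lam_demo, pi=0.3)
print("flagged as structural:", int(flags.sum()), "of", int((Y_demo == 0).sum()), "zeros")
```

The rule makes the trade-off visible: a zero where the fitted mean is large is hard to explain as Poisson noise and is pushed toward "structural", while a zero where the mean is near zero stays ambiguous.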

What carries the argument

Zero-inflated Poisson tensor model that combines low-rank CP decomposition with cluster-specific latent embeddings and smoothness along genomic loci to model the mean and separate zero types.

If this is right

  • Model parameters are identifiable under the stated assumptions.
  • Proposed estimators achieve consistency rates in high-dimensional regimes.
  • The procedure improves zero detection accuracy and latent structure recovery compared to alternatives.
  • Outputs support better performance on downstream tasks including cell clustering and 3D chromatin organization inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tensor modeling approach could apply directly to other multiway sparse count data with excess zeros and ordered indices outside genomics.
  • Future mixture extensions for cell populations would address additional layers of heterogeneity without changing the core zero-inflation mechanism.
  • Scalable approximations or tensor sketching would be required to scale the method beyond current single-cell Hi-C dataset sizes.

Load-bearing premise

The observed count tensor is generated from a zero-inflated Poisson whose mean structure exactly matches a low-rank CP decomposition that includes cluster embeddings and smoothness along ordered loci.
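Written out, the premise amounts to a generative model of roughly the following shape. The notation is assumed here, not taken from the paper: $Y_{ijk}$ is the count for loci pair $(i,j)$ in cell $k$, $\pi_{ijk}$ is the structural-zero probability, the log link is a guess, and the rank symbol $M$ is chosen only to avoid clashing with the $R$ (clusters) and $L$ (embedding dimension) used in the figure captions.

```latex
\begin{align*}
Y_{ijk} &\sim \pi_{ijk}\,\delta_0 + (1-\pi_{ijk})\,\mathrm{Poisson}(\lambda_{ijk}), \\
\log \lambda_{ijk} &= \sum_{m=1}^{M} a_{im}\, b_{jm}\, c_{km}, \\
\Pr(Y_{ijk}=0) &= \pi_{ijk} + (1-\pi_{ijk})\,e^{-\lambda_{ijk}}.
\end{align*}
```

The last line shows why the premise is load-bearing: an observed zero mixes both sources, so the two can only be told apart through what the model asserts about $\lambda_{ijk}$. If the low-rank, smooth mean is wrong, the split between structural and technical zeros inherits that error.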

What would settle it

Simulated data generated from a mean structure that violates the low-rank CP plus smoothness assumption, where the Bayes-optimal zero separation rule performs no better than random classification on held-out tensors.
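A test of this kind needs a harness that scores a zero-classification rule against known structural-zero labels. The following is a minimal sketch under stated assumptions: it does not fit the paper's model, it simply takes a plug-in mean tensor and compares an oracle plug-in with a deliberately mismatched one. The function names, the rank-1 mean, the value `pi = 0.3`, and the mismatch (reversing the loci order) are all hypothetical choices for illustration.

```python
import numpy as np


def simulate_zip(lam, pi, rng):
    """Draw zero-inflated Poisson counts and return (counts, structural mask)."""
    structural = rng.random(lam.shape) < pi   # latent structural-zero indicators
    counts = rng.poisson(lam)
    counts[structural] = 0                    # structural entries are forced to zero
    return counts, structural


def zero_rule_accuracy(counts, structural, lam_plugin, pi_plugin):
    """Accuracy of the plug-in posterior rule, scored on observed zeros only."""
    zeros = counts == 0
    post = pi_plugin / (pi_plugin + (1.0 - pi_plugin) * np.exp(-lam_plugin))
    pred = post > 0.5
    return float(np.mean(pred[zeros] == structural[zeros]))


rng = np.random.default_rng(0)

# Hypothetical truth: rank-1 mean, high-contact loci (mean 5) and low-contact loci (mean 0.05).
u = np.repeat([5.0, 0.05], 10)
lam_true = np.einsum("i,j,k->ijk", u, np.ones(20), np.ones(30))
pi = 0.3

counts, structural = simulate_zip(lam_true, pi, rng)

# Oracle plug-in versus a mismatched plug-in whose mean has the loci order reversed.
acc_oracle = zero_rule_accuracy(counts, structural, lam_true, pi)
acc_mismatch = zero_rule_accuracy(counts, structural, lam_true[::-1], pi)
print(f"oracle: {acc_oracle:.3f}  mismatched mean: {acc_mismatch:.3f}")
```

In a real check the mismatched plug-in would be replaced by the mean estimated by the fitted model on data whose true mean violates the low-rank and smoothness assumptions, and the score would be compared with a random-guess baseline on held-out tensors.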

Figures

Figures reproduced from arXiv: 2604.22088 by Elena Tuzhilina, Yaoming Zhen.

Figure 1. Averaged relative errors achieved by various initialization methods over 50 simu…
Figure 2. Performance of the fitting procedure as the number of cells…
Figure 3. ARI vs. number of clusters R for different recovered quantities. Different panels represent different sparsity levels.
Figure 4. Performance of ZITS on the human single-cell Hi-C dataset. Left: ARI versus embedding dimension L for chromosome 1, averaged over 10 subsamples. Right: ARI distributions across chromosomes with L = 10.
Adjacent text from the paper: We next compare ZITS with two structural-zero-aware imputation methods, HiCImpute (Xie et al., 2022) and scHiCSRS (Xu et al., 2022). We use the publicly available R implementations of both methods and ru…
Figure 5. Comparison of ZITS with competing methods using ARI. Diamonds indicate the mean ARI over 10 subsamples. Left: count-based imputation methods that detect structural zeros. Middle: imputation methods for binarized contact matrices. Right: performance based on cell embeddings.
Figure 6. Performance of the fitting procedure as the number of clusters…
Figure 7. Comparison of embeddings obtained via ZITS and CP chromosome 3.

Original abstract

We propose a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells, exhibiting pronounced sparsity, zero inflation, and cell-to-cell heterogeneity. We introduce a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. We develop a Bayes-optimal procedure for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. We establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime. Simulation studies and analyses of single-cell Hi-C data demonstrate improved performance in zero detection, latent structure recovery, and downstream tasks such as clustering and 3D chromatin organization inference. The proposed framework provides a flexible approach for multiway count data with excess zeros and structured dependencies, and suggests several directions for future work, including mixture-based modeling of cell populations and scalable computation for large-scale applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells, exhibiting pronounced sparsity, zero inflation, and cell-to-cell heterogeneity. It introduces a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. A Bayes-optimal procedure is developed for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. The authors establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime. Simulation studies and analyses of single-cell Hi-C data demonstrate improved performance in zero detection, latent structure recovery, and downstream tasks such as clustering and 3D chromatin organization inference.

Significance. If the identifiability and consistency results are established as claimed, this work offers a valuable contribution to statistical modeling of multiway count data with excess zeros and structured dependencies. The integration of tensor factorization, clustering via latent embeddings, and smoothness priors is well-motivated for genomic applications like single-cell Hi-C. The Bayes-optimal zero classification procedure is a notable feature for handling the distinction between structural and technical zeros. The empirical validation supports the practical utility of the approach.

minor comments (2)
  1. The abstract is quite dense; consider breaking it into shorter sentences for improved readability.
  2. More details on the specific form of the smoothness penalty or prior along genomic loci would aid in understanding the model's implementation.
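The second comment asks for the form of the smoothness penalty, which this page does not give. One common choice for ordered indices is a squared second-difference penalty on the loci factor matrix, in the style of P-spline smoothing. The sketch below illustrates that generic choice only, not the paper's penalty; `U`, `weight`, and both function names are hypothetical.

```python
import numpy as np


def second_difference_matrix(n):
    """(n - 2) x n matrix D with (D @ x)[i] = x[i] - 2 * x[i + 1] + x[i + 2]."""
    return np.diff(np.eye(n), n=2, axis=0)


def roughness_penalty(U, weight=1.0):
    """weight * ||D U||_F^2 for a loci-by-rank factor matrix U.

    Rows of U are assumed to be ordered by genomic position, so the
    penalty is zero when every column varies linearly along the genome.
    """
    D = second_difference_matrix(U.shape[0])
    return weight * float(np.sum((D @ U) ** 2))


# Hypothetical usage: a smooth factor column versus a noisy one.
positions = np.linspace(0.0, 1.0, 50)
smooth = np.sin(2 * np.pi * positions)[:, None]
noisy = smooth + np.random.default_rng(0).normal(scale=0.3, size=smooth.shape)
print("smooth:", roughness_penalty(smooth), "noisy:", roughness_penalty(noisy))
```

Adding such a term to the negative log-likelihood trades fit against smoothness through `weight`; how the paper chooses or tunes the analogous quantity is not stated on this page.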

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work on the zero-inflated Poisson tensor model for sparse count tensors, as well as for recommending minor revision. We are pleased that the integration of CP structure, latent embeddings, smoothness, Bayes-optimal zero classification, identifiability, and consistency results is viewed as a valuable contribution to statistical modeling of multiway count data.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's abstract and summary present a zero-inflated Poisson tensor model with CP structure, embeddings, and smoothness, along with claims of identifiability and consistency rates derived under stated assumptions. No equations, self-citations, or derivations are visible that reduce predictions to fitted inputs by construction. The theoretical results are positioned as independent contributions based on the model matching the data-generating process, with no load-bearing steps that collapse to self-definition or renaming. This is the standard case of a self-contained probabilistic framework without internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the model implicitly assumes a zero-inflated Poisson generative process, low-rank CP structure, cluster embeddings, and genomic smoothness, but none are quantified or justified here.

pith-pipeline@v0.9.0 · 5486 in / 1453 out tokens · 37512 ms · 2026-05-09T20:32:54.684616+00:00 · methodology

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages

  1. [1]

    Cameron, A. C. and Trivedi, P. K. (2013). Regression Analysis of Count Data. Econometric Society Monographs. Cambridge University Press, 2nd edition.

  2. [2]

    De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253-1278.

  3. [3]

    Dolgov, S., Kressner, D., and Strössner, C. (2021). Functional Tucker approximation using Chebyshev interpolation. SIAM Journal on Scientific Computing, 43(3):A2190-A2210.

  4. [4]

    Duan, Z., Andronescu, M., Schutz, K., et al. (2010). A three-dimensional model of the yeast genome. Nature, 465:363-367.

  5. [5]

    Han, R., Shi, P., and Zhang, A. R. (2024). Guaranteed functional tensor singular value decomposition. Journal of the American Statistical Association, 119(546):995-1007.

  6. [6]

    Hicks, S. C., Townes, F. W., Teng, M., and Irizarry, R. A. (2018). Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics, 19(4):562-578.

  7. [7]

    Hoff, P. D. (2011). Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Analysis, 6(2):179-196.

  8. [8]

    Hu, M., Deng, K., Qin, Z., Dixon, J., Selvaraj, S., Fang, J., Ren, B., and Liu, J. S. (2013). Bayesian inference of spatial organizations of chromosomes. PLOS Computational Biology, 9(1):1-14.

  9. [9]

    Hu, Y., Koren, Y., and Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data Mining, pages 263-272.

  10. [10]

    Jing, B.-Y., Li, T., Lyu, Z., and Xia, D. (2021). Community detection on mixture multilayer networks via regularized tensor decomposition. The Annals of Statistics, 49(6):3181-3205.

  11. [11]

    Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3):455-500.

  12. [12]

    Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1-14.

  13. [13]

    Lieberman-Aiden, E., van Berkum, N. L., Williams, L., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950):289-293.

  14. [14]

    Luo, Y., Zhao, X., Li, Z., Ng, M. K., and Meng, D. (2023). Low-rank tensor function representation for multi-dimensional data recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3351-3369.

  15. [15]

    Lyu, Z., Xia, D., and Zhang, Y. (2023). Latent space model for higher-order networks and generalized tensor decomposition. Journal of Computational and Graphical Statistics, 32(4):1320-1336.

  16. [16]

    Ma, Z., Ma, Z., and Yuan, H. (2020). Universal latent space model fitting for large networks with edge covariates. Journal of Machine Learning Research, 21(4):1-67.

  17. [17]

    Martin, T. G., Wintle, B. A., Rhodes, J. R., Kuhnert, P. M., Field, S. A., Low-Choy, S. J., Tyre, A. J., and Possingham, H. P. (2005). Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters, 8(11):1235-1246.

  18. [18]

    Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3):341-365.

  19. [19]

    Nagano, T., Lubling, Y., Stevens, T. J., et al. (2017). Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature, 547(7661):61-67.

  20. [20]

    Ni, C., Duan, Y., Dahleh, M., Wang, M., and Zhang, A. R. (2023). Learning good state and action representations for Markov decision process via tensor decomposition. Journal of Machine Learning Research, 24(115):1-53.

  21. [21]

    Pierson, E. and Yau, C. (2015). ZIFA: Dimensionality Reduction for Zero-Inflated Single-Cell Gene Expression Analysis. Genome Biology, 16(1):241.

  22. [22]

    Ramani, V., Deng, X., Qiu, R., et al. (2017). Massively multiplex single-cell Hi-C. Nature Methods, 14(3):263-266.

  23. [23]

    Rao, S. S. P., Huntley, M. H., Durand, N. C., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7):1665-1680.

  24. [24]

    Stevens, T. J., Lando, D., Basu, S., et al. (2017). 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature, 544:59-64.

  25. [25]

    Tuzhilina, E., Hastie, T., and Segal, M. (2024). Statistical curve models for inferring 3D chromatin architecture. The Annals of Applied Statistics, 18(4):2979-3006.

  26. [26]

    Tuzhilina, E., Hastie, T. J., and Segal, M. R. (2022). Principal curve approaches for inferring 3D chromatin architecture. Biostatistics, 23(2):626-642.

  27. [27]

    Wood, S. N. (2017). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, 2nd edition.

  28. [28]

    Xie, Q., Han, C., Jin, V. X., and Lin, S. (2022). HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single-cell Hi-C data. PLOS Computational Biology, 18(6):e1010129.

  29. [29]

    Xu, D., Ren, J., Wang, M., et al. (2022). ScHiCSRS: A self-representation smoothing method for imputation of single-cell Hi-C data. Bioinformatics, 38(7):1892-1898.

  30. [30]

    Xu, S., Zhen, Y., and Wang, J. (2023). Covariate-assisted community detection in multi-layer networks. Journal of Business & Economic Statistics, 41(3):915-926.

  31. [31]

    Zhang, R., Zhou, T., and Ma, J. (2022). Multiscale and integrative single-cell Hi-C analysis with Higashi. Nature Biotechnology, 40:254-261.

  32. [32]

    Zhang, X., Xue, S., and Zhu, J. (2020). A flexible latent space model for multilayer networks. In International Conference on Machine Learning, pages 11288-11297. PMLR.

  33. [33]

    Zhen, Y. and Wang, J. (2023). Community detection in general hypergraph via graph embedding. Journal of the American Statistical Association, 118(543):1620-1629.

  34. [34]

    Zhen, Y., Xu, S., and Wang, J. (2026). Consistent community detection in multi-layer networks with heterogeneous differential privacy. Statistica Sinica, just-accepted.

  35. [35]

    Zhou, T., Zhang, R., and Ma, J. (2019). Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation. Genome Biology, 20:240.

  36. [36]

    Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., and Smith, G. M. (2009). Mixed Effects Models and Extensions in Ecology with R. Springer.

  48. [48]

    Adamczak, R. (2008). Electronic Journal of Probability.

  50. [50]

    (2000). On the uniqueness of multilinear decomposition of N-way arrays. Journal of Chemometrics: A Journal of the Chemometrics Society.

  55. [55]

    Dixon, J. R., Selvaraj, S., Yue, F., et al. Nature.

  56. [56]

    Payne, A. C., Chiang, Z., Reginato, P., et al. Science.

  57. [57]

    Marco, A., et al. Trends in Neurosciences.

  58. [58]

    Ay, F., Bunnik, E. M., Varoquaux, N., et al. Genome Research.

  59. [59]

    Lee, C.-H., Wang, J., Zhang, Y., et al. Nature Communications.

  60. [60]

    Capurso, D. and Bengtsson, H. Bioinformatics.

Showing first 80 references.