Zero-inflated modeling with smoothing on counting tensors
Pith reviewed 2026-05-09 20:32 UTC · model grok-4.3
The pith
A zero-inflated Poisson tensor model with low-rank CP structure and smoothness separates structural from technical zeros in sparse count data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. We develop a Bayes-optimal procedure for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. We establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime.
What carries the argument
A zero-inflated Poisson tensor model that combines a low-rank CP decomposition with cluster-specific latent embeddings and smoothness along ordered genomic loci. Together these define the Poisson mean and make it possible to separate structural zeros from technical zeros.
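For orientation, a generic zero-inflated Poisson tensor model with a CP-structured mean can be written as below. This is a textbook-style sketch, not the paper's exact parameterization: the symbols \pi_{ijk}, \lambda_{ijk}, a_{ir}, b_{jr}, c_{kr}, the latent indicator Z_{ijk}, the log link, and the rank R are all assumptions introduced here, and the cluster-specific embeddings and the smoothness constraint are omitted.

```latex
\begin{align*}
Z_{ijk} &\sim \mathrm{Bernoulli}(\pi_{ijk}), \\
Y_{ijk} \mid Z_{ijk} &\sim
  \begin{cases}
    \delta_0, & Z_{ijk} = 1, \\
    \mathrm{Poisson}(\lambda_{ijk}), & Z_{ijk} = 0,
  \end{cases} \\
\log \lambda_{ijk} &= \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}, \\
\Pr(Z_{ijk} = 1 \mid Y_{ijk} = 0)
  &= \frac{\pi_{ijk}}{\pi_{ijk} + (1 - \pi_{ijk})\, e^{-\lambda_{ijk}}}.
\end{align*}
```

In the standard ZIP setting, flagging an observed zero as coming from the point mass when this posterior exceeds 1/2 minimizes misclassification error under 0-1 loss. Whether the paper's Bayes-optimal procedure uses this threshold, and which component it labels "structural" versus "technical", is not stated in this summary.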
If this is right
- Model parameters are identifiable under the stated assumptions.
- Proposed estimators achieve consistency rates in high-dimensional regimes.
- In the reported simulations and single-cell Hi-C analyses, the procedure improves zero detection and latent structure recovery.
- Outputs support better performance on downstream tasks including cell clustering and 3D chromatin organization inference.
Where Pith is reading between the lines
- The same tensor modeling approach could apply directly to other multiway sparse count data with excess zeros and ordered indices outside genomics.
- Future mixture extensions for cell populations would address additional layers of heterogeneity without changing the core zero-inflation mechanism.
- Applying the method beyond current single-cell Hi-C dataset sizes would likely require approximate inference or tensor sketching.
Load-bearing premise
The observed count tensor is generated from a zero-inflated Poisson whose mean structure exactly matches a low-rank CP decomposition that includes cluster embeddings and smoothness along ordered loci.
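The premise can be made concrete by simulating from it. The sketch below draws a small loci × loci × cells count tensor from a zero-inflated Poisson whose log-mean has exact CP structure. The function name `simulate_zip_cp_tensor`, the factor matrices `A`, `B`, `C`, the constant inflation probability `pi`, and the log link are hypothetical choices made for illustration; cluster embeddings and smoothness are left out, and symmetry across the two locus modes is not enforced.

```python
import numpy as np

def simulate_zip_cp_tensor(n_loci=20, n_cells=15, rank=3, pi=0.3, seed=0):
    """Draw counts from a ZIP whose log-mean is a rank-`rank` CP tensor.

    All parameter names and defaults are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical factor matrices for the two locus modes and the cell mode.
    A = rng.normal(scale=0.5, size=(n_loci, rank))
    B = rng.normal(scale=0.5, size=(n_loci, rank))
    C = rng.normal(scale=0.5, size=(n_cells, rank))
    # CP structure: log lambda_ijk = sum_r A[i, r] * B[j, r] * C[k, r].
    log_lam = np.einsum("ir,jr,kr->ijk", A, B, C)
    lam = np.exp(log_lam)
    # Entries drawn from the point mass at zero.
    inflated = rng.random(lam.shape) < pi
    counts = rng.poisson(lam)
    counts[inflated] = 0
    return counts, lam, inflated

counts, lam, inflated = simulate_zip_cp_tensor()
```

Data simulated this way satisfy the premise exactly, so they are a best case for the method; real scHi-C tensors need not be this well behaved.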
What would settle it
Simulated data generated from a mean structure that violates the low-rank CP plus smoothness assumption, where the Bayes-optimal zero separation rule performs no better than random classification on held-out tensors.
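A minimal version of such a check could look like the sketch below. It applies the standard ZIP posterior rule to simulated zeros, once with the true means and once with a deliberately misspecified flat mean, so the two accuracies can be compared. The names `posterior_inflated` and `zero_classification_accuracy`, the gamma-distributed means, and the 0.5 threshold are assumptions for illustration and are not the paper's procedure.

```python
import numpy as np

def posterior_inflated(lam, pi):
    """P(zero came from the point mass | count == 0) under a ZIP(pi, lam)."""
    return pi / (pi + (1.0 - pi) * np.exp(-lam))

def zero_classification_accuracy(counts, inflated, lam, pi, threshold=0.5):
    """Accuracy of the posterior-threshold rule, evaluated on observed zeros only."""
    zeros = counts == 0
    pred = posterior_inflated(lam[zeros], pi) > threshold
    return float(np.mean(pred == inflated[zeros]))

rng = np.random.default_rng(0)
pi = 0.3                                                   # hypothetical inflation rate
lam = rng.gamma(shape=2.0, scale=1.0, size=(20, 20, 15))   # hypothetical true means
inflated = rng.random(lam.shape) < pi
counts = np.where(inflated, 0, rng.poisson(lam))

# Rule supplied with the true means.
acc_true = zero_classification_accuracy(counts, inflated, lam, pi)
# Rule supplied with a misspecified, constant mean.
flat = np.full_like(lam, lam.mean())
acc_misspecified = zero_classification_accuracy(counts, inflated, flat, pi)
```

A fuller test would replace the flat mean with estimates from the fitted model on data whose true mean violates the CP and smoothness structure, and would compare against a random or majority-class baseline on held-out tensors.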
Original abstract
We propose a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells, exhibiting pronounced sparsity, zero inflation, and cell-to-cell heterogeneity. We introduce a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. We develop a Bayes-optimal procedure for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. We establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime. Simulation studies and analyses of single-cell Hi-C data demonstrate improved performance in zero detection, latent structure recovery, and downstream tasks such as clustering and 3D chromatin organization inference. The proposed framework provides a flexible approach for multiway count data with excess zeros and structured dependencies, and suggests several directions for future work, including mixture-based modeling of cell populations and scalable computation for large-scale applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells, exhibiting pronounced sparsity, zero inflation, and cell-to-cell heterogeneity. It introduces a zero-inflated Poisson tensor model that integrates low-rank CP structure, cluster-specific latent embeddings, and smoothness along ordered genomic loci, thereby jointly capturing multiway dependence, heterogeneity, and structured variation. A Bayes-optimal procedure is developed for distinguishing structural from technical zeros, enabling principled inference and uncertainty quantification. The authors establish identifiability of the model parameters and derive consistency rates for the proposed estimators in a high-dimensional regime. Simulation studies and analyses of single-cell Hi-C data demonstrate improved performance in zero detection, latent structure recovery, and downstream tasks such as clustering and 3D chromatin organization inference.
Significance. If the identifiability and consistency results are established as claimed, this work offers a valuable contribution to statistical modeling of multiway count data with excess zeros and structured dependencies. The integration of tensor factorization, clustering via latent embeddings, and smoothness priors is well-motivated for genomic applications like single-cell Hi-C. The Bayes-optimal zero classification procedure is a notable feature for handling the distinction between structural and technical zeros. The empirical validation supports the practical utility of the approach.
Minor comments (2)
- The abstract is quite dense; consider breaking it into shorter sentences for improved readability.
- More details on the specific form of the smoothness penalty or prior along genomic loci would aid in understanding the model's implementation.
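To illustrate what the second comment is asking for, one common way to encode smoothness along an ordered index is a squared second-difference penalty on the factor rows, as used in penalized spline smoothing. The sketch below shows that generic form only; the paper's actual penalty or prior is not specified in this summary, and the function name `second_difference_penalty` and the example factors are hypothetical.

```python
import numpy as np

def second_difference_penalty(factors):
    """Sum of squared second differences along axis 0 (the ordered-locus axis).

    `factors` has shape (n_loci, rank); rows are assumed to be in genomic order.
    """
    d2 = np.diff(factors, n=2, axis=0)
    return float(np.sum(d2 ** 2))

loci = np.linspace(0.0, 1.0, 50)
# Hypothetical smooth locus factors and a noisy copy of them.
smooth = np.column_stack([np.sin(2 * np.pi * loci), loci ** 2])
rough = smooth + np.random.default_rng(0).normal(scale=0.2, size=smooth.shape)

penalty_smooth = second_difference_penalty(smooth)
penalty_rough = second_difference_penalty(rough)
```

The penalty vanishes for factors that vary linearly along the loci and grows with local roughness, which is why the form of this term affects how aggressively neighboring loci are pooled.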
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work on the zero-inflated Poisson tensor model for sparse count tensors, as well as for recommending minor revision. We are pleased that the integration of CP structure, latent embeddings, smoothness, Bayes-optimal zero classification, identifiability, and consistency results is viewed as a valuable contribution to statistical modeling of multiway count data.
Circularity Check
No significant circularity detected
The paper's abstract and summary present a zero-inflated Poisson tensor model with CP structure, embeddings, and smoothness, along with claims of identifiability and consistency rates derived under stated assumptions. No equations, self-citations, or derivations are visible that reduce predictions to fitted inputs by construction. The theoretical results are positioned as independent contributions based on the model matching the data-generating process, with no load-bearing steps that collapse to self-definition or renaming. This is the standard case of a self-contained probabilistic framework without internal circularity.