Time series cluster kernels to exploit informative missingness and incomplete label information
Pith reviewed 2026-05-24 23:36 UTC · model grok-4.3
The pith
A kernel for time series clustering exploits informative missingness by representing missing patterns inside mixed-mode mixture models and adds a semi-supervised version that uses incomplete labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors create an informative-missingness kernel by constructing a representation of the missing pattern and incorporating it into mixed-mode mixture models so that the information provided by the missing patterns is effectively exploited, together with a semi-supervised kernel that takes advantage of incomplete label information to learn more accurate similarities. Both kernels are formed as ensembles of Bayesian mixture models and therefore inherit the original TCK properties of handling missing values without imputation and remaining robust to hyperparameter choice.
What carries the argument
Mixed-mode mixture models that receive both the observed time-series values and an explicit representation of the missing pattern as joint inputs to the base learners of an ensemble kernel.
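As a rough illustration of what "joint inputs" could mean here, the sketch below scores one univariate series under a single mixture component, using a Gaussian term for each observed value and a Bernoulli term for each entry of the missingness mask. The mask convention (1 = observed), the function names, and the choice of Gaussian and Bernoulli modes are assumptions made for this example, not details confirmed by the summary above.

```python
import math

def missing_mask(series):
    """Return 1 where a value is observed and 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in series]

def component_log_lik(series, mean, var, p_obs):
    """Log-likelihood of one series under one hypothetical mixture component.

    Observed values add a Gaussian term; every time step adds a Bernoulli
    term for its mask entry. Missing values are skipped, never imputed.
    """
    total = 0.0
    for t, (v, r) in enumerate(zip(series, missing_mask(series))):
        # Bernoulli term for the missingness indicator at time t
        total += math.log(p_obs[t]) if r == 1 else math.log(1.0 - p_obs[t])
        if r == 1:
            # Gaussian term for the observed value at time t
            total += (-0.5 * math.log(2 * math.pi * var[t])
                      - (v - mean[t]) ** 2 / (2 * var[t]))
    return total

x = [0.1, None, 0.3, None]
mean = [0.0, 0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0, 1.0]
often_missing = [0.9, 0.2, 0.9, 0.2]   # component that expects gaps at t=1, t=3
rarely_missing = [0.9, 0.9, 0.9, 0.9]  # component that expects full observation
ll_a = component_log_lik(x, mean, var, often_missing)
ll_b = component_log_lik(x, mean, var, rarely_missing)
```

Both components share the same Gaussian parameters, so any difference between `ll_a` and `ll_b` comes from the mask alone; this is the sense in which the missing pattern can pull a series toward one component.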
If this is right
- Clustering can proceed on incomplete multivariate time series without any imputation step.
- Missingness patterns themselves become part of the similarity measure and can separate subgroups that standard kernels would merge.
- Partial label information can be used during kernel learning to sharpen the similarity matrix even when most labels are absent.
- The ensemble construction keeps performance stable across choices of the number of mixture components and other hyperparameters.
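One way an ensemble of mixture models can be turned into a similarity matrix is to sum, over base models, the inner products of the posterior membership vectors of each pair of series. The sketch below assumes that form; the summary above only says the kernels are ensembles of Bayesian mixture models, so the function name, the input layout, and the toy posteriors are all hypothetical.

```python
def ensemble_kernel(posteriors_per_model):
    """Sum, over base models, inner products of posterior membership vectors.

    posteriors_per_model[q][i] is the list of component posteriors that
    base model q assigns to series i.
    """
    n = len(posteriors_per_model[0])
    K = [[0.0] * n for _ in range(n)]
    for post in posteriors_per_model:
        for i in range(n):
            for j in range(n):
                K[i][j] += sum(a * b for a, b in zip(post[i], post[j]))
    return K

# Two hypothetical base models with different numbers of components.
model_1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
model_2 = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
K = ensemble_kernel([model_1, model_2])
```

Because each base model may use a different number of components, no single hyperparameter setting determines the result, which is one plausible reading of the robustness claim.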
Where Pith is reading between the lines
- The same missing-pattern representation could be inserted into other kernel families or distance measures that currently assume ignorable missingness.
- In domains where missingness arises from clinical decisions rather than random failure, the kernel may surface previously hidden patient strata.
- Controlled synthetic experiments that vary the strength of the missingness–label association would quantify how much signal is recovered.
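The synthetic experiment suggested in the last point could start from a generator like the one below, where a single `strength` parameter controls how much the missing rate differs between two classes. The base rate of 0.3, the maximum offset of 0.25, and all names are illustrative assumptions; values are drawn from the same distribution in both classes so that only the missingness carries label signal.

```python
import random

def make_dataset(n, length, strength, seed=0):
    """Generate n univariate series whose missing rate depends on a label.

    strength = 0 gives the same missing rate in both classes;
    strength = 1 gives the largest gap between the class-specific rates.
    """
    rng = random.Random(seed)
    base_rate = 0.3  # assumed baseline missing rate
    data, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        # class 1 loses more values, class 0 fewer, scaled by strength
        rate = base_rate + strength * 0.25 * (1 if y == 1 else -1)
        series = [None if rng.random() < rate else rng.gauss(0.0, 1.0)
                  for _ in range(length)]
        data.append(series)
        labels.append(y)
    return data, labels

def missing_rate_gap(data, labels):
    """Difference in missing fraction between class 1 and class 0."""
    def rate(cls):
        rows = [s for s, y in zip(data, labels) if y == cls]
        return sum(v is None for s in rows for v in s) / sum(len(s) for s in rows)
    return rate(1) - rate(0)

gap_none = missing_rate_gap(*make_dataset(400, 50, strength=0.0))
gap_full = missing_rate_gap(*make_dataset(400, 50, strength=1.0))
```

Sweeping `strength` between 0 and 1 and clustering each dataset with both kernels would then trace how much of the recovered structure depends on the missingness signal.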
Load-bearing premise
The missingness mechanism is informative and a representation of the missing pattern can be incorporated into mixed-mode mixture models without introducing bias or requiring further assumptions on the data-generating process.
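To make the premise concrete, one common way of writing a mixture over two modes is sketched below. The symbols are assumptions introduced for this illustration: $x^{\mathrm{obs}}$ for the observed values, $r$ for the missingness indicator, $z$ for the component label, $\pi_g$ for the mixing weights, and $G$ for the number of components.

```latex
p\bigl(x^{\mathrm{obs}}, r\bigr)
  = \sum_{g=1}^{G} \pi_g \,
    p\bigl(x^{\mathrm{obs}} \mid z = g\bigr)\,
    p\bigl(r \mid z = g\bigr)
```

Under this form the observed values and the mask are conditionally independent given the component, which is itself an assumption about the data-generating process. Whether the paper's model takes exactly this form is not stated in the material summarized here.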
What would settle it
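On the same electronic-health-record cohort, randomly shuffle the missing patterns before kernel construction and compare the new kernel directly with the original TCK. If the claimed gain comes from informative missingness, the shuffled run should show no gain in clustering accuracy over TCK; a gain that survives shuffling would point to some other source.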
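A minimal sketch of the shuffling control is given below, assuming the mask is fed to the model as a separate mode. The values stay attached to their own series and only the masks are permuted across series; the function names and the toy data are hypothetical.

```python
import random

def missing_mask(series):
    """Return 1 where a value is observed and 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in series]

def shuffled_mask_mode(data, seed=0):
    """Return the mask of every series, permuted across series.

    Only the mask mode that a mixed-mode model would read is reassigned
    at random; the observed values themselves are left untouched.
    """
    masks = [missing_mask(s) for s in data]
    order = list(range(len(masks)))
    random.Random(seed).shuffle(order)
    return [masks[k] for k in order]

data = [
    [1.0, None, 2.0],
    [None, None, 0.5],
    [0.3, 0.2, 0.1],
    [None, 1.0, None],
]
original = [missing_mask(s) for s in data]
shuffled = shuffled_mask_mode(data, seed=1)
```

The permutation keeps the overall distribution of patterns intact, so any drop in clustering quality can be attributed to the broken link between each series and its own pattern rather than to a change in how much data is missing.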
Original abstract
The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation, and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes that data are missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed-mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the time series cluster kernel (TCK), which uses an ensemble of Bayesian mixture models to handle missing values without imputation, by incorporating a representation of missing patterns into mixed-mode mixture models to exploit informative (non-ignorable) missingness. It also introduces a semi-supervised variant that leverages incomplete label information for improved similarity learning. Effectiveness is claimed via experiments on benchmark datasets and a real-world case study using longitudinal electronic health record data for hospital-acquired infection detection.
Significance. If the central construction holds, the work provides a practical kernel-based approach for time series clustering that directly uses missingness patterns rather than assuming they are ignorable (MAR), which is relevant for domains like medicine where missingness often carries signal. The ensemble Bayesian strategy for hyperparameter robustness is a noted strength, and the semi-supervised extension addresses a common practical constraint.
major comments (2)
- [Methods] The description of the mixed-mode mixture models (abstract and methods) states that the missingness indicator is treated as an additional observed mode, but does not provide an explicit derivation or set of equations showing that this construction remains consistent under MNAR mechanisms without implicitly reintroducing an ignorability assumption; this is load-bearing for the claim of exploiting informative missingness.
- [Experiments] Experiments section: the benchmark and case-study results are asserted to demonstrate effectiveness, but the manuscript does not report quantitative metrics (e.g., clustering accuracy, ARI, or comparison deltas versus standard TCK) or ablation controls that isolate the contribution of the missing-pattern representation; without these, the central empirical claim cannot be evaluated.
minor comments (2)
- [Methods] Notation for the missing-pattern representation should be introduced with a clear definition (e.g., an indicator matrix or embedding) before its use in the mixture model.
- [Case study] The real-world EHR case study would benefit from a brief description of the missingness rate and pattern statistics to contextualize the informative-missingness assumption.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
- Referee: [Methods] The description of the mixed-mode mixture models (abstract and methods) states that the missingness indicator is treated as an additional observed mode, but does not provide an explicit derivation or set of equations showing that this construction remains consistent under MNAR mechanisms without implicitly reintroducing an ignorability assumption; this is load-bearing for the claim of exploiting informative missingness.
  Authors: We agree that an explicit derivation would strengthen the presentation. In the revised manuscript we will add a dedicated subsection deriving the mixed-mode mixture model likelihood under MNAR, showing that the missingness indicator enters the joint density directly and that no ignorability assumption is reintroduced.
  Revision: yes
- Referee: [Experiments] Experiments section: the benchmark and case-study results are asserted to demonstrate effectiveness, but the manuscript does not report quantitative metrics (e.g., clustering accuracy, ARI, or comparison deltas versus standard TCK) or ablation controls that isolate the contribution of the missing-pattern representation; without these, the central empirical claim cannot be evaluated.
  Authors: We accept that the current version relies on qualitative assertions. The revision will include tables with ARI, NMI and accuracy on the benchmark datasets, direct numerical comparisons against TCK, and ablation results that isolate the missing-pattern component.
  Revision: yes
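For readers unfamiliar with the adjusted Rand index (ARI) named in this exchange, the sketch below computes it from two label lists using the standard pair-counting formula, with only the Python standard library. The label lists at the bottom are toy inputs chosen to exercise the function; they are not results from the paper.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two partitions of the same items."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
ari_relabelled = adjusted_rand_index(truth, [1, 1, 0, 0])  # same partition
ari_crossed = adjusted_rand_index(truth, [0, 1, 0, 1])     # splits each class
```

A comparison delta of the kind the referee asks for would then be the ARI of the new kernel's clustering minus the ARI of the TCK clustering on the same data.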
Circularity Check
No significant circularity detected
The paper's central construction extends the existing TCK by explicitly representing missingness patterns as an additional observed mode and incorporating them into mixed-mode Bayesian mixture models within an ensemble. This modeling choice is presented as a direct, non-tautological extension that avoids imputation while exploiting informative missingness; the semi-supervised variant follows the same explicit construction. No load-bearing step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or definitional renaming. The derivation remains self-contained against the stated assumptions and does not invoke prior author work as an external uniqueness theorem.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Missing at random and ignorable missingness assumptions do not hold in many real-world applications such as medicine.