arxiv: 2604.11965 · v1 · submitted 2026-04-13 · 💻 cs.DC

Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics

Pith reviewed 2026-05-10 15:26 UTC · model grok-4.3

classification 💻 cs.DC
keywords visual analytics · HPC monitoring · anomaly detection · dimensionality reduction · contrastive learning · dynamic mode decomposition · node clustering · system behavior

The pith

A visual analytics system uses two-phase dimensionality reduction and contrastive learning to automatically cluster unlabeled HPC node data and surface subtle behavioral differences for anomaly interpretation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes an interactive visual analytics system built for high-performance computing environments where monitoring data arrives unlabeled and high-dimensional. It combines two-phase dimensionality reduction, contrastive learning, and multi-resolution dynamic mode decomposition inside a customizable interface that lets users explore clusters, compare temporal patterns across groups, and test hypotheses against metrics such as CPU and memory activity. The authors show through two case studies that the workflow identifies meaningful node groupings and exposes intra- and inter-group differences that aid anomaly detection. A sympathetic reader cares because large-scale systems generate more data than humans can inspect manually, and the approach offers a concrete way to turn raw traces into interpretable pictures of system health.

Core claim

The authors claim that embedding two-phase dimensionality reduction with contrastive learning and multi-resolution dynamic mode decomposition in an interactive visual interface enables automatic identification of meaningful node clusters and revelation of subtle behavioral differences within and across groups in real HPC monitoring datasets, with expert feedback confirming improved anomalous behavior detection and interpretation.

What carries the argument

Two-phase dimensionality reduction paired with contrastive learning and multi-resolution dynamic mode decomposition, which extracts inter-cluster and intra-cluster variations from high-dimensional time-series metrics to drive visual cluster exploration and temporal pattern comparison.
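
To make the temporal part of this machinery concrete, here is a minimal sketch of a single-level (exact) dynamic mode decomposition, the building block that multi-resolution DMD applies recursively over successively shorter time windows. It is an illustration under stated assumptions, not the authors' implementation: the function name `dmd`, the toy three-node signal, and the chosen rank are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a (features x time) snapshot matrix.

    Fits a linear operator A with snapshots[:, t+1] ~= A @ snapshots[:, t]
    and returns its leading eigenvalues and the matching spatial modes.
    """
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator A_tilde = U^H X2 V S^-1; dividing by s scales columns.
    A_tilde = (U.conj().T @ X2 @ Vh.conj().T) / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: X2 V S^-1 W
    modes = ((X2 @ Vh.conj().T) / s) @ W
    return eigvals, modes

# Hypothetical toy data: three nodes whose metric mixes one shared oscillation.
omega = 0.3                      # radians per sample (assumed)
t = np.arange(60)
basis = np.vstack([np.cos(omega * t), np.sin(omega * t)])
mixing = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
traces = mixing @ basis          # shape (3 nodes, 60 time steps)

eigvals, modes = dmd(traces, rank=2)
frequencies = np.abs(np.angle(eigvals))  # recovered oscillation frequency
```

On this noise-free signal the two eigenvalues lie on the unit circle at angles of plus and minus `omega`, so the recovered frequencies match the one that generated the traces. Real monitoring data would need a rank chosen from the singular value spectrum, which is one of the tuning decisions the referee report asks about.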

If this is right

  • Users gain the ability to compare temporal patterns across node groups using customizable visual encodings and baselines.
  • Integration of multiple metrics such as CPU utilization and memory activity produces a holistic view of system behavior.
  • The same workflow applies to anomaly interpretation tasks in cloud, edge, and distributed computing infrastructures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The interface could support real-time streaming updates so operators detect emerging anomalies before they affect job completion.
  • If cluster labels transfer across similar hardware generations, the system might reduce the need for per-system retraining.
  • Extending the same reduction steps to network traffic or sensor streams could address comparable unlabeled high-dimensional monitoring problems outside HPC.

Load-bearing premise

The described combination of dimensionality reduction, contrastive learning, and dynamic mode decomposition will reliably produce human-interpretable clusters and patterns from raw, unlabeled HPC monitoring data without extensive manual tuning.
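
One place where tuning can enter is the contrastive step. The summary does not say which contrastive method the system uses; the sketch below assumes contrastive PCA purely for illustration, where a contrast strength `alpha` (a hypothetical parameter name) controls how strongly variance shared with a background group is suppressed. The data, feature layout, and function name are invented for the example, and NumPy is assumed to be available.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=1):
    """Directions with high variance in `target` and low variance in `background`."""
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(c_target - alpha * c_background)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Hypothetical data: feature 0 varies strongly on every node (shared load),
# feature 2 varies strongly only inside the cluster of interest.
rng = np.random.default_rng(0)
background = rng.normal(0.0, [4.0, 1.0, 1.0], size=(2000, 3))
target = rng.normal(0.0, [4.0, 1.0, 3.0], size=(2000, 3))

plain = contrastive_pca(target, background, alpha=0.0)[:, 0]     # ordinary PCA
contrast = contrastive_pca(target, background, alpha=1.0)[:, 0]  # contrastive
```

With `alpha=0.0` the leading direction follows the shared high-variance feature, while `alpha=1.0` points at the feature that distinguishes the target group. How sensitive real results are to this kind of setting is exactly what the premise leaves open.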

What would settle it

Deploying the system on fresh HPC traces and finding that the resulting clusters show no correspondence to known hardware partitions or expert-identified anomalies would falsify the claim that the workflow surfaces meaningful behavioral groups.
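
One way to quantify "correspondence" between discovered clusters and known hardware partitions is the adjusted Rand index, which is 1 for identical partitions (up to relabeling) and near 0 for chance-level agreement. The paper does not name this measure; it is an assumption made here for illustration, and the label lists below are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_a)
    # Pairs of items grouped together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical example: cluster ids from the tool vs. known rack assignment.
clusters = [0, 0, 1, 1, 2, 2]
racks = ["r5", "r5", "r7", "r7", "r9", "r9"]
agreement = adjusted_rand_index(clusters, racks)

# A clustering that cuts across the partitions scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value close to zero on fresh traces would be the kind of evidence described above; a high value would only show agreement with hardware layout, not that the clusters capture anomalies.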

Figures

Figures reproduced from arXiv: 2604.11965 by Allison Austin, Kwan-Liu Ma, Michael E. Papka, Shilpika, Venkatram Vishwanath, Yan To Linus Lam, Yun-Hsin Kuo.

Figure 1: Analysis workflow of our system. Our analysis is driven ...
Figure 2: The system interface for intra- and inter-cluster analysis of multivariate HPC monitoring data. The visualization comprises ...
Figure 3: Visualization interaction with mrDMD z-score analysis.
Figure 4: Intra-cluster analysis results of cluster 0 in the Ganglia monitoring dataset. Four highly-contributing metrics were ...
Figure 5: Inter-cluster analysis results of Ganglia logs for “CPU ...
Figure 6: Inter-cluster analysis results for clusters 1 and 2 ( ...
Figure 7: Intra-cluster analysis results for cluster 0. We com ...
Figure 8: Comparison of pipeline completion time, showing how ...
Abstract

In high-performance computing (HPC) environments, system monitoring data is often unlabeled and high-dimensional, making it difficult to reliably detect and understand anomalous computing nodes. The growing scale and dimensionality of the collected datasets present significant challenges for analysis and visualization tasks. We present a scalable, interactive visual analytics system to support exploration, explanation, and comparison of compute node behaviors in HPC systems. Our approach integrates an analysis workflow combining two-phase dimensionality reduction with contrastive learning and multi-resolution dynamic mode decomposition to capture inter- and intra-cluster variations. These analyses are embedded in an interactive interface that enables users to explore clusters, compare temporal patterns, and iteratively refine hypotheses through customizable visual encodings and baselines. By integrating metrics such as CPU utilization and memory activity, the system offers a holistic view of large-scale system behavior. We demonstrate the utility of our tool through two case studies. In both cases, our system automatically identified meaningful node clusters and revealed subtle behavioral differences within and across node groups. Expert feedback confirmed the effectiveness of our tool in enhancing anomalous behavior detection and interpretation. Our work advances scalable visual analysis for HPC monitoring and has broader implications for cloud, edge computing, and distributed infrastructures where interpretability and behavior analysis are critical to operational efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a scalable interactive visual analytics system for unlabeled high-dimensional HPC monitoring data. It integrates two-phase dimensionality reduction, contrastive learning, and multi-resolution dynamic mode decomposition to identify node clusters and temporal patterns, embedded in a customizable interface for exploration and hypothesis refinement. Utility is demonstrated via two case studies claiming automatic identification of meaningful clusters and subtle behavioral differences, plus expert feedback on improved anomaly detection.

Significance. If the effectiveness claims hold under quantitative scrutiny, the work could meaningfully advance visual analytics for large-scale system monitoring in HPC, with extensions to cloud and edge environments. The combination of multiple analysis techniques into an interactive tool addresses a real operational need, and the emphasis on interpretability is a strength. However, the current reliance on qualitative evidence limits the assessed impact.

major comments (2)
  1. [Abstract and Case Studies] Abstract and Case Studies section: the central claim that the system 'automatically identified meaningful node clusters' and 'revealed subtle behavioral differences' is supported only by qualitative descriptions and expert feedback. No cluster validity metrics (e.g., silhouette score, Davies-Bouldin index), anomaly detection precision/recall, temporal pattern fidelity measures, or comparisons against baselines (PCA + k-means, t-SNE alone) are reported. This makes 'meaningful' and 'subtle' subjective and renders the effectiveness assertion load-bearing but unverified.
  2. [Abstract and Workflow] Abstract and Workflow description: the assumption that the specific pipeline (two-phase DR + contrastive learning + multi-resolution DMD) reliably surfaces interpretable results from unlabeled data without extensive tuning is not tested via ablation studies or robustness checks to hyperparameter choices. This directly affects the reproducibility and generalizability of the reported case-study outcomes.
minor comments (2)
  1. [Abstract] The abstract mentions 'integrating metrics such as CPU utilization and memory activity' but does not specify the full set of monitored features or their preprocessing; adding this detail would improve clarity.
  2. [Case Studies] Expert feedback is cited as confirming effectiveness, but the number of experts, their backgrounds, and the protocol used are not detailed; this would strengthen the qualitative evaluation.
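
The two cluster validity metrics named in major comment 1 are simple to compute once an embedding and cluster labels exist. The following is a minimal pure-Python sketch using Euclidean distance; the point coordinates and label lists are hypothetical, and a production evaluation would more likely rely on an established library implementation.

```python
from math import dist
from statistics import fmean

def _groups(points, labels):
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return groups

def silhouette_score(points, labels):
    """Mean silhouette over all points; higher means tighter, better-separated clusters."""
    groups = _groups(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(fmean(dist(point, q) for q in other)
                for key, other in groups.items() if key != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return fmean(scores)

def davies_bouldin_index(points, labels):
    """Mean worst-case ratio of within-cluster scatter to centroid separation; lower is better."""
    groups = _groups(points, labels)
    centroids = {k: [fmean(axis) for axis in zip(*g)] for k, g in groups.items()}
    scatter = {k: fmean(dist(p, centroids[k]) for p in g) for k, g in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in groups if j != i)
             for i in groups]
    return fmean(worst)

# Hypothetical 2-D embedding with two well-separated groups of nodes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1, 0, 1]
```

Both metrics need at least two clusters. They also reward compact, convex groups, so a low score does not by itself show that a clustering is uninformative for operators.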

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that strengthening the quantitative support for our claims will improve the manuscript and outline specific revisions below.

Point-by-point responses
  1. Referee: [Abstract and Case Studies] Abstract and Case Studies section: the central claim that the system 'automatically identified meaningful node clusters' and 'revealed subtle behavioral differences' is supported only by qualitative descriptions and expert feedback. No cluster validity metrics (e.g., silhouette score, Davies-Bouldin index), anomaly detection precision/recall, temporal pattern fidelity measures, or comparisons against baselines (PCA + k-means, t-SNE alone) are reported. This makes 'meaningful' and 'subtle' subjective and renders the effectiveness assertion load-bearing but unverified.

    Authors: We agree that the current validation relies on qualitative case studies and expert feedback, which is common for exploratory visual analytics tools on unlabeled data but leaves the claims open to subjectivity. In the revised manuscript we will add internal cluster validity metrics (silhouette score and Davies-Bouldin index) computed on the two-phase reduced embeddings for both case studies. We will also report comparisons against baselines (PCA + k-means and t-SNE + k-means) using the same metrics, and include temporal pattern fidelity measures derived from the multi-resolution DMD reconstructions. Where expert-labeled subsets exist, we will compute anomaly detection precision/recall. These additions will provide objective evidence supporting the reported clusters and behavioral differences. revision: yes

  2. Referee: [Abstract and Workflow] Abstract and Workflow description: the assumption that the specific pipeline (two-phase DR + contrastive learning + multi-resolution DMD) reliably surfaces interpretable results from unlabeled data without extensive tuning is not tested via ablation studies or robustness checks to hyperparameter choices. This directly affects the reproducibility and generalizability of the reported case-study outcomes.

    Authors: We acknowledge that the absence of ablation studies limits demonstrated robustness. The revised manuscript will include a dedicated ablation section that removes or replaces individual components (two-phase DR, contrastive learning, multi-resolution DMD) and varies key hyperparameters (e.g., embedding dimensions, contrastive loss weights, DMD rank). Each variant will be evaluated using the same cluster validity and fidelity metrics, with results reported for both case studies. This will directly address reproducibility and show that the full pipeline yields superior interpretability compared with ablated versions. revision: yes
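
The PCA + k-means baseline promised in the first response could be as small as the sketch below. It is not taken from the paper: the farthest-point initialisation is a choice made here to keep the example deterministic, the synthetic "idle" and "busy" node features are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the leading principal components."""
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

# Hypothetical node features: 20 idle nodes and 20 busy nodes, 10 metrics each.
rng = np.random.default_rng(0)
idle = rng.normal(0.0, 0.1, size=(20, 10))
busy = rng.normal(5.0, 0.1, size=(20, 10))
features = np.vstack([idle, busy])

labels = kmeans(pca(features, n_components=2), k=2)
```

Running the proposed pipeline and this baseline on the same data, then scoring both with the same validity metrics, is the comparison the referee asks for. An ablation study would repeat that scoring with individual pipeline components removed or replaced.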

Circularity Check

0 steps flagged

No circularity: system description and qualitative case studies contain no derivations or fitted predictions

full rationale

The paper presents a practical visual analytics workflow (two-phase dimensionality reduction + contrastive learning + multi-resolution DMD) and evaluates it solely through two qualitative case studies plus expert feedback. No equations, fitted parameters, or predictions are defined anywhere in the provided text. Central claims rest on interpretive descriptions of cluster identification rather than any reduction to self-referential inputs, self-citations, or renamed known results. The workflow is presented as a tool implementation, not a mathematical derivation, making the analysis self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a systems and visualization paper describing a software tool and analysis workflow. No free parameters, mathematical axioms, or invented physical entities are involved.

pith-pipeline@v0.9.0 · 5537 in / 1129 out tokens · 64658 ms · 2026-05-10T15:26:05.447912+00:00 · methodology
