
arxiv: 2605.29466 · v1 · pith:RXEFWSV3new · submitted 2026-05-28 · 📊 stat.CO · physics.data-an

`pandemonium`: High Dimensional Analysis in Linked Spaces

Pith reviewed 2026-06-28 23:55 UTC · model grok-4.3

keywords: high-dimensional data, linked visualizations, cluster analysis, dimension reduction, animated tours, R package, neural network interpretation, physics modeling

The pith

The pandemonium package clusters observations in one variable space and links visualizations of the predictor and response spaces to explore high-dimensional relationships.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the pandemonium R package to address the challenge of uncovering how a large set of predictors maps onto a large set of responses. It clusters observations in one space to locate regions of similar behavior, then displays those clusters simultaneously in both spaces through linked non-linear dimension reduction and animated tours. A reader would care because high-dimensional data often hides patterns that numerical summaries alone miss, and the linked views make the connections visible in a way that can guide further modeling. The approach is shown on a neural network regression example that groups inputs by their latent activations and on a physics model that relates input structure to output behavior.

Core claim

The pandemonium package performs cluster analysis in one set of variables to identify regions with similar patterns, then visualizes the resulting clusters simultaneously in both spaces using linked non-linear dimension reduction and animated tours, allowing users to investigate relationships between predictors and responses in high-dimensional problems.
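The core idea can be illustrated with a minimal sketch, written here in Python with the standard library only. It is not code from the R package: the toy data, the function name `average_linkage`, and the choice of average linkage are all assumptions made for illustration. Observations are clustered in one space (the responses), and the resulting labels are then carried over to summarise the same observations in the linked space (the predictors).

```python
import math
import random


def average_linkage(points, k):
    """Agglomerative clustering with average linkage; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two groups of point indices.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the two closest clusters until only k remain.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


random.seed(1)
# Hypothetical toy data: 40 observations forming two groups in a 2-D predictor space.
predictors = [
    (random.gauss(centre, 0.3), random.gauss(0, 0.3))
    for centre in (-3, 3)
    for _ in range(20)
]
# Hypothetical 3-D responses computed from the predictors (the clustering space).
responses = [(x1, x1 + x2, math.tanh(x1 * x2)) for x1, x2 in predictors]

# Step 1: cluster in the response space only.
labels = average_linkage(responses, k=2)

# Step 2: reuse the same labels to summarise the linked (predictor) space.
linked_means = {}
for label in sorted(set(labels)):
    members = [p for p, l in zip(predictors, labels) if l == label]
    linked_means[label] = tuple(
        sum(p[d] for p in members) / len(members) for d in range(2)
    )
print(linked_means)
```

In the package the second step is visual (linked dimension reduction and tours) rather than a table of means; the per-cluster means here only stand in for that view.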

What carries the argument

Linked non-linear dimension reduction and animated tours applied to clusters identified in one of the two variable spaces.

If this is right

  • Input combinations that produce similar latent activations in a neural network can be identified and inspected together.
  • Structure in the predictor space of a physics model can be directly related to patterns in the response variables.
  • High-dimensional problems become explorable in R without reducing all variables to summary statistics first.
  • Two distinct types of linked spaces are supported, one for model internals and one for scientific simulation outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linking approach could be tested on domains such as genomics to connect gene sets to trait measurements.
  • If the tours preserve local cluster geometry reliably, the method may complement purely algebraic techniques like canonical correlation analysis.
  • Extending the package to allow user-defined distance metrics in the clustering step could adapt it to problems with known domain-specific similarities.

Load-bearing premise

Clusters formed in one variable space will correspond to meaningful patterns in the second space that the chosen visualization methods display without introducing misleading artifacts.

What would settle it

Applying the package to data with deliberately mismatched cluster structures between the two spaces and observing whether the linked views still suggest false connections.
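One way to quantify "mismatched" in such a test is to cluster each space separately and compare the two label vectors with the adjusted Rand index. The sketch below is a plain-Python implementation of the standard index; the function name and the toy label vectors are hypothetical, and nothing here is taken from the package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same observations."""
    n = len(labels_a)
    # Pairs of observations that share a cluster in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster within each labeling on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. a single cluster in both labelings): treat as agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Same grouping with swapped label names: full agreement.
matched = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Groupings that cut across each other: no shared structure.
mismatched = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(matched, mismatched)
```

If the index between the two spaces is near zero or negative and the linked views still appear to show a connection, that would point to an artifact of the visualization rather than structure in the data.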

Figures

Figures reproduced from arXiv: 2605.29466 by Gabriel McCoy, German Valencia, Ursula Laa.

Figure 1
Figure 1: The Data page shown here is used to input the data provided to pandemonium into each field. After finishing the pre-processing selection, the user can launch the corresponding analysis page by clicking the load data button.
Analysis page: The data prepared in the data input page provides the basis for the analysis, combining clustering with high-dimensional visualisation techniques. To provide the required …
Figure 2
Figure 2: The analysis page contains eight tabs. Shown here is the input tab, where options for the hierarchical clustering can be selected interactively and a first overview of the results is shown.
Summaries of the clustering results: The next few tabs should give an overview of the clustering results, primarily focusing on exploring the clustering in the linked space. First, the benchmark tab gives the coordinate …
Figure 3
Figure 3: Linking visualisation of the clustering space with UMAP (left) and tour (right). The colours show the result from hierarchical clustering, and we can use brushing to understand where differences in this grouping come from that are not found with UMAP. The bottom panel shows one group identified in the UMAP view, brushed and highlighted in the tour view.
… clustering, grouping, scores or bins. Brushing result…
Figure 4
Figure 4: Diagram of the neural network used to model the daily number of bikes rented. The activations are used in the clustering space and the inputs in the linked space.
Figure 5
Figure 5: Selected panels from the statistics tab: the Calinski and Harabasz (CH) index (left) and the average within/between (WB) ratio (right).
… rental counts with keras (Allaire and Chollet, 2024), as illustrated in …
Figure 6
Figure 6: Final projection of LDA projection pursuit guided tour of bikes clustering space with 4 clusters, shown with colouring by 4 and 3 clusters. The purple cluster extends along the A4 direction; the pink cluster appears to be sensitive to a combination of activations.
… those suggested by the clustering, and this discrepancy can be further explored using the tour views. Examining the clustering space (the m…
Figure 7
Figure 7: Parallel coordinate plot, where going from 3 to 4 clusters splits a large cluster into those shown in pink and purple, showing large differences in the A4 and A5 directions. Highlighted lines correspond to cluster benchmarks.
Figure 8
Figure 8: Final projection of LDA projection pursuit guided tour of bikes linked space with 4 clusters. The pink and purple clusters appear separated along the temperature and humidity inputs, and wind speed potentially also matters.
The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859
Figure 9
Figure 9: Centred coordinate plots for the four activations picked up by the guided tour, plotted on the `temp` and `hum` variables.
Figure 10
Figure 10: t-SNE reduction of linked space (input variables) with binned residual values for each point.
… complex non-linear model with 150 parameters is used to fit certain experimental results and, in that way, extract information from the data (Aaij et al., 2024). The dimensionality of a system with this many predictors and responses is too large to handle with pandemonium, so the study is carried out in steps. In…
Figure 11
Figure 11: Selected panels from the statistics tab: the maximum cluster radius as a function of the number of clusters (left), and the minimum distance between cluster benchmarks, also as a function of the number of clusters (right).
Figure 12
Figure 12: Four-six dimensional clusters in predictor space projected onto (X1, X3) (left panel) and onto (X1, X6) (right panel). We see a correlation pattern for cluster boundaries when X3 is included, while there seems to be no dependence on X6.
… implemented in pandemonium, in the function pullCoords(), which uses input provided in the exp argument of pandemonium() to set the reference point Zj,exp. With these coor…
Figure 13
Figure 13: Results from a guided tour (left) and radial tour (right) exploring which predictors Xi are relevant for the cluster separation. X1 and X3 appear as particularly important in the final view of the guided tour, and this is confirmed when removing them in a radial tour.
7 Summary: We have introduced the R package pandemonium, which combines cluster analysis with linked visualisations to guide exploration in …
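The Calinski and Harabasz index named in the Figure 5 caption has a standard definition: the between-cluster dispersion per degree of freedom divided by the within-cluster dispersion per degree of freedom, with larger values indicating tighter, better-separated clusters. The sketch below computes it in plain Python on hypothetical one-dimensional toy data; it is an illustration of the textbook formula, not the package's own implementation, and the function name is an assumption.

```python
def calinski_harabasz(points, labels):
    """CH index: (between / (k - 1)) / (within / (n - k)); needs 2 <= k < n."""
    n = len(points)
    dims = range(len(points[0]))
    overall = [sum(p[d] for p in points) / n for d in dims]

    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)

    between = 0.0
    within = 0.0
    for members in groups.values():
        centre = [sum(p[d] for p in members) / len(members) for d in dims]
        # Size-weighted squared distance of the cluster centre from the overall mean.
        between += len(members) * sum((centre[d] - overall[d]) ** 2 for d in dims)
        # Squared distances of the cluster's points from their own centre.
        within += sum((p[d] - centre[d]) ** 2 for p in members for d in dims)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight pairs on a line.
toy = [(0.0,), (2.0,), (10.0,), (12.0,)]
good_split = calinski_harabasz(toy, [0, 0, 1, 1])  # pairs kept together
poor_split = calinski_harabasz(toy, [0, 1, 0, 1])  # pairs split apart
print(good_split, poor_split)
```

Plotting this quantity against the number of clusters, as the statistics tab does, is a common way to choose where to cut a hierarchical clustering.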
Original abstract

A common challenge in data analysis is uncovering relationships between predictors and responses in problems involving large numbers of both. When the number of predictors and responses is limited, visual approaches are particularly effective. We present an R package, pandemonium, designed to explore such problems by combining cluster analysis with linked visualisations. Clustering is performed in one set of variables to identify regions with similar patterns in that space. The resulting clusters are simultaneously visualised in both spaces using linked views based on non-linear dimension reduction and animated tours. We introduce the package through two examples that illustrate different types of linked spaces. In the first example, we consider how a set of input variables is mapped to latent activations in a neural network regression model, to identify input combinations that result in similar activation patterns. In the second example, we analyse a complex multivariable mathematical model arising in physics to investigate how structure in the predictor space relates to the responses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the R package pandemonium for exploratory analysis of relationships between high-dimensional predictors and responses. Clustering is performed in one variable space, with the resulting groups visualized simultaneously in both spaces via linked non-linear dimension reduction and animated tours. The approach is illustrated through two qualitative case studies: mapping inputs to latent activations in a neural-network regression model, and analyzing structure in a complex multivariable physics model.

Significance. If the linking procedure reliably surfaces non-artifactual cross-space relationships, the package would supply a practical visual workflow for high-dimensional linked data that is currently underserved by existing tools. The work is primarily a software contribution rather than a methodological derivation, so its significance hinges on demonstrated utility rather than theoretical novelty.

major comments (2)
  1. [Abstract and examples] The central claim, that clustering in one space yields groups whose patterns are meaningfully related to structure in the second space, is supported only by two qualitative illustrations. The manuscript reports no simulation studies with known ground-truth cross-space mappings, no recovery metrics (e.g., adjusted Rand index or cluster purity across spaces), and no comparison against alternative linking methods, leaving open the possibility that observed alignments are driven by the visualization pipeline.
  2. [Methods / package description] The manuscript does not specify how the non-linear dimension reduction (e.g., choice of method, hyperparameters) and tour parameters are selected or validated to avoid introducing misleading artifacts when projecting clusters from one space into the other.
minor comments (2)
  1. [Title and abstract] The package name and citation should be consistently formatted; the title uses backticks around pandemonium while the abstract does not.
  2. [Examples] No mention of reproducibility: the two examples would benefit from included code or data files so readers can replicate the linked visualizations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript describing the pandemonium R package. The work is positioned as a software tool for exploratory analysis of linked high-dimensional spaces rather than a confirmatory statistical method. We address each major comment below and indicate planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract and examples] The central claim, that clustering in one space yields groups whose patterns are meaningfully related to structure in the second space, is supported only by two qualitative illustrations. The manuscript reports no simulation studies with known ground-truth cross-space mappings, no recovery metrics (e.g., adjusted Rand index or cluster purity across spaces), and no comparison against alternative linking methods, leaving open the possibility that observed alignments are driven by the visualization pipeline.

    Authors: We agree that the manuscript presents only qualitative case studies and includes no simulation studies, recovery metrics, or comparisons with alternative methods. pandemonium is intended as an exploratory visualization tool that surfaces candidate relationships for subsequent investigation, not as a method for recovering known cross-space structure. In exploratory settings, quantitative validation against ground truth is often infeasible because the true mappings are unknown. We will nonetheless add a dedicated limitations subsection that discusses the risk of visualization-driven artifacts and recommends user checks such as stability across multiple dimension reductions. revision: partial

  2. Referee: [Methods / package description] The manuscript does not specify how the non-linear dimension reduction (e.g., choice of method, hyperparameters) and tour parameters are selected or validated to avoid introducing misleading artifacts when projecting clusters from one space into the other.

    Authors: We acknowledge the lack of explicit detail on dimension reduction and tour parameter choices. The package relies on standard implementations (UMAP and t-SNE via their R packages with default hyperparameters, and tours via the tourr package using the grand tour with default settings). In the revised manuscript we will expand the methods and package description sections to document these defaults, provide user-facing options for customization, and include practical guidance on assessing projection stability (e.g., comparing multiple runs or alternative methods) to reduce the chance of misleading artifacts. revision: yes
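The stability check mentioned in this response (comparing multiple runs or alternative methods) can be made concrete with a nearest-neighbour overlap score between two low-dimensional embeddings of the same observations. The following is a minimal sketch under that assumption; the function names and the toy embeddings are hypothetical, and the score is a generic diagnostic, not a feature the rebuttal attributes to the package.

```python
import math


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def neighbour_overlap(embedding_a, embedding_b, k):
    """Mean fraction of k-nearest neighbours shared between two embeddings (0 to 1)."""
    sets_a = knn_sets(embedding_a, k)
    sets_b = knn_sets(embedding_b, k)
    shared = sum(len(a & b) for a, b in zip(sets_a, sets_b))
    return shared / (k * len(sets_a))


# Hypothetical embeddings of four observations from three runs.
run_1 = [(0, 0), (1, 0), (5, 0), (6, 0)]
run_2 = [(0, 0), (0, 1), (0, 5), (0, 6)]  # run_1 rotated: same neighbourhoods
run_3 = [(0, 0), (5, 0), (1, 0), (6, 0)]  # observations 1 and 2 swapped
stable = neighbour_overlap(run_1, run_2, k=1)
unstable = neighbour_overlap(run_1, run_3, k=1)
print(stable, unstable)
```

A score near 1 across repeated UMAP or t-SNE runs suggests the local structure shown in the linked views is reproducible; a low score is a cue to treat apparent groupings with caution.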

Circularity Check

0 steps flagged

No circularity: software package description with no derivation chain

full rationale

The manuscript presents an R package for linked clustering and visualization in high-dimensional spaces, illustrated via two qualitative case studies. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear in the provided text. The central contribution is a tool and workflow description; it contains no load-bearing mathematical steps that could reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. This is the expected non-finding for a methods/software paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper describes a software tool rather than a derivation or empirical claim resting on free parameters, axioms, or new postulated entities.



Reference graph

Works this paper leans on

16 extracted references · 15 canonical work pages · 4 internal anchors

  1. [1]

    doi: 10.1103/PhysRevLett.125.011802. [p13] R. Aaij et al. Comprehensive analysis of local and nonlocal amplitudes in the B0 → K*0 μ+ μ− decay. JHEP, 09:026,

  2. [2]

    doi: 10.1007/JHEP09(2024)026. [p13] J. Allaire and F. Chollet. keras: R Interface to 'Keras',

  3. [3]

    doi: 10.1140/epjc/s10052-019-6944-8. [p13] W. Chang, J. Cheng, J. Allaire, C. Sievert, B. Schloerke, Y. Xie, J. Allen, J. McPherson, A. Dipert, and B. Borges. shiny: Web Application Framework for R,

  4. [4]

    ISSN 2192-6352. doi: 10.1007/s13748-013-0040-3. URL https://doi.org/10.1007/s13748-013-0040-3. [p8] T. Galili. dendextend: an R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics,

  5. [5]

    doi: 10.1093/bioinformatics/btv428. URL https://doi.org/10.1093/bioinformatics/btv428. [p3] C. Hart and E. Wang. Taking the scenic route: Interactive and performant tour animations. The R Journal, 15:307–329,

  6. [6]

    ISSN 2073-4859. doi: 10.32614/RJ-2023-052. https://doi.org/10.32614/RJ-2023-052. [p6] C. Hart and E. Wang. detourr: Portable and Performant Tour Animations,

  7. [7]

    doi: 10.18637/jss.v074.i07. [p2] J. H. Krijthe. Rtsne: T-Distributed Stochastic Neighbor Embedding using Barnes-Hut Implementation,

  8. [8]

    doi: 10.1140/epjp/s13360-021-02310-1. [p1, 13, 14] U. Laa, D. Cook, and G. Valencia. A slice tour for finding hollowness in high-dimensional data. Journal of Computational and Graphical Statistics, 29(3):681–687,

  9. [9]

    Local linear forests

    URL https://doi.org/10.1080/10618600.2020.1777140. [p6] U. Laa, A. Aumann, D. Cook, and G. Valencia. New and simplified manual controls for projection and slice tours, with application to exploring classification boundaries in high dimensions. Journal of Computational and Graphical Statistics, 32(3):1229–1236,

  10. [10]

    URL https://doi.org/10.1080/10618600.2023.2206459. [p6, 14] E. K. Lee and D. Cook. A projection pursuit index for large p small n data. Statistics and Computing, 20(3):381–392,

  11. [11]

    doi: 10.1007/s11222-009-9131-1. [p14] S. Lee, D. Cook, N. da Silva, U. Laa, N. Spyrison, E. Wang, and H. S. Zhang. The state-of-the-art on tours for dynamic visualization of high-dimensional data. WIREs Computational Statistics, 14(4):e1573, 2022a. doi: https://doi.org/10.1002/wics.1573. URL https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.1573...

  12. [12]

    doi: 10.48550/arXiv.2509.04603. [p1] G. McCoy and G. Valencia

  13. [13]

    Manuscript in preparation. [p13] L. McInnes, J. Healy, and J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv e-prints, art. arXiv:1802.03426, Feb

  14. [14]

    doi: 10.48550/arXiv.1802.03426. [p5] M. Medl, D. Cook, and U. Laa. Demonstrating the capabilities of the lionfish software for interactive visualization of market segmentation partitions. Austrian Journal of Statistics, 54(3):71–99, Apr

  15. [15]

    doi: 10.17713/ajs.v54i3.2058. URL https://ajs.or.at/index.php/ajs/article/view/2058. [p1] J. Melville. uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction,

  16. [16]

    URL https://doi.org/10.18637/jss.v040.i02. [p6]

    Gabriel McCoy, Monash University, School of Physics and Astronomy, Melbourne, Australia. ORCiD: 0009-0008-3570-0361. gabe.mccoy02@gmail.com
    German Valencia, Monash University, School of Physics and Astronomy, Melbourne, Australia. ORCiD: 0000-0001-6600-1290. german.valencia@monash.edu
    Ursula Laa, BOKU University, Insti...