BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

Joonhyung Bae

arXiv: 2511.19162 · v3 · pith:RU23GPTMnew · submitted 2025-09-27 · cs.IR · cs.CY · cs.HC · cs.LG · cs.MM

Pith reviewed 2026-05-21 21:17 UTC · model grok-4.3

classification cs.IR · cs.CY · cs.HC · cs.LG · cs.MM

keywords bioart · clustering · multi-dimensional analysis · computational humanities · UMAP · agglomerative clustering · art classification · polysemy handling

The pith

BioArtlas clusters 81 bioart works across thirteen dimensions to identify four organizational patterns in the field.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bioart mixes art, science, ethics and politics in ways that single categories cannot capture. The paper builds BioArtlas to represent each of 81 works along thirteen curated dimensions, using codebook grouping to reconcile overlapping terms. It tests up to 800 combinations of representation, embedding space, and algorithm, and settles on agglomerative clustering with k=15 on a four-dimensional UMAP embedding. The author reads four patterns in the resulting clusters: works by the same artist stay close, techniques form clear groups, styles shift over time, and some concepts link works from different periods. The system keeps the quantitative analysis separate from a public web explorer so both rigor and access are possible.

Core claim

The optimal configuration of axis-aware representations, codebook-based grouping, and agglomerative clustering on four-dimensional UMAP space partitions the 81 works into clusters that display artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and trans-temporal conceptual affinities.
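To make the clustering step concrete, here is a minimal pure-Python sketch of agglomerative clustering. It is not the paper's implementation: the linkage criterion is not stated in this summary, so average linkage with Euclidean distance is an assumption, and the toy 4D points stand in for UMAP coordinates.

```python
import math

def agglomerative(points, k):
    """Naive average-linkage agglomerative clustering down to k clusters.

    Assumption: average linkage and Euclidean distance; the paper's
    actual linkage is not given in this summary.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy 4D coordinates (illustrative only, not BioArtlas data).
toy_points = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 5.0, 5.0),
    (10.0, 0.0, 10.0, 0.0), (10.0, 0.1, 10.0, 0.0),
]
toy_labels = agglomerative(toy_points, k=3)
```

The cubic-time loop is fine for a corpus of 81 works; a library implementation would be the practical choice for anything larger.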

What carries the argument

Axis-aware representations paired with codebook-based grouping of related concepts, followed by agglomerative clustering on 4D UMAP embeddings.
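A rough sketch of how codebook grouping and axis-awareness could fit together is given below. The paper embeds keywords; this example swaps in set-based features and Jaccard similarity as a simpler stand-in. The codebook entries, axis names, and function names are all hypothetical.

```python
# Hypothetical codebook: maps raw keywords to a shared concept label.
# Entries are illustrative and not taken from the BioArtlas dataset.
CODEBOOK = {
    "tissue culture": "tissue_engineering",
    "tissue engineering": "tissue_engineering",
    "transgenic": "genetic_modification",
    "gene editing": "genetic_modification",
}

def normalize(keyword):
    """Map a raw keyword to its codebook concept; unknown terms pass through."""
    key = keyword.strip().lower()
    return CODEBOOK.get(key, key)

def axis_aware_features(work):
    """Represent a work as (axis, concept) pairs.

    Keeping the axis in the feature means the same concept on two
    different axes is treated as two distinct features.
    """
    return {(axis, normalize(kw)) for axis, kws in work.items() for kw in kws}

def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical works described on two hypothetical axes.
work_a = {"technique": ["Tissue culture"], "theme": ["transgenic"]}
work_b = {"technique": ["tissue engineering"], "theme": ["gene editing"]}
work_c = {"technique": ["transgenic"], "theme": ["tissue culture"]}

sim_ab = jaccard(axis_aware_features(work_a), axis_aware_features(work_b))
sim_ac = jaccard(axis_aware_features(work_a), axis_aware_features(work_c))
```

Works A and B use different wording but share every concept on every axis, while C uses the same concepts on swapped axes and therefore shares no features with A.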

If this is right

Bioart can be studied as a multi-dimensional space rather than a single-axis category.
Artist-level methodological signatures persist across individual projects.
Technique choices create detectable segments independent of artist identity.
Works from different decades can share conceptual clusters while showing stylistic drift.
Quantitative maps can coexist with public-facing interfaces without sacrificing analytical standards.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same representation-and-clustering pipeline could be applied to other hybrid domains such as digital performance or climate art.
If the four patterns hold under new data, they could serve as benchmarks for tracking how bioart evolves with new technologies.
Public access to both the dataset and the interactive explorer lowers the barrier for non-computational researchers to test or extend the groupings.

Load-bearing premise

The thirteen chosen dimensions together with the codebook grouping capture the hybrid and polysemous character of bioart works without large curator bias or loss of meaning.

What would settle it

Re-run the full pipeline on the same 81 works after replacing the thirteen dimensions with an independent set of descriptors, or after removing the codebook step. If cluster memberships change substantially or the four reported patterns disappear, the premise fails; if they persist, it gains support.
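One way to quantify how much cluster memberships change between two runs is the adjusted Rand index (ARI), which is 1.0 for identical partitions and close to 0 for unrelated ones. The sketch below is a stdlib-only implementation; the paper does not report using ARI, so treating it as the comparison metric is an assumption.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # nothing to compare
    # Pairs of items grouped together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)

# Label names do not matter, only which items are grouped together.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Comparing the original labels with those from an ablated run (different dimensions, or no codebook) would then give a single number for "substantially different".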

Figures

Figures reproduced from arXiv: 2511.19162 by Joonhyung Bae.

Original abstract

Bioart brings living material into artistic practice, where a single work can be at once an aesthetic object, a scientific instrument, and an ethical provocation. Traditional categories sort such works along one axis at a time, which flattens the very hybridity that defines the field and leaves curators no way to compare works across many dimensions together. I introduce BioArtlas, a computational atlas that represents each bioartwork along many curated dimensions at once and organizes the field by conceptual similarity rather than by medium or chronology. My method embeds the keywords of all 81 works on each of thirteen interpretive axes, groups related concepts into a shared codebook that tames inconsistent terminology, and then searches systematically for a clustering that is both statistically clean and interpretable. Among the methods that place every work on the map, agglomerative clustering separates the field far more cleanly than the usual k-means baseline (silhouette 0.664 versus 0.483), whereas density-based methods reach higher scores only by discarding most of the corpus as noise. By separating rigorous analysis from public storytelling, BioArtlas turns the tangled complexity of bioart into a navigable landscape, openly available as an interactive interface (https://www.bioartlas.com) and dataset (https://github.com/joonhyungbae/BioArtlas).
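For readers unfamiliar with the silhouette score quoted in the abstract, the following is a minimal sketch of the standard mean silhouette coefficient. Euclidean distance is an assumption here, and the toy points are illustrative rather than drawn from the dataset.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a is the mean distance to its own cluster and b is
    the smallest mean distance to any other cluster; s = (b - a) / max(a, b).
    Points in singleton clusters contribute 0, the usual convention.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Two well-separated pairs, labeled correctly and then incorrectly.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy, [0, 0, 1, 1])
bad = silhouette_score(toy, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters; negative scores indicate points that sit closer to another cluster than to their own.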

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BioArtlas applies standard clustering to 81 bioart works, with a thorough search of up to 800 configurations and a public data release, but the four named patterns come from a post-hoc reading of the 15 clusters.

The main takeaway is that this paper applies established clustering techniques to a curated collection of 81 bioart works and identifies four patterns through that process, though the link from clusters to those patterns involves author interpretation. It does well by exhaustively testing up to 800 combinations of representations, spaces, and algorithms to select the best one, which turns out to be agglomerative clustering with k=15 on 4D UMAP, supported by a silhouette score of 0.664 plus trustworthiness and continuity scores. The codebook method for grouping concepts across the thirteen dimensions addresses some of the terminology issues in this interdisciplinary area. Providing both the full dataset on GitHub and a working interactive interface at bioartlas.com adds real value by letting others explore and verify the results directly.

The softer part is the presentation of the four patterns (artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and trans-temporal conceptual affinities) as direct revelations from the clustering. With fifteen clusters available, the choice and naming of these four patterns would benefit from additional checks, such as sensitivity analysis or external validation, to show they are not just one possible reading.

This work would interest people working at the intersection of data science and cultural studies, particularly those analyzing art forms that mix science and technology. A reader who wants a practical example of dimensionality reduction and clustering on small but complex datasets could take something away from the evaluation setup. I would recommend sending it to peer review. The computational core is clear and the resources are public, so referees can focus on tightening the interpretive claims around the patterns.

Referee Report

2 major / 2 minor

Summary. The manuscript presents BioArtlas, a computational framework for clustering 81 bioart works across 13 author-curated dimensions. It employs codebook-based grouping to handle polysemy, evaluates up to 800 combinations of representations and algorithms, identifies Agglomerative clustering with k=15 on 4D UMAP as optimal (silhouette 0.664 ± 0.008, trustworthiness/continuity 0.805/0.812), and interprets the clusters as revealing four organizational patterns: artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and trans-temporal conceptual affinities. The work includes a public dataset and interactive web interface.

Significance. If the central claims hold, the paper contributes a reproducible, quantitative method for navigating the multi-dimensional complexity of bioart, with explicit separation of analytical optimization from interpretive communication. The public release of the dataset and interface supports further exploration and cross-disciplinary use in information retrieval and digital humanities.

major comments (2)

[Results] Results section (cluster interpretation paragraph): The claim that the optimal clustering 'reveals' the four specific organizational patterns is load-bearing for the paper's main contribution, yet the mapping from the 15 clusters to these named patterns appears to rely on post-hoc author judgment. No quantitative validation (e.g., inter-rater agreement, ablation on alternative groupings, or robustness checks against random cluster-to-label assignments) is reported to confirm that these exact patterns emerge reliably rather than reflecting curation choices in the 13 dimensions or codebook.
[Methods] Methods section (dimension curation and codebook): The 13 dimensions and codebook-based grouping are presented as capturing the hybrid nature of bioart without significant bias, but no sensitivity analysis or comparison to alternative dimension sets is provided. This directly affects whether the four patterns reflect intrinsic structure in the 81 works or author-imposed semantic distinctions.
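The robustness check against random assignments that the first comment asks for could look roughly like the permutation test below. It measures how concentrated an attribute (for example, artist) is within clusters and compares that to shuffled cluster labels. The purity statistic, function names, and toy data are assumptions for illustration; the paper reports no such test.

```python
import random
from collections import Counter, defaultdict

def purity(cluster_labels, attribute):
    """Fraction of works whose attribute value is the majority in their cluster."""
    by_cluster = defaultdict(list)
    for cluster, value in zip(cluster_labels, attribute):
        by_cluster[cluster].append(value)
    majority = sum(Counter(vals).most_common(1)[0][1] for vals in by_cluster.values())
    return majority / len(attribute)

def permutation_p_value(cluster_labels, attribute, n_perm=1000, seed=0):
    """Compare observed purity with purity under shuffled cluster labels.

    Shuffling keeps cluster sizes fixed but breaks any link between
    cluster membership and the attribute. Returns (observed, p_value).
    """
    rng = random.Random(seed)
    observed = purity(cluster_labels, attribute)
    shuffled = list(cluster_labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if purity(shuffled, attribute) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: three clusters that align perfectly with artist.
toy_clusters = [0] * 5 + [1] * 5 + [2] * 5
toy_artists = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
observed_purity, p_value = permutation_p_value(toy_clusters, toy_artists)
```

A small p-value would indicate that the artist concentration is unlikely under random assignment; it would not by itself show that the pattern is independent of how the dimensions were curated.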

minor comments (2)

[Abstract] Abstract and Methods: The phrase 'novel axis-aware representations' is used without a precise definition or equation showing how semantic distinctions are preserved during cross-dimensional comparison.
[Evaluation] Evaluation paragraph: The ±0.008 on the silhouette score should specify whether this is standard deviation across runs or another measure, and the exact number of runs should be stated for reproducibility.
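The reporting convention requested in the last comment can be made explicit in a few lines. This sketch assumes the spread is a sample standard deviation over repeated runs; the scores are invented for illustration and are not the paper's.

```python
from statistics import mean, stdev

def summarize_runs(scores):
    """Return (mean, sample standard deviation, number of runs)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores), len(scores)

def report(scores):
    """Format a score summary that states the spread measure and run count."""
    m, s, n = summarize_runs(scores)
    return f"silhouette {m:.3f} ± {s:.3f} (sample SD over {n} runs)"

# Hypothetical per-seed scores, invented for illustration only.
example_scores = [0.70, 0.72, 0.68, 0.70]
example_report = report(example_scores)
```

Stating the measure and the run count alongside the number removes the ambiguity the referee points out.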

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important distinctions between our quantitative optimization procedure and the interpretive analysis of the resulting clusters. We address each major comment below, proposing targeted revisions to strengthen the manuscript while preserving the separation of analytical and communicative components.

Referee: [Results] Results section (cluster interpretation paragraph): The claim that the optimal clustering 'reveals' the four specific organizational patterns is load-bearing for the paper's main contribution, yet the mapping from the 15 clusters to these named patterns appears to rely on post-hoc author judgment. No quantitative validation (e.g., inter-rater agreement, ablation on alternative groupings, or robustness checks against random cluster-to-label assignments) is reported to confirm that these exact patterns emerge reliably rather than reflecting curation choices in the 13 dimensions or codebook.

Authors: We agree that the four organizational patterns are interpretive observations drawn from inspecting the dimension distributions and representative works within the 15 clusters, rather than outputs of an automated labeling procedure. The load-bearing quantitative contribution remains the exhaustive evaluation of up to 800 representation-algorithm combinations that selected Agglomerative clustering at k=15 on 4D UMAP (silhouette 0.664 ± 0.008). In revision we will augment the Results section with per-cluster quantitative summaries (e.g., mean or mode values for each of the 13 dimensions, plus counts of works per artist or time period) that directly support the named patterns, together with two or three concrete example works per pattern. We will also add an explicit statement that these patterns constitute author-guided interpretation of the optimized grouping and that formal inter-rater or randomization tests would require a separate annotation study, which we flag as future work. This revision clarifies the evidential basis without overstating the quantitative support for the interpretive layer.

revision: partial

Referee: [Methods] Methods section (dimension curation and codebook): The 13 dimensions and codebook-based grouping are presented as capturing the hybrid nature of bioart without significant bias, but no sensitivity analysis or comparison to alternative dimension sets is provided. This directly affects whether the four patterns reflect intrinsic structure in the 81 works or author-imposed semantic distinctions.

Authors: The 13 dimensions were derived from a systematic review of bioart literature and prior curatorial frameworks to span methodological, technical, temporal, and conceptual axes; the codebook was constructed by grouping polysemous terms observed across the 81 works. We acknowledge that no explicit sensitivity analysis against alternative dimension sets was reported. In the revised Methods section we will (1) provide a table or paragraph justifying each dimension with supporting references from the bioart scholarship, (2) describe the codebook construction process in greater detail, and (3) include a short discussion of limitations together with a limited robustness check: re-running the full 800-combination evaluation on a reduced 11-dimension subset (dropping the two least frequent dimensions) to verify that the top-ranked configuration and the four high-level patterns remain stable. This addition directly addresses the concern about author-imposed structure while remaining within the scope of the existing dataset.

revision: yes
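The per-cluster summaries promised in the first response could be produced with something like the sketch below, which reports cluster size and the most common value of each chosen field. The field names and records are hypothetical, not entries from the BioArtlas dataset.

```python
from collections import Counter, defaultdict

def cluster_summaries(works, labels, fields):
    """Summarize each cluster by size and the modal value of each field.

    Returns {cluster_label: {"size": n, field: (modal_value, count), ...}}.
    """
    grouped = defaultdict(list)
    for work, label in zip(works, labels):
        grouped[label].append(work)

    summary = {}
    for label, members in grouped.items():
        entry = {"size": len(members)}
        for field in fields:
            # most_common(1) gives the single most frequent (value, count) pair.
            entry[field] = Counter(w[field] for w in members).most_common(1)[0]
        summary[label] = entry
    return summary

# Hypothetical records for illustration only.
toy_works = [
    {"artist": "A", "technique": "tissue culture"},
    {"artist": "A", "technique": "tissue culture"},
    {"artist": "B", "technique": "tissue culture"},
    {"artist": "C", "technique": "gene editing"},
    {"artist": "C", "technique": "gene editing"},
]
toy_summary = cluster_summaries(toy_works, [0, 0, 0, 1, 1], ["artist", "technique"])
```

A table built from such summaries would let a reader check whether a cluster described as artist-driven or technique-driven actually has a dominant artist or technique.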

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper applies standard unsupervised clustering (Agglomerative at k=15 on 4D UMAP) to author-curated data from 81 works across 13 dimensions, with optimization and evaluation performed using external metrics such as silhouette score (0.664), trustworthiness, and continuity. The four organizational patterns are presented as interpretive labels assigned after clustering rather than as outputs of any equations or derivations that reduce by construction to the input parameters, fitted values, or self-citations. No load-bearing steps match the enumerated circularity patterns; the analysis remains self-contained against the reported benchmarks without renaming known results or smuggling ansatzes via citation.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The central claim rests on the validity of the 13 hand-curated dimensions and the assumption that clustering outputs correspond to meaningful artistic patterns rather than artifacts of representation choices.

free parameters (2)

k=15: number of clusters, selected as optimal after evaluating multiple values
4D UMAP: dimensionality reduction target, chosen after testing combinations
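The selection rule implied by the abstract, preferring the best silhouette among methods that assign every work, can be sketched as a filter followed by a maximum. The two silhouette values for agglomerative clustering and k-means come from the abstract; the density-based entry uses invented numbers, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    name: str
    silhouette: float
    noise_fraction: float  # share of works left unassigned

def select_best(results, max_noise=0.0):
    """Pick the highest-silhouette result whose noise share is acceptable."""
    eligible = [r for r in results if r.noise_fraction <= max_noise]
    if not eligible:
        raise ValueError("no configuration satisfies the noise limit")
    return max(eligible, key=lambda r: r.silhouette)

results = [
    Result("agglomerative, k=15, 4D UMAP", 0.664, 0.0),  # from the abstract
    Result("k-means baseline", 0.483, 0.0),              # from the abstract
    Result("density-based (hypothetical)", 0.90, 0.60),  # invented values
]

best_full_coverage = select_best(results)
best_any_coverage = select_best(results, max_noise=1.0)
```

Making the coverage constraint explicit shows why a higher raw score from a density-based method does not win: the comparison is only meaningful among configurations that cluster the whole corpus.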

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Comprehensive evaluation of up to 800 representation-space-algorithm combinations identifies Agglomerative clustering at k=15 on 4D UMAP as optimal (silhouette 0.664 ± 0.008)

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: The approach reveals four organizational patterns: artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and trans-temporal conceptual affinities

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.