arxiv: 2603.28816 · v2 · submitted 2026-03-28 · 💻 cs.DL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

ASTRA: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering

Joonhyung Bae


Pith reviewed 2026-05-14 21:47 UTC · model grok-4.3


keywords art-technology institutions · conceptual axes · text embeddings · unsupervised clustering · institutional mapping · cultural institutions · clustering analysis · latent topics


The pith

An eight-axis framework combined with text embeddings clusters 78 art-technology institutions into ten coherent groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ASTRA, a method to map the diverse world of art-technology institutions using qualitative descriptions along eight conceptual axes. These descriptions are turned into numerical vectors through sentence embeddings and then grouped using dimensionality reduction and clustering algorithms. The resulting clusters show clear patterns, such as art-science centers around ZKM and academic computing conferences. A sympathetic reader would care because it offers a data-driven way to navigate and understand the connections in this growing field, potentially aiding curators and policymakers in seeing the bigger picture.

Core claim

The ASTRA methodology applies an eight-axis conceptual framework to characterize 78 art-technology institutions, encodes the qualitative descriptions using E5-large-v2 embeddings, reduces dimensions with UMAP, and clusters them with average-linkage agglomerative clustering at k=10. This produces a composite score of 0.825, a silhouette coefficient of 0.803, and a Calinski-Harabasz index of 11196, yielding coherent groupings including an art-science hub anchored by ZKM, an innovation cluster with Ars Electronica, an ACM academic cluster, and an electronic music cluster.
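The clustering step named in the core claim can be sketched in a few lines. What follows is a minimal pure-Python illustration of average-linkage agglomerative clustering, not the author's implementation: the function name `average_linkage` is illustrative, and the toy 2-D points are hypothetical stand-ins for UMAP-reduced embeddings.

```python
from itertools import combinations
from math import dist


def average_linkage(points, k):
    """Merge the two clusters with the smallest mean pairwise distance until k remain."""
    clusters = [[i] for i in range(len(points))]  # start from singleton clusters

    def avg_dist(a, b):
        # Average linkage: mean distance over all cross-cluster point pairs.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected by the deletion

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: three visually obvious groups.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9),
              (9.0, 0.0), (9.2, 0.1)]
toy_labels = average_linkage(toy_points, k=3)
print(toy_labels)
```

The number of clusters is an input here, just as k=10 is a chosen setting in the paper rather than something the algorithm derives.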

What carries the argument

The eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) combined with E5-large-v2 sentence embeddings and UMAP-based agglomerative clustering.

If this is right

Curators and researchers can explore institutional similarities and cross-disciplinary connections using the interactive React-based tool.
Neighbor-cluster entropy identifies boundary institutions that bridge multiple thematic communities.
Non-negative matrix factorization extracts ten latent topics from the encoded descriptions.
The pipeline yields specific coherent clusters including art-science hubs anchored by ZKM and an ACM academic cluster.
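One plausible reading of the neighbor-cluster entropy in the list above is the Shannon entropy of cluster labels among each institution's nearest neighbors. The sketch below assumes that reading; the paper's exact definition, neighborhood size, and distance metric are not given here, so `neighbor_cluster_entropy`, `n_neighbors`, and the toy data are all illustrative.

```python
from collections import Counter
from math import dist, log2


def neighbor_cluster_entropy(points, labels, n_neighbors=4):
    """Shannon entropy (bits) of cluster labels among each point's nearest neighbors."""
    entropies = []
    for i, p in enumerate(points):
        # Indices of the n_neighbors closest other points.
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:n_neighbors]
        counts = Counter(labels[j] for j in nearest)
        total = sum(counts.values())
        entropies.append(-sum((c / total) * log2(c / total) for c in counts.values()))
    return entropies


# Hypothetical toy data: two tight groups plus one point midway between them.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 0), (11, 0), (10, 1), (11, 1),
          (5.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
entropies = neighbor_cluster_entropy(points, labels)
boundary = max(range(len(entropies)), key=entropies.__getitem__)
print(boundary, entropies[boundary])
```

In this toy layout the midway point has neighbors split evenly between the two groups, so it receives the highest entropy and would be flagged as a boundary case.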

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adapting the eight-axis framework to institutions in adjacent fields such as scientific research labs could generate comparable maps.
Tracking how new or evolving institutions enter or move between clusters over successive years would reveal shifts in the overall landscape.
The identified groupings could inform targeted collaboration or funding strategies by highlighting both similar peers and bridging organizations.

Load-bearing premise

The eight conceptual axes and the qualitative descriptions collected for each institution capture the multidimensional characteristics without significant omission or bias.

What would settle it

Re-running the embedding and clustering steps on the same qualitative descriptions but obtaining substantially different cluster assignments or markedly lower validation scores such as a silhouette coefficient below 0.7 would indicate the groupings are not stable or coherent.
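The check described above depends on recomputing the silhouette coefficient after a re-run. A minimal sketch, assuming Euclidean distance and the standard per-point definition (b - a) / max(a, b); the 0.7 threshold comes from the text above, while the function name and toy points are hypothetical.

```python
from math import dist

STABILITY_THRESHOLD = 0.7  # threshold named in the text above


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        # a: mean distance to the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across it
print(good >= STABILITY_THRESHOLD, bad >= STABILITY_THRESHOLD)
```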

Figures

Figures reproduced from arXiv: 2603.28816 by Joonhyung Bae.

**Figure 1.** Overview of the ASTRA processing pipeline. Qualitative axis descriptions for 78 institutions are encoded through E5-large-v2 sentence embeddings, quantized via a word-level codebook, and clustered using UMAP and agglomerative clustering. NMF topic modeling and entropy-based boundary analysis are applied post-clustering, and results are served through an interactive web visualization.

**Figure 2.** 2D UMAP scatter plots comparing Agglomerative Average (k=10, left) and DBSCAN (k=2, right). Agglomerative clustering yields 10 interpretable groups, while DBSCAN produces only two effective clusters with 27.5% of institutions classified as noise (gray).

**Figure 3.** Axis contribution analysis (leave-one-axis-out). Each bar represents the change in the respective metric when the named axis is removed.

**Figure 4.** 2D UMAP scatter plot of 78 institutions colored by cluster membership (k=10). (a) Full view showing the spatial separation of Cluster 4; (b) zoomed view of the nine remaining clusters with representative institution labels.

**Figure 5.** Cluster–topic heatmap. Each cell shows the mean NMF weight of the topic (column) within the cluster (row). Darker cells indicate stronger thematic associations.

**Figure 6.** Screenshot of the APESuite Explorer web interface, showing the 2D scatter plot, selected institution detail panel with thematic profile, and similarity links.
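The Figure 5 caption describes each heatmap cell as the mean NMF weight of a topic within a cluster. That aggregation can be sketched as below, assuming the NMF step has already produced one row of topic weights per institution; `cluster_topic_means` and the toy numbers are hypothetical.

```python
def cluster_topic_means(doc_topic, labels):
    """Return {cluster: [mean weight per topic]} from per-institution topic weights."""
    sums, counts = {}, {}
    for row, lab in zip(doc_topic, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for t, w in enumerate(row):
            acc[t] += w
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


# Hypothetical weights: three institutions, two topics, two clusters.
toy_weights = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
toy_clusters = [0, 0, 1]
heatmap = cluster_topic_means(toy_weights, toy_clusters)
print(heatmap)
```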

Original abstract

The global landscape of art-technology institutions, including festivals, biennials, research labs, conferences, and hybrid organizations, has grown increasingly diverse, yet systematic frameworks for analyzing their multidimensional characteristics remain scarce. This paper proposes ASTRA (Art-technology Institution Spatial Taxonomy and Relational Analysis), a computational methodology combining an eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) with a text-embedding and clustering pipeline to map 78 cultural-technology institutions into a unified analytical space. Each institution is characterized through qualitative descriptions along the eight axes, encoded via E5-large-v2 sentence embeddings and quantized through a word-level codebook into TF-IDF feature vectors. Dimensionality reduction using UMAP, followed by agglomerative clustering (Average linkage, k=10), yields a composite score of 0.825, a silhouette coefficient of 0.803, and a Calinski-Harabasz index of 11196. Non-negative matrix factorization extracts ten latent topics, and a neighbor-cluster entropy measure identifies boundary institutions bridging multiple thematic communities. An interactive React-based tool enables curators, researchers, and policymakers to explore institutional similarities and cross-disciplinary connections. Results reveal coherent groupings such as an art-science hub cluster anchored by ZKM and ArtScience Museum, an innovation and industry cluster including Ars Electronica, transmediale, and Sonar, an ACM academic cluster comprising TEI, DIS, and NIME, and an electronic music cluster including CTM Festival, MUTEK, and Sonic Acts. Code and data: https://github.com/joonhyungbae/astra
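The abstract's step from embeddings to TF-IDF vectors via a word-level codebook is compact, so a rough sketch may help. It assumes one common construction: each word vector is assigned to its nearest codeword, and each description becomes a bag of code indices weighted by TF-IDF. The codebook, the 2-D vectors, the plain log(N/df) weighting, and the function names are all illustrative assumptions, not details taken from the paper.

```python
from math import dist, log


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda c: dist(v, codebook[c])) for v in vectors]


def tfidf(docs_codes, n_codes):
    """TF-IDF over code indices: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(docs_codes)
    df = [sum(1 for codes in docs_codes if c in codes) for c in range(n_codes)]
    rows = []
    for codes in docs_codes:
        row = []
        for c in range(n_codes):
            tf = codes.count(c) / len(codes)
            idf = log(n_docs / df[c]) if df[c] else 0.0
            row.append(tf * idf)
        rows.append(row)
    return rows


# Hypothetical 2-D "word embeddings" and a two-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0)]
doc_a = [(0.1, 0.0), (0.0, 0.2)]  # both words fall nearest to codeword 0
doc_b = [(0.9, 1.0), (0.1, 0.1)]  # one word per codeword
codes = [quantize(doc_a, codebook), quantize(doc_b, codebook)]
features = tfidf(codes, n_codes=len(codebook))
print(codes, features)
```

A code that appears in every description gets zero weight under this scheme, which is why only the second description has a non-zero entry.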

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A usable mapping pipeline for art-tech institutions built on author-defined axes and embeddings, with good internal metrics but no external checks on the coding.


The core contribution is a concrete eight-axis framework applied to 78 art-technology institutions, turned into E5 embeddings, reduced with UMAP, and clustered with average-linkage at k=10. The reported numbers (silhouette 0.803, Calinski-Harabasz 11196, composite 0.825) and the public GitHub code make the groupings reproducible on the supplied data. Clusters such as the ZKM-anchored art-science group and the ACM academic cluster come out cleanly, and the neighbor entropy measure plus the React explorer give a practical way to spot boundary cases and cross links. That combination of taxonomy plus pipeline is new relative to the cited cultural-mapping work and delivers something a curator or policymaker could actually load and tweak.

Referee Report

2 major / 2 minor

Summary. The manuscript presents ASTRA, a pipeline that codes 78 art-technology institutions along eight conceptual axes (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, Disciplinary Positioning), embeds the resulting descriptions with the E5-large-v2 model, applies UMAP dimensionality reduction, and performs average-linkage agglomerative clustering with k=10. It reports strong internal clustering metrics (composite score 0.825, silhouette 0.803, Calinski-Harabasz 11196), extracts topics via NMF, identifies boundary institutions, and provides an interactive React tool for exploration, revealing clusters such as an art-science hub around ZKM and an ACM academic cluster.

Significance. If the central claims hold, the work offers a reproducible computational approach to mapping the diverse landscape of art-technology institutions, facilitating analysis of cross-disciplinary connections for curators, researchers, and policymakers. The public release of code and data on GitHub strengthens the contribution by enabling independent verification and extension of the mappings.

major comments (2)

Section 3 (Qualitative coding of institutions): The qualitative descriptions along the eight axes are generated by the authors without reported inter-rater reliability metrics, multiple independent coders, or validation against institutional self-descriptions or expert review. Because the TF-IDF vectors, UMAP embeddings, and clustering results (including the silhouette coefficient of 0.803) are derived directly from these descriptions, the observed cluster coherence may primarily reflect the consistency of the authors' framing rather than robust, intrinsic structures in the data. This assumption is load-bearing for the claim that the pipeline produces coherent and insightful groupings.
Section 4 (Clustering and validation): No sensitivity analysis is presented for the choice of k=10 or for variations in the axis definitions and descriptions; the high Calinski-Harabasz index of 11196 is reported only for the selected configuration, limiting assessment of robustness to the unsupervised pipeline.
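A worked reading of the Calinski-Harabasz figure may help when weighing the second comment. Under the standard definition, for n points in k clusters with between-cluster dispersion matrix B_k and within-cluster dispersion matrix W_k, the reported value can be turned back into a raw dispersion ratio:

```latex
\mathrm{CH}(k) = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)}
\qquad\Longrightarrow\qquad
\frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)}
  = \mathrm{CH}(k)\cdot\frac{k-1}{n-k}
  = 11196 \cdot \frac{9}{68} \approx 1482
\quad (n = 78,\ k = 10).
```

Assuming the index was computed with this definition on the UMAP-reduced coordinates, between-cluster dispersion is roughly 1,500 times within-cluster dispersion. A ratio that large may partly reflect how tightly UMAP packs neighboring points, which is one reason to compare it across values of k rather than read it in isolation.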

minor comments (2)

Abstract and Section 4: The composite score of 0.825 is mentioned but not defined; please clarify its calculation from the individual metrics (silhouette, Calinski-Harabasz, etc.) in the main text.
Results section and figures: Ensure cluster visualizations include clear legends, axis labels, and institution labels for interpretability; the GitHub link should be repeated in the main text for accessibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each of the major comments point by point below, outlining the revisions we intend to make to improve methodological transparency and robustness.


Referee: Section 3 (Qualitative coding of institutions): The qualitative descriptions along the eight axes are generated by the authors without reported inter-rater reliability metrics, multiple independent coders, or validation against institutional self-descriptions or expert review. Because the TF-IDF vectors, UMAP embeddings, and clustering results (including the silhouette coefficient of 0.803) are derived directly from these descriptions, the observed cluster coherence may primarily reflect the consistency of the authors' framing rather than robust, intrinsic structures in the data. This assumption is load-bearing for the claim that the pipeline produces coherent and insightful groupings.

Authors: We recognize the validity of this concern regarding the subjectivity of the qualitative coding. The conceptual axes were derived from an extensive literature review on art-technology institutions, and the descriptions aim to reflect publicly available information about each institution. To strengthen this aspect, we will revise Section 3 to provide greater transparency: including a table or appendix with sample codings for representative institutions across axes, and explicitly discussing the potential influence of author perspective. Additionally, we will conduct and report a sensitivity analysis by generating alternative descriptions for a subset of institutions and re-evaluating the clustering metrics. While we cannot retroactively introduce multiple independent coders for the original dataset, this will help demonstrate that the cluster structures are not overly sensitive to specific phrasings. We will also add a limitations section acknowledging this.
revision: partial

Referee: Section 4 (Clustering and validation): No sensitivity analysis is presented for the choice of k=10 or for variations in the axis definitions and descriptions; the high Calinski-Harabasz index of 11196 is reported only for the selected configuration, limiting assessment of robustness to the unsupervised pipeline.

Authors: We agree that presenting sensitivity analyses would better support the robustness of our findings. In the revised manuscript, we will include additional experiments in Section 4: (1) varying the number of clusters k from 6 to 14 and reporting the corresponding silhouette, Calinski-Harabasz, and composite scores to justify k=10; (2) testing variations in the embedding model or slight modifications to axis descriptions to assess impact on the final clusters. These analyses will be presented with tables and figures showing metric stability, thereby addressing the limitation of reporting metrics only for the selected configuration.
revision: yes
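The rebuttal promises to assess "impact on the final clusters" but does not name a measure. One common option is the adjusted Rand index (ARI), which compares two label assignments over the same institutions and is insensitive to how clusters are numbered. The sketch below is an assumption about how such a comparison could be run, not a method stated in the paper or the rebuttal.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions: 1.0 means identical up to relabeling."""
    n = len(labels_a)
    # Pairs of points co-clustered in both partitions, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical runs: the second is the first with cluster ids swapped,
# the third cuts across the original grouping.
original = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]
scrambled = [0, 1, 0, 1]
same = adjusted_rand_index(original, relabeled)
different = adjusted_rand_index(original, scrambled)
print(same, different)
```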

Circularity Check

0 steps flagged

No significant circularity; clustering derives directly from independent qualitative inputs


The paper defines the eight conceptual axes a priori, generates qualitative descriptions for each of the 78 institutions along those axes, encodes the descriptions with a fixed pre-trained embedding model, and applies standard unsupervised dimensionality reduction and clustering. Cluster validity metrics are computed on the transformed embeddings without any fitted parameter being defined in terms of the resulting clusters, without self-citation chains supporting core claims, and without renaming or smuggling prior results. The pipeline is a straightforward computational mapping of author-provided text inputs; the coherence scores reflect structure within those inputs rather than a definitional loop. This is the normal, non-circular case for descriptive unsupervised analysis.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on the validity of the eight invented axes and the assumption that sentence embeddings faithfully encode the qualitative descriptions; no free parameters are fitted to the final clusters beyond the choice of k=10 and the embedding model.

free parameters (2)

number of clusters k
Set to 10 for agglomerative clustering; chosen to produce interpretable groups rather than derived from data.
E5-large-v2 embedding model
Pre-trained model selected for encoding descriptions; its parameters are fixed from prior training.

axioms (2)

domain assumption: Sentence embeddings from E5-large-v2 capture semantic distinctions relevant to the eight conceptual axes
Invoked when converting qualitative descriptions to TF-IDF vectors after quantization.
domain assumption: UMAP followed by average-linkage agglomerative clustering yields meaningful partitions of cultural institutions
Used to justify the reported silhouette and Calinski-Harabasz scores as evidence of coherent groupings.

invented entities (1)

Eight conceptual axes (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, Disciplinary Posit… · no independent evidence
purpose: To provide a multidimensional characterization framework for art-technology institutions
Newly defined in the paper with no independent prior validation cited; used as the basis for all qualitative coding.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Breath1024.lean period8 / 8-tick periodicity echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

An eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) ... agglomerative clustering (Average linkage, k=10)
IndisputableMonolith/Cost/FunctionalEquation.lean Jcost / washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

E5-large-v2 sentence embeddings ... word-level codebook ... TF-IDF feature vectors ... composite score of 0.825, silhouette 0.803

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
