The Galaxy's Guide to the Tokenizer: A Benchmark for Scientific Foundation Models

Bahram Mobasher; Gabriela Canalizo; Juan Rafael Martínez-Galarza; Manuel Pérez-Carrasco; Michael J. Smith; Sogol Sanjaripour

arxiv: 2606.25610 · v1 · pith:E5Z5RE5Dnew · submitted 2026-06-24 · 🌌 astro-ph.IM · astro-ph.GA

Pith reviewed 2026-06-25 20:28 UTC · model grok-4.3

classification 🌌 astro-ph.IM astro-ph.GA

keywords: tokenization, foundation models, astronomical images, galaxy properties, transformers, reconstruction, representation learning

The pith

No single tokenization strategy excels at both reconstructing galaxy images and predicting their physical properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Tokenization decides how scientific data like galaxy images gets fed into transformer foundation models. The authors test four different tokenizers inside the same model architecture on hundreds of thousands of real galaxy images. They measure how well each can rebuild the original image and how well the resulting representations predict measured galaxy properties such as size, brightness, and shape. The tests show clear trade-offs, with better image reconstruction not guaranteeing better property predictions. This suggests that standard ways of judging tokenizers by reconstruction alone miss important differences for scientific use.

Core claim

When four tokenization methods are plugged into the same transformer backbone and tested on galaxy image reconstruction plus physical property prediction, reconstruction quality and representation quality turn out to be decoupled, and no tokenizer dominates every task.

What carries the argument

A shared AstroPT transformer backbone that processes galaxy images tokenized by Affine, AIM, JetFormer, or VQ-VAE, evaluated on both reconstruction error and downstream physical property probes.
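As a rough illustration of the two evaluation axes named here, the sketch below scores a model on reconstruction error (mean squared error) and on a linear probe (closed-form ridge regression from frozen embeddings to one physical property, scored by R²). The function names, the ridge penalty, and the synthetic data are assumptions made for illustration; the abstract does not specify the paper's actual metrics or probe design.

```python
import numpy as np

def reconstruction_mse(images, reconstructions):
    """Mean squared error between input images and their reconstructions."""
    images = np.asarray(images, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    return float(np.mean((images - reconstructions) ** 2))

def ridge_probe_r2(train_x, train_y, test_x, test_y, penalty=1e-3):
    """Fit a ridge-regression probe on frozen embeddings; return test R^2."""
    # Centre features and target using training statistics only.
    x_mean, y_mean = train_x.mean(axis=0), train_y.mean()
    xc, yc = train_x - x_mean, train_y - y_mean
    # Closed-form ridge solution: (X^T X + penalty * I)^-1 X^T y
    gram = xc.T @ xc + penalty * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    pred = (test_x - x_mean) @ weights + y_mean
    ss_res = np.sum((test_y - pred) ** 2)
    ss_tot = np.sum((test_y - test_y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16))
true_weights = rng.normal(size=16)
target = embeddings @ true_weights + 0.1 * rng.normal(size=500)

probe_r2 = ridge_probe_r2(embeddings[:400], target[:400],
                          embeddings[400:], target[400:])
images = rng.normal(size=(8, 32, 32))
mse_perfect = reconstruction_mse(images, images)
```

Because the two scores are computed from different outputs (decoded pixels versus frozen embeddings), nothing forces them to move together, which is the sense in which the review calls them decoupled.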

Load-bearing premise

That the shared AstroPT backbone and the chosen physical-property prediction tasks provide a fair, unbiased comparison of the four tokenization strategies without method-specific confounding effects.

What would settle it

Finding that one of the four tokenizers achieves both the highest reconstruction fidelity and the best physical property prediction accuracy across the full set of galaxy images.

Figures

Figures reproduced from arXiv: 2606.25610 by Bahram Mobasher, Gabriela Canalizo, Juan Rafael Martínez-Galarza, Manuel Pérez-Carrasco, Michael J. Smith, Sogol Sanjaripour.

**Figure 1.** Comparison of image reconstruction quality using different tokenization strategies with a shared transformer backbone (AstroPT). The top row shows the input galaxy images, while the bottom row presents the reconstructions produced by each tokenizer. For the AIM and Affine models, both input and reconstructed images are shown in their patch-wise normalized representation.
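The caption mentions a patch-wise normalized representation for the AIM and Affine images. A minimal sketch of one common form of such normalization is given below: each non-overlapping patch is shifted to zero mean and scaled to unit variance. The patch size, the `eps` constant, and the function name are assumptions; the page does not state how the paper implements this step.

```python
import numpy as np

def patchwise_normalise(image, patch_size, eps=1e-6):
    """Normalise each non-overlapping patch to zero mean and unit variance."""
    height, width = image.shape
    assert height % patch_size == 0 and width % patch_size == 0
    # Reshape to (patch_row, y_in_patch, patch_col, x_in_patch), then
    # move the two within-patch axes to the end.
    patches = image.reshape(height // patch_size, patch_size,
                            width // patch_size, patch_size)
    patches = patches.transpose(0, 2, 1, 3)
    mean = patches.mean(axis=(2, 3), keepdims=True)
    std = patches.std(axis=(2, 3), keepdims=True)
    normalised = (patches - mean) / (std + eps)
    # Undo the axis grouping to recover the original image layout.
    return normalised.transpose(0, 2, 1, 3).reshape(height, width)

# Synthetic single-band image (hypothetical stand-in for a galaxy cutout).
rng = np.random.default_rng(0)
image = rng.normal(loc=5.0, scale=2.0, size=(32, 32))
normalised = patchwise_normalise(image, patch_size=8)
```

Under this kind of normalization, absolute flux levels are removed from each patch, so images displayed this way emphasise local structure rather than overall brightness.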

Original abstract

Tokenization is central to adapting scientific data for transformer-based foundation models, yet its impact on learned representations remains poorly understood. We compare four tokenization strategies, Affine, AIM, JetFormer, and VQ-VAE, within a unified transformer framework for astronomical imaging. Using 640,000 galaxy images from the DESI Legacy Survey and a shared AstroPT backbone, we evaluate each method on reconstruction fidelity and prediction of physical properties. Our results reveal trade-offs across approaches. The flow-based JetFormer achieves higher reconstruction quality, while VQ-VAE yields strong probe performance for galaxy physical properties. Affine and AIM better preserve localized morphological information. We find that reconstruction and representation quality are decoupled, and no single method consistently performs best across the tasks considered here. By grounding our evaluation in independently measured physical quantities, we hope this study serves to highlight the potential of scientific data as a basis for constructing interpretable benchmarks for foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Benchmarks four tokenizers on DESI galaxies and finds that reconstruction and physical-property prediction are decoupled, but the shared backbone needs scrutiny before the comparison can be called fair.

This paper benchmarks Affine, AIM, JetFormer, and VQ-VAE tokenizers on 640,000 DESI Legacy Survey galaxy images inside a shared AstroPT transformer. It reports that reconstruction quality and performance on physical-property prediction tasks do not track each other, and that no single tokenizer wins on every axis.

The useful piece is the decision to ground part of the evaluation in independently measured physical quantities rather than reconstruction loss alone. That gives the benchmark a scientific anchor that most tokenizer studies lack. The scale is reasonable, the four-way comparison is concrete, and the trade-off result (JetFormer strong on reconstruction, VQ-VAE stronger on the probes, Affine and AIM better on morphology) is the kind of practical information people actually need when picking a tokenizer for astronomy data.

The main soft spot is the shared backbone. If the AstroPT weights were not retrained from scratch or initialized identically for each tokenizer, performance gaps could reflect tokenizer-backbone mismatch instead of tokenizer quality. The abstract does not describe the initialization or training schedule, so the decoupling claim sits on an unverified assumption. The absence of error bars, statistical tests, or data-selection details in the summary also makes it hard to judge how robust the ordering is.

This is for people building or selecting foundation models for scientific imaging who want to see tokenizer effects measured against real observables. A reader in that group would get value from the empirical layout even if the numbers need verification. The work shows clear thinking about what a useful benchmark should test.

Send it to peer review. The physical-grounding idea is worth referee time; the backbone protocol and statistics are the obvious items to tighten.

Referee Report

2 major / 2 minor

Summary. The paper compares four tokenization strategies (Affine, AIM, JetFormer, VQ-VAE) for astronomical imaging within a unified transformer framework using 640,000 galaxy images from the DESI Legacy Survey and a shared AstroPT backbone. It evaluates each on reconstruction fidelity and prediction of physical properties, reports trade-offs (e.g., JetFormer higher reconstruction, VQ-VAE stronger probe performance, Affine/AIM better for morphology), and concludes that reconstruction and representation quality are decoupled with no single method best across tasks. The evaluation is grounded in independently measured physical quantities to create interpretable benchmarks for scientific foundation models.

Significance. If the decoupling result holds under a properly controlled comparison, the work supplies a concrete, physics-grounded benchmark for tokenizer choice in scientific transformers. The explicit use of independently measured physical properties as probes, rather than proxy metrics, is a clear strength that could help move the field toward falsifiable, domain-specific evaluations of foundation-model components.

major comments (2)

[Methods] Methods (shared AstroPT backbone description): the claim that differences in reconstruction vs. physical-property probe performance reflect tokenizer properties alone is load-bearing for the decoupling conclusion, yet the text does not specify whether the shared backbone weights are re-initialized or re-trained from scratch for each tokenizer or whether a single pre-trained checkpoint is reused. If the latter, performance gaps could arise from tokenizer–backbone mismatch rather than intrinsic tokenizer quality, undermining the central empirical comparison.
[Results] Results (probe-task evaluation): the abstract states that VQ-VAE yields strong probe performance and that reconstruction and representation quality are decoupled, but no error bars, statistical significance tests, or data-selection criteria for the 640k images are reported. Without these, it is impossible to assess whether the observed trade-offs are robust or whether the decoupling claim is supported by the data.

minor comments (2)

[Abstract] Abstract: the sentence 'no single method consistently performs best across the tasks considered here' would be clearer if the specific tasks (reconstruction, morphology preservation, physical-property prediction) were enumerated.
The manuscript would benefit from a table summarizing the four tokenizers' key hyperparameters and training schedules to allow direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important points for clarification. We address each major comment below and will revise the manuscript accordingly.

Referee: [Methods] Methods (shared AstroPT backbone description): the claim that differences in reconstruction vs. physical-property probe performance reflect tokenizer properties alone is load-bearing for the decoupling conclusion, yet the text does not specify whether the shared backbone weights are re-initialized or re-trained from scratch for each tokenizer or whether a single pre-trained checkpoint is reused. If the latter, performance gaps could arise from tokenizer–backbone mismatch rather than intrinsic tokenizer quality, undermining the central empirical comparison.

Authors: We thank the referee for this observation. The shared AstroPT backbone was implemented by using identical architecture and hyperparameters for each tokenizer, with the backbone weights randomly initialized and trained from scratch independently in each case. This design ensures that observed differences can be attributed to the tokenizers rather than to a pre-trained checkpoint mismatch. We will revise the Methods section to explicitly describe the initialization, training procedure, and hyperparameter sharing to remove any ambiguity.
Revision: yes

Referee: [Results] Results (probe-task evaluation): the abstract states that VQ-VAE yields strong probe performance and that reconstruction and representation quality are decoupled, but no error bars, statistical significance tests, or data-selection criteria for the 640k images are reported. Without these, it is impossible to assess whether the observed trade-offs are robust or whether the decoupling claim is supported by the data.

Authors: We agree that error bars, significance testing, and explicit data-selection criteria are necessary to substantiate the robustness of the reported trade-offs and decoupling result. In the revised manuscript we will add bootstrap-derived error bars to all probe-task metrics, include statistical significance tests (e.g., paired t-tests) comparing tokenizer performances, and provide a clear description of the selection criteria applied to the 640,000 DESI Legacy Survey images. These additions will directly support the claims in the abstract and results.
Revision: yes
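The two statistical additions promised in this response can be sketched with the Python standard library alone. The example below computes a percentile bootstrap confidence interval for a mean probe score and the paired t statistic for two tokenizers scored on the same splits. The score values, the resample count, and the function names are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

def paired_t_statistic(scores_a, scores_b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical per-split probe scores for two tokenizers (not from the paper).
scores_a = [0.71, 0.74, 0.69, 0.73, 0.72]
scores_b = [0.66, 0.70, 0.65, 0.69, 0.67]

ci_low, ci_high = bootstrap_mean_ci(scores_a)
t_stat, dof = paired_t_statistic(scores_a, scores_b)
```

The t statistic would then be compared against a Student t distribution with `dof` degrees of freedom to obtain a p-value; if SciPy is available, `scipy.stats.ttest_rel` performs both steps. The pairing matters here because every tokenizer is evaluated on the same galaxies, so per-split differences are less noisy than the raw scores.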

Circularity Check

0 steps flagged

No circularity: empirical benchmark with independent physical-property ground truth

full rationale

The paper reports an empirical comparison of four tokenizers (Affine, AIM, JetFormer, VQ-VAE) on 640k DESI galaxy images using a shared AstroPT backbone. Performance is measured on reconstruction fidelity and prediction of independently measured physical properties. No equations, parameter fits, or derivations appear; the decoupling claim is an observed outcome across tasks rather than a quantity defined in terms of itself. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The setup is self-contained against external benchmarks (real galaxy properties), satisfying the criteria for score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are specified.

Reference graph

Works this paper leans on

24 extracted references · 19 canonical work pages · 2 internal anchors

[1]

Amiaux, J., Scaramella, R., Mellier, Y., Altieri, B., Burigana, C., Da Silva, A., Gomez, P., Hoar, J., Laureijs, R., Maiorano, E., Magalhães Oliveira, D., Renk, F., Saavedra Criado, G., Tereno, I., Auguères, J. L., Brinchmann, J., Cropper, M., Duvet, L., Ealet, A., Franzetti, P., Garilli, B., Gondoin, P., Guzzo, L., Hoekstra, H., Holmes, R., Jahnke, ...

2012
[2]

Black Forest Labs

doi: 10.1117/12.926513. Black Forest Labs. FLUX.2: Analyzing and enhancing the latent space of FLUX – representation comparison,

work page doi:10.1117/12.926513
[4]

J., Lang, D., et al

ISSN 1538-3881. doi: 10.3847/1538-3881/ab089d. El-Nouby, A., Klein, M., Zhai, S., Bautista, M. A., Toshev, A., Shankar, V., Susskind, J. M., and Joulin, A. Scalable Pre-training of Large Autoregressive Image Models. ArXiv e-prints,

work page doi:10.3847/1538-3881/ab089d
[5]

Esser, P., Rombach, R., and Ommer, B

doi: 10.48550/arXiv.2401.08541. Esser, P., Rombach, R., and Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12873–12883,

work page doi:10.48550/arxiv.2401.08541
[6]

He, K., Zhang, X., Ren, S., and Sun, J

doi: 10.48550/arXiv.2503.15312. He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. arXiv,

work page doi:10.48550/arxiv.2503.15312
[7]

He, K., Zhang, X., Ren, S., and Sun, J

48550/arXiv.1512.03385. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778,

Pith/arXiv arXiv
[8]

URL https://doi.org/10.3847/1538-4357/ae38b8

doi: 10.3847/1538-4357/ae38b8. URL https://doi.org/10.3847/1538-4357/ae38b8. Ivezić, Ž., Kahn, S. M., Tyson, J. A., Abel, B., Acosta, E., Allsman, R., Alonso, D., AlSayyad, Y., Anderson, S. F., Andrew, J., et al. LSST: From Science Drivers to Reference Design and Anticipated Data Products. The Astrophysical Journal, 873:111,

work page doi:10.3847/1538-4357/ae38b8
[9]

Leung, H

doi: 10.48550/arXiv.2411.04750. Leung, H. W. and Bovy, J. Towards an astronomical foundation model for stars with a transformer-based model. Monthly Notices of the Royal Astronomical Society, 527(1):1494–1520,

arXiv
[10]

doi: 10.1093/mnras/stad3015

ISSN 0035-8711. doi: 10.1093/mnras/stad3015. Loshchilov, I. and Hutter, F. Decoupled Weight Decay Regularization. ArXiv e-prints,

work page doi:10.1093/mnras/stad3015
[11]

McInnes, L., Healy, J., and Melville, J

doi: 10.48550/arXiv.1711.05101. McInnes, L., Healy, J., and Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints,

Pith/arXiv arXiv
[12]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

doi: 10.48550/arXiv.1802.03426. Mousavi, P., Maimon, G., Moumen, A., Petermann, D., Shi, J., Wu, H., Yang, H., Kuznetsova, A., Ploujnikov, A., Marxer, R., Ramabhadran, B., Elizalde, B., Lugosch, L., Li, J., Subakan, C., Woodland, P., Kim, M., Lee, H.-y., Watanabe, S., Adi, Y., and Ravanelli, M. Discrete Audio Tokens: More Than a Survey! ArXiv e-prints,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv
[13]

doi: 10.48550/arXiv.2506.10274. Parker, L., Lanusse, F., Shen, J., Liu, O., Hehir, T., Sarra, L., Meyer, L., Bowles, M., Wagner-Carena, S., Qu, H., Golkar, S., Bietti, A., Bourfoune, H., Casserau, N., Cornette, P., Hirashima, K., Krawezik, G., Ohana, R., Lourie, N., McCabe, M., Morel, R., Mukhopadhyay, P., Pettee, M., Blancard, B. R.-S., Cho, K., Cranmer,...

work page doi:10.48550/arxiv.2506.10274
[14]

arXiv e-prints , keywords =

doi: 10.48550/arXiv.2510.17960. Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572,

work page doi:10.48550/arxiv.2510.17960
[15]

1901 , pages =

ISSN 1941-5982. doi: 10.1080/14786440109462720. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. OpenAI Whitepaper,

work page doi:10.1080/14786440109462720 1941
[16]

, keywords =

ISSN 1538-3881. doi: 10.3847/1538-3881/acb213. Sanjaripour, S., Hemmati, S., Mobasher, B., Canalizo, G., Barish, B. C., Shivaei, I., Coil, A. L., Chartab, N., Jafariyazani, M., Reddy, N. A., and Azadi, M. The application of manifold learning to a selection of different galaxy populations and scaling relation analysis. The Astrophysical Journal, 977(2):202, dec

work page doi:10.3847/1538-3881/acb213
[17]

URL https://doi

doi: 10.3847/1538-4357/ad90ba. URL https://doi.org/10.3847/1538-4357/ad90ba. Sanjaripour, S., Aravindan, A., Canalizo, G., Hemmati, S., Mobasher, B., Coil, A. L., and Barish, B. C. Selection of dwarf galaxies hosting active galactic nuclei: A measure of bias and contamination using unsupervised machine learning techniques. The Astrophysical Journal, 992...

work page doi:10.3847/1538-4357/ad90ba
[18]

URL https://doi.org/10.3847/1538-4357/ae0326

doi: 10.3847/1538-4357/ae0326. URL https://doi.org/10.3847/1538-4357/ae0326. Smith, M. J. and Geach, J. E. Astronomia ex machina: a history, primer and outlook on neural networks in astronomy. R. Soc. Open Sci., 10(5):221454,

work page doi:10.3847/1538-4357/ae0326
[19]

doi: 10.1098/rsos.221454

ISSN 2054-5703. doi: 10.1098/rsos.221454. Smith, M. J., Roberts, R. J., Angeloudi, E., and Huertas-Company, M. AstroPT: Scaling Large Observation Models for Astronomy. ArXiv e-prints,

work page doi:10.1098/rsos.221454 2054
[20]

Strubell, E., Ganesh, A., and Mccallum, A

doi: 10.48550/arXiv.2405.14930. Strubell, E., Ganesh, A., and Mccallum, A. Energy and Policy Considerations for Deep Learning in NLP. ACL Anthology, pp. 3645–3650,

work page doi:10.48550/arxiv.2405.14930
[21]

CoLLaVO: Crayon large language and vision mOdel

doi: 10.18653/v1/P19-1355. The Multimodal Universe Collaboration. The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data. Advances in Neural Information Processing Systems, 37:57841–57913,

work page doi:10.18653/v1/
[22]

doi: 10.48550/arXiv.2411.19722. van den Oord, A., Vinyals, O., and Kavukcuoglu, K. Neural discrete representation learning. In Neural Information Processing Systems,

work page doi:10.48550/arxiv.2411
[23]

doi: 10.48550/arXiv.1706.03762. Walmsley, M., Géron, T., Kruk, S., Scaife, A. M. M., Lintott, C., Masters, K. L., Dawson, J. M., Dickinson, H., Fortson, L., Garland, I. L., et al. Galaxy Zoo DESI: Detailed morphology measurements for 8.7M galaxies in the DESI Legacy Imaging Surveys. Monthly Notices of the Royal Astronomical Society, 526(3):4768–4786,

work page doi:10.48550/arxiv.1706
[24]

Vector-quantized Image Modeling with Improved VQGAN

ISSN 0035-8711. doi: 10.1093/mnras/stad2919. Yu, J., Li, X., Koh, J. Y., Zhang, H., Pang, R., Qin, J., Ku, A., Xu, Y., Baldridge, J., and Wu, Y. Vector-quantized image modeling with improved vqgan. arXiv preprint arXiv:2110.04627,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/mnras/stad2919
[25]

Appendix A. Embedding Structure via PCA and UMAP

We project the 768-dimensional embeddings onto two dimensions using PCA and UMAP analysis (Pearson, 1901; McInnes et al., 2018). The resulting projections are colour-coded by photometric redshift, g−r colour, r-band magnitude, and smoothness fraction as measured by Galaxy Zoo Ci...

1901
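The appendix excerpt above describes projecting 768-dimensional embeddings onto two dimensions with PCA. A minimal sketch of that projection via singular value decomposition follows; the random embeddings and the function name are assumptions standing in for real model outputs, and the UMAP step is not shown.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project rows of `embeddings` onto their top principal components."""
    centred = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    projected = centred @ vt[:n_components].T
    # Fraction of total variance carried by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]

# Synthetic stand-in for 768-dimensional embeddings (hypothetical data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))
embeddings[:, 0] *= 10.0  # give one direction a dominant variance
projected, explained = pca_project(embeddings)
```

In a setting like the one described, each row of `projected` would be plotted as a point and coloured by a catalogue quantity such as photometric redshift, to see whether the embedding organises galaxies along physically meaningful directions.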
