GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

Juexi Shao; Juntao Yu; Siyou Li; Ziyu Zhai

arxiv: 2605.06641 · v1 · submitted 2026-05-07 · 💻 cs.AI · cs.CV

Pith reviewed 2026-05-08 09:32 UTC · model grok-4.3

keywords ceramic glazes · AI material design · property prediction · image generation · benchmark dataset · machine learning · multimodal models

The pith

A dataset of 23,148 real glaze recipes enables AI to predict fired surface properties and generate matching images from raw materials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GlazyBench as the first large dataset for AI-assisted ceramic glaze design. It contains 23,148 real formulations and supports two main tasks: predicting post-firing properties such as color and transparency from the list of raw materials, and generating visual images of the finished glaze. The work sets baseline results using traditional machine learning, large language models, and generative AI techniques, which achieve partial success but leave room for improvement. This provides a shared testbed so that future models can be compared systematically when helping artists reduce the trial-and-error costs of developing new glazes.

Core claim

GlazyBench supplies 23,148 real glaze formulations that allow models to learn the mapping from ingredient combinations to post-firing color, transparency, and visual appearance, with experiments on property prediction and image generation yielding promising but imperfect results.

What carries the argument

The GlazyBench dataset of real-world glaze recipes paired with their fired properties and images, used as training and test data for property prediction and image generation models.

Load-bearing premise

The 23,148 collected formulations accurately represent the range of possible glazes and their outcomes without collection biases that would limit model reliability on new designs.

What would settle it

Collect new glaze recipes outside the dataset, fire them under controlled conditions, and check whether the AI predictions of color, transparency, and generated images match the actual fired results.

Figures

Figures reproduced from arXiv: 2605.06641 by Juexi Shao, Juntao Yu, Siyou Li, Ziyu Zhai.

**Figure 1.** Two-step image generation task that explicitly connects recipe representation, firing context, appearance properties, and image generation. The first step extracts the Unity Molecular Formula (UMF) from raw material information. It combines this formula with the cone rating and firing atmosphere to predict the surface properties of the glaze, including color, surface texture, and transparency. The sec…

**Figure 2.** Image region extraction pipeline.

… tasks, the category distributions remain consistent between the training and test sets, with a KL divergence below 0.12. This consistency supports adequate representation and reduces the risk of evaluation bias caused by distribution shifts.

2.3 Data For Image Generation

The data used for image generation were manually re-annotated based on the previous test set. This was …

**Figure 3.** LLM’s image generation results under three different prompt conditions.
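The train/test consistency check quoted in the text accompanying Figure 2 (KL divergence below 0.12) can be illustrated with a short sketch. The label counts below are made up, and the direction of the divergence (train relative to test), the natural-log base, and the small smoothing constant are assumptions; the excerpt does not state how the authors computed the value.

```python
import math
from collections import Counter


def category_distribution(labels, categories, eps=1e-9):
    """Relative frequency of each category, lightly smoothed to avoid log(0)."""
    counts = Counter(labels)
    total = len(labels)
    raw = [counts.get(c, 0) / total + eps for c in categories]
    norm = sum(raw)
    return [p / norm for p in raw]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions on the same categories."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical surface-texture labels, not taken from GlazyBench.
train_labels = ["Glossy"] * 49 + ["Matte"] * 30 + ["Satin"] * 21
test_labels = ["Glossy"] * 24 + ["Matte"] * 16 + ["Satin"] * 10

categories = sorted(set(train_labels) | set(test_labels))
p_train = category_distribution(train_labels, categories)
p_test = category_distribution(test_labels, categories)

kl = kl_divergence(p_train, p_test)
print(f"KL(train || test) = {kl:.4f}")
print("within threshold" if kl < 0.12 else "distribution shift")
```

A value near zero means the two splits have almost the same category mix; the 0.12 threshold is the figure quoted in the text.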

Original abstract

Developing ceramic glazes is a costly, time-consuming process of trial and error due to complex chemistry, placing a significant burden on independent artists. While recent advances in multimodal AI offer a modern solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, the first dataset for AI-assisted glaze design. Comprising 23,148 real glaze formulations, GlazyBench supports two primary tasks: predicting post-firing surface properties, such as color and transparency, from raw materials, and generating accurate visual representations of the glaze based on these properties. We establish comprehensive baselines for property prediction using traditional machine learning and large language models, alongside image generation benchmarks using deep generative and large multimodal models. Our experiments demonstrate promising yet challenging results. GlazyBench pioneers a new research direction in AI-assisted material design, providing a standardized benchmark for systematic evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GlazyBench releases a new 23k-recipe dataset for glaze property prediction and image generation, but the value hinges on unshown details about label accuracy and data sourcing.

The main thing here is a dataset release: 23,148 real glaze formulations pulled together for two tasks, predicting post-firing properties like color and transparency from ingredients, and generating images of the result. They run baselines with standard ML, LLMs, and generative models, and report the outcomes as promising but still hard. That is the concrete addition—no prior public benchmark covers this exact ceramics niche at scale, so it gives people working on multimodal models or materials AI a starting point they did not have before.

Referee Report

2 major / 2 minor

Summary. The paper introduces GlazyBench, a dataset of 23,148 real glaze formulations sourced from user-submitted repositories, positioned as the first benchmark for AI-assisted ceramic glaze design. It defines two core tasks: (1) predicting post-firing properties such as color and transparency from raw material compositions, with baselines using traditional ML and LLMs, and (2) generating visual representations of fired glazes using deep generative and large multimodal models. The authors report promising yet challenging baseline results and claim the resource enables systematic evaluation in a new research direction.

Significance. If the dataset proves representative and labels reliable, this benchmark could meaningfully advance AI applications in material design by reducing costly trial-and-error for ceramic artists and providing a standardized testbed. The release of baselines for both prediction and generation tasks is a constructive starting point that lowers the barrier for follow-on work.

major comments (2)

[Data Collection] Data Collection and Validation: The manuscript provides insufficient documentation on sourcing the 23,148 formulations (e.g., from Glazy.org), including any deduplication procedures, validation of user-reported post-firing properties against actual firing outcomes, inter-rater reliability for labels such as color and transparency, or quantitative coverage metrics (e.g., diversity in oxide compositions via PCA or firing schedule distributions). This directly undermines the central claim that models trained on GlazyBench will yield reliable predictions and generations for new designs.
[Experiments] Baseline Experiments: No quantitative performance metrics, error breakdowns, train/validation/test splits, or statistical validation details are reported for the property prediction or image generation baselines. Without these, the statement of 'promising yet challenging results' cannot be evaluated and does not yet support the benchmark's claimed utility.

minor comments (2)

[Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., best MAE or FID score) to ground the 'promising' claim.
[Methods] Notation for input features (oxide compositions) and output properties should be defined consistently in a table or early section to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We have carefully considered each point and provide detailed responses below, along with plans for revisions to improve the clarity and rigor of the paper.

Referee: [Data Collection] Data Collection and Validation: The manuscript provides insufficient documentation on sourcing the 23,148 formulations (e.g., from Glazy.org), including any deduplication procedures, validation of user-reported post-firing properties against actual firing outcomes, inter-rater reliability for labels such as color and transparency, or quantitative coverage metrics (e.g., diversity in oxide compositions via PCA or firing schedule distributions). This directly undermines the central claim that models trained on GlazyBench will yield reliable predictions and generations for new designs.

Authors: We agree that additional documentation on data collection would strengthen the manuscript. In the revised version, we will expand the Data section to include: (1) details on sourcing from Glazy.org, including how formulations were collected via their public API or repository; (2) deduplication procedures, such as normalizing compositions to 100% and removing entries with identical oxide percentages; (3) quantitative coverage metrics, including PCA visualizations of the oxide composition space and distributions of firing schedules (temperature and hold times). For validation, since the properties are user-reported based on their firing experiences, we cannot provide independent lab validation for the entire dataset due to resource constraints. We will explicitly discuss this as a limitation of the benchmark, noting that Glazy.org entries often include photos and community feedback which provide some corroboration. Inter-rater reliability is not available as each formulation has a single reporter. These additions will better contextualize the dataset's strengths and limitations without overstating its reliability.
revision: partial
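The deduplication idea mentioned in this response (normalize each composition to 100%, then drop entries with identical oxide percentages) could look roughly like the sketch below. The record layout, the rounding to two decimals, and the function names are illustrative assumptions, not the authors' pipeline.

```python
def normalize_to_100(oxides):
    """Rescale oxide amounts so they sum to 100 (weight percent)."""
    total = sum(oxides.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return {name: 100.0 * value / total for name, value in oxides.items()}


def composition_key(oxides, decimals=2):
    """Order-independent key: normalized percentages rounded to `decimals`."""
    normalized = normalize_to_100(oxides)
    rounded = ((name, round(value, decimals)) for name, value in normalized.items())
    return tuple(sorted(item for item in rounded if item[1] > 0))


def deduplicate(recipes):
    """Keep the first recipe seen for each distinct normalized composition."""
    seen = set()
    unique = []
    for recipe in recipes:
        key = composition_key(recipe["oxides"])
        if key not in seen:
            seen.add(key)
            unique.append(recipe)
    return unique


# Hypothetical records; "b" has the same ratios as "a" at twice the batch size.
recipes = [
    {"id": "a", "oxides": {"SiO2": 45.2, "Al2O3": 12.8, "CaO": 8.5}},
    {"id": "b", "oxides": {"SiO2": 90.4, "Al2O3": 25.6, "CaO": 17.0}},
    {"id": "c", "oxides": {"SiO2": 60.0, "Al2O3": 10.0, "CaO": 30.0}},
]
unique = deduplicate(recipes)
print([r["id"] for r in unique])
```

Rounding before comparison makes the match tolerant of floating-point noise, at the cost of merging recipes that differ by less than the rounding step.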
Referee: [Experiments] Baseline Experiments: No quantitative performance metrics, error breakdowns, train/validation/test splits, or statistical validation details are reported for the property prediction or image generation baselines. Without these, the statement of 'promising yet challenging results' cannot be evaluated and does not yet support the benchmark's claimed utility.

Authors: We acknowledge that the experimental results section would benefit from more detailed quantitative reporting. We will revise the Experiments section to include: specific performance metrics such as mean absolute error (MAE) and root mean square error (RMSE) for property predictions (e.g., for color in CIELAB space and transparency), along with breakdowns by key factors like dominant oxides or firing temperature ranges. For image generation, we will report Fréchet Inception Distance (FID), Learned Perceptual Image Patch Similarity (LPIPS), and other relevant metrics, supported by statistical analysis including confidence intervals. We will clearly describe the train/validation/test splits used (e.g., random 70/15/15 split with stratification to ensure diversity), and any cross-validation procedures. These details will allow readers to fully evaluate the baselines and the benchmark's utility. The 'promising yet challenging' characterization will be supported by these numbers.
revision: yes
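As an illustration of the stratified 70/15/15 split named in this response, here is a minimal standard-library sketch. The labels, the seed, and the per-class rounding rule are assumptions chosen for the example; the simulated rebuttal does not specify them.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test while preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical, deliberately imbalanced label list.
labels = ["Glossy"] * 100 + ["Matte"] * 60 + ["Satin"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps rare categories represented in all three subsets, which a plain random split does not guarantee.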

standing simulated objections not resolved

Complete independent validation of all user-submitted post-firing properties against controlled laboratory experiments, due to the scale (23k entries) and crowdsourced nature of the data.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper is a dataset release and benchmark paper that introduces GlazyBench comprising 23,148 real glaze formulations and establishes baselines for property prediction and image generation tasks. There are no mathematical derivations, equations, fitted parameters, or predictions that reduce to their own inputs by construction. The central claims rest on data collection and experimental baselines rather than any self-definitional, self-citation load-bearing, or ansatz-smuggled steps. This is the most common honest finding for benchmark papers and receives the default low score.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the existence and utility of the collected dataset; no free parameters, mathematical axioms, or new invented entities are introduced beyond the dataset itself.

pith-pipeline@v0.9.0 · 5453 in / 1222 out tokens · 52890 ms · 2026-05-08T09:32:51.416850+00:00 · methodology

Reference graph

Works this paper leans on

64 extracted references

[1] Ahmmad, S.K., Jabeen, N., Ahmed, S.T.U., Ahmed, S.A., Rahman, S.: Artificial intelligence density model for oxide glasses. Ceramics International 47(6), 7946–7956 (2021)
[2] Belciu, M.I., Velea, A.: Ensemble machine learning for the prediction and understanding of the refractive index in chalcogenide glasses. Molecules 30(8), 1745 (2025)
[3] Bo, Y., Liu, Q., Huang, X., Pan, Y.: Real-time hard-rock tunnel prediction model for rock mass classification using CatBoost integrated with sequential model-based optimization. Tunnelling and Underground Space Technology 124, 104448 (2022)
[4] Bondioli, F., Manfredini, T., Romagnoli, M.: Color matching algorithms in ceramic tile production. Journal of the European Ceramic Society 26(3), 311–316 (2006)
[5] Castela, A., Fonseca, A., Mantas, P.: Development of coloured glazes for tile applications using Taguchi’s method. Journal of the European Ceramic Society 30(12), 2451–2455 (2010)
[6] Chakraborty, S., Björk, J., Dahlqvist, M., Rosen, J., Heintz, F.: A survey of AI-supported materials informatics. Computer Science Review 59, 100845 (2026)
[7] Chen, S.S.C., Cui, H., Tan, P., Sun, X., Ji, Y., Duh, H.: Cantonese porcelain image generation using user-guided generative adversarial networks. IEEE Computer Graphics and Applications 40(5), 100–107 (2020)
[8] Ding, X., Wang, Y., Xu, Z., Welch, W.J., Wang, Z.J.: CcGAN: Continuous conditional generative adversarial networks for image generation. In: International Conference on Learning Representations (2021)
[9] Feng, H., Zhang, G., Cao, J., Ren, H., Wan, W., Liu, D.: Application of WOA optimized LightGBM in lithology identification of igneous logging. Progress in Geophysics 40(1), 230–242 (2025)
[10] Feng, L., Wang, F., Luo, H., Zhu, J., Wang, M., Yang, C., Sun, J., Wang, T.: Phase-separated tenmoku “blue” glaze: Microstructure and coloring mechanism. Journal of the European Ceramic Society 43(14), 6581–6589 (2023)
[11] Fu, Z.: Digital color enhancement in ceramic imagery using graph-guided residual learning and adaptive scattering models. Journal of Computational Methods in Sciences and Engineering, p. 14727978251391297 (2025)
[12] Fujinuma, N., DeCost, B., Hattrick-Simpers, J., Lofland, S.E.: Why big data and compute are not necessarily the path to big materials science. Communications Materials 3(1), 59 (2022)
[13] Gao, W., Zhang, X., Yang, L., Liu, H.: An improved Sobel edge detection. In: 2010 3rd International Conference on Computer Science and Information Technology. vol. 5, pp. 67–71. IEEE (2010)
[14] Glazy Contributors: Glazy. https://glazy.org/ (2026), accessed: 2026-02-01
[15] Imer, C., Günay, E., Öveçoğlu, M.: Effects of firing temperatures and compositions on the formation of nano particles in lustre layers on a lead-alkali glaze. Ceramics International 42(15), 17222–17228 (2016)
[16] Jin, Q., Luo, X., Shi, Y., Kita, K.: Image generation method based on improved condition GAN. In: 2019 6th International Conference on Systems and Informatics (ICSAI). pp. 1290–1294. IEEE (2019)
[17] Li, Y., Fu, R., Meng, X., Jin, W., Shao, F.: A SAR-to-optical image translation method based on conditional generation adversarial network (cGAN). IEEE Access 8, 60338–60343 (2020)
[18] Liu, H., Fu, Z., Yang, K., Xu, X., Bauchy, M.: Machine learning for glass science and engineering: A review. Journal of Non-Crystalline Solids 557, 119419 (2021)
[19] Mao, L.x., He, F., Li, L., Xu, W., Wang, Y., Liu, Q.f.: A quantitative study of phase assemblage in cement-fly ash-slag ternary systems using machine learning-assisted BSE-EDS image analysis. Construction and Building Materials 498, 143712 (2025)
[20] Mues, M., Kraemer, D., Styn, D.M.E.: Using machine learning classifiers together with discrimination diagrams for validation of rock classification labels. Applied Computing and Geosciences, p. 100288 (2025)
[21] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002)
[22] Riedel, H., Mokdad, S., Schulz, I., Kocer, C., Rosendahl, P.L., Schneider, J., Kraus, M.A., Drass, M.: Automated quality control of vacuum insulated glazing by convolutional neural network image classification. Automation in Construction 135, 104144 (2022)
[23] Romagnoli, M., Bondioli, F., Barattini, M., et al.: Neural network approach for color matching of ceramic glazes. In: International Congress of Ceramic Materiali. vol. 1, pp. xx–xx. ECERS (2008)
[24] Rother, C., Kolmogorov, V., Blake, A.: “GrabCut”: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (TOG) 23(3), 309–314 (2004)
[25] Ruiyi, H., Zhuwen, W., Wenhua, W., Fanghui, X., Xinghua, Q., Yitong, C.: Lithology identification of igneous rocks based on XGBoost and conventional logging curves, a case study of the eastern depression of Liaohe Basin. Journal of Applied Geophysics 195, 104480 (2021)
[26] Rumble Jr, J.R.: Accessing materials data: challenges and directions in the digital era. Integrating Materials and Manufacturing Innovation 6(2), 172–186 (2017)
[27] Santos, T., Hennetier, L., Costa, V.A., Costa, L.C.: Temperature assessment through decal color in microwave-fired porcelain. Journal of Manufacturing and Materials Processing 9(7), 213 (2025)
[28] Schabbach, L., Bondioli, F., Fredel, M.: Colouring of opaque ceramic glaze with zircon pigments: Formulation with simplified Kubelka–Munk model. Journal of the European Ceramic Society 31(5), 659–664 (2011)
[29] Schabbach, L., Bondioli, F., Fredel, M.: Color prediction with simplified Kubelka–Munk model in glazes containing Fe2O3–ZrSiO4 coral pink pigments. Dyes and Pigments 99(3), 1029–1035 (2013)
[30] Trott, M., Leybourne, M., Hall, L., Layton-Matthews, D.: Random forest rock type classification with integration of geochemical and photographic data. Applied Computing and Geosciences 15, 100090 (2022)
[31] Ueki, K., Hino, H., Kuwatani, T.: Geochemical discrimination and characteristics of magmatic tectonic settings: A machine-learning-based approach. Geochemistry, Geophysics, Geosystems 19(4), 1327–1347 (2018)
[32] Vasić, M.V., Awoyera, P.O., Fadugba, O.G., Barišić, I., Grubeša, I.N.: Advanced machine learning models for the prediction of ceramic tiles’ properties during the firing stage. Scientific Reports 15(1), 31397 (2025)
[33] Wang, Y., Zhang, G.: Lightweight text-to-image generation model based on contrastive language-image pre-training embeddings and conditional variational autoencoders. Electronics 14(11), 2185 (2025)
[34] Wei, J., Hao, Y., Fu, Y., Yang, L., Gan, J., Li, H.: Experimental study on glaze icing detection of 110 kV composite insulators using fiber Bragg gratings. Sensors 20(7), 1834 (2020)
[35] Wu, B., Zhao, W., Ren, X., Liu, X., Li, B., Feng, S., Feng, X., Zhao, H.: Firing process and colouring mechanism of black glaze and brown glaze porcelains from the Yuan and Ming dynasties from the Qingliang Temple kiln in Baofeng, Henan, China. Ceramics International 47(23), 32817–32827 (2021)
[36] Xie, Y., Wang, X.: Prediction of thermal and optical properties of oxyfluoride glasses based on interpretable machine learning. Nanomaterials 15(11), 860 (2025)
[37] Yamagiwa, A., Goto, M., et al.: An analytical model using CVAE-based image generation from product descriptions and image data. Industrial Engineering & Management Systems 24(4), 650–662 (2025)
[38] Yan, X., Yang, J., Sohn, K., Lee, H.: Attribute2Image: Conditional image generation from visual attributes. In: European Conference on Computer Vision. pp. 776–791. Springer (2016)
[39] Zhang, C., Neu, R.W.: Understanding the role of glaze layer with aligned images from multiple surface characterization techniques. Wear 477, 203837 (2021)
[40] Zhang, P., Xi, X., Wang, B.C.: Geochemical signatures and element interactions of volcanic-hosted agates: Insights from interpretable machine learning. Minerals 15(9), 923 (2025)
[41] Zhang, Y., Li, A., Deng, B., Hughes, K.K.: Data-driven predictive models for chemical durability of oxide glass under different chemical conditions. npj Materials Degradation 4(1), 14 (2020)
[42] Zhao, L., Zhang, Y.: Revealing the individual effects of firing temperature and chemical composition on Raman parameters of celadon glaze. Ceramics 6(2), 1263–1276 (2023)

Appendix A: Data Preprocessing Details

A.1 Color Annotation Methodology

Transparency and surface-texture labels are obtained directly from structured dropdown menus on th...
[43] Reference model (ensemble construction). We train and compare four machine-learning models to learn the recipe-to-color mapping from the manually labeled data. The two best-performing models, Random Forest and XGBoost, are retained and combined into an ensemble for downstream color selection.

[44] RGB-based agreement and selection. The two models independently predict an RGB color. Let the two predicted candidates be $c_1, c_2 \in \mathbb{R}^3$, and let $\bar{c}_{\mathrm{pred}}$ denote their centroid. We compute Euclidean distances $d_k = \lVert c_k - \bar{c}_{\mathrm{pred}} \rVert_2$, $k \in \{1, 2\}$, and select $\arg\min_k d_k$. Intuitively, this step prefers the candidate closer to the consensus of the two predictors.

[45] Ambiguity filtering. If $|d_1 - d_2| < 10$, the two candidates are considered equally plausible and the sample is marked as ambiguous and discarded. After filtering, 12,175 training samples remain with validated color annotations. Sanity check. All 3,097 samples previously marked as uncertain during manual curation are removed by the above filtering pipeline, supp...
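The selection and filtering rule in entries [44] and [45] can be sketched as follows. One reading is assumed here and should be treated as such: the consensus point is passed in as a separate argument `c_pred`, because if it were literally the midpoint of the two candidates both distances would be equal and every sample would count as ambiguous. The function names and sample colors are hypothetical.

```python
import math

AMBIGUITY_THRESHOLD = 10.0  # from the text: |d1 - d2| < 10 means ambiguous


def euclidean(a, b):
    """Euclidean (L2) distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_color(c1, c2, c_pred, threshold=AMBIGUITY_THRESHOLD):
    """Return the candidate closer to c_pred, or None if the choice is ambiguous."""
    d1 = euclidean(c1, c_pred)
    d2 = euclidean(c2, c_pred)
    if abs(d1 - d2) < threshold:
        return None  # both candidates equally plausible: discard the sample
    return c1 if d1 < d2 else c2


# Hypothetical RGB values for illustration only.
consensus = (200, 180, 40)
clear_case = select_color((210, 175, 50), (90, 60, 30), consensus)
ambiguous_case = select_color((210, 175, 50), (190, 185, 30), consensus)
print(clear_case, ambiguous_case)
```

Returning `None` for ambiguous samples lets the caller drop them, which matches the "discarded" wording in the text.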
[46] Chemical composition (wt.% oxides). All oxide weight percentages larger than 0.01% are listed in the format Oxide: value% (comma-separated), e.g., SiO2: 45.20%, Al2O3: 12.80%, CaO: 8.50%,
[47] UMF formula. All UMF entries larger than 0.01 are listed as Oxide: value and prefixed by UMF Formula:
[48] Firing parameters. If available, we include cone information (Cone: N or Cone Range: N–M) and atmosphere (Oxidation or Reduction). Otherwise, the field is set to No additional firing parameters available.

C.3 Prompt Design

For each task, we use a unified prompt template that supports both zero-shot and few-shot evaluation. The template consists of:

[49] a role declaration and task instruction
[50] an explicit, enumerated label set with short descriptions
[51] domain rules connecting oxides/firing conditions to visual properties
[52] an optional few-shot block {few_shot_examples}
[53] the query sample (three input blocks as above)
[54] a strict output constraint: output exactly one label from the allowed set.

For zero-shot evaluation (K = 0), the few-shot block is omitted. For K-shot evaluation, the block is populated as described in Section C.4.

Task-specific instantiations. The three tasks share the same structure but differ in label sets and domain rules:
• Transparency (4 classes). Lab...
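A rough sketch of how the three input blocks (entries [46] to [48]) and the six-part template (entries [49] to [54]) might be assembled is given below. The role sentence, label names and descriptions, domain rule, two-decimal UMF formatting, and separator between cone and atmosphere are all placeholders invented for illustration; the excerpt does not reproduce the paper's actual prompt text.

```python
def serialize_sample(oxides_wt, umf, cone=None, atmosphere=None):
    """Render the three input blocks: composition, UMF, firing parameters."""
    composition = ", ".join(
        f"{ox}: {v:.2f}%" for ox, v in oxides_wt.items() if v > 0.01
    )
    umf_line = "UMF Formula: " + ", ".join(
        f"{ox}: {v:.2f}" for ox, v in umf.items() if v > 0.01
    )
    firing_parts = []
    if cone is not None:
        firing_parts.append(f"Cone: {cone}")
    if atmosphere is not None:
        firing_parts.append(atmosphere)
    firing = ", ".join(firing_parts) or "No additional firing parameters available"
    return "\n".join([composition, umf_line, firing])


def build_prompt(task, labels, rules, query, few_shot_examples=""):
    """Join the template parts in the listed order; omit few-shot when empty."""
    sections = [
        f"You are a ceramic glaze expert. Predict the {task} of this glaze.",
        "Allowed labels:\n" + "\n".join(f"- {n}: {d}" for n, d in labels.items()),
        "Domain rules:\n" + "\n".join(f"- {r}" for r in rules),
    ]
    if few_shot_examples:  # K = 0 leaves this block out entirely
        sections.append(few_shot_examples)
    sections.append(query)
    sections.append("Output exactly one label from the allowed set.")
    return "\n\n".join(sections)


# Hypothetical sample, labels, and rule.
query = serialize_sample(
    {"SiO2": 45.2, "Al2O3": 12.8, "CaO": 8.5, "TiO2": 0.005},
    {"CaO": 1.0, "Al2O3": 0.35, "SiO2": 3.2},
)
labels = {"Opaque": "no light passes", "Transparent": "clear glass layer"}
rules = ["Placeholder rule linking an oxide level to opacity."]
zero_shot = build_prompt("transparency", labels, rules, query)
few_shot = build_prompt(
    "transparency", labels, rules, query,
    few_shot_examples="Example 1:\n" + query + "\nAnswer: Opaque",
)
print(zero_shot)
```

Keeping serialization separate from prompt assembly means the same `serialize_sample` output can be reused for both the query and the few-shot examples.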
[55] Collect training samples that (i) have valid labels for the target task and (ii) contain non-empty chemical composition data. Group them by class.
[56] Iterate classes in insertion order and draw one example per class in sequence until K examples are obtained. Classes with no remaining samples are removed from the rotation.
[57] Serialize each selected example using the same three-block format as the query, followed by Answer: {label}.

This procedure encourages class coverage in-context, ensuring up to min(K, |C|) distinct classes appear in the prompt. This is particularly relevant for imbalanced tasks (e.g., surface texture, where Glossy accounts for 49% of samples). Few-shot block...
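The round-robin selection in entries [55] and [56] can be sketched in a few lines. The sample schema (a dict with a `composition` field and one field per task) is assumed, and "draw" is implemented as taking the first remaining example of each class, since the excerpt does not say whether the draw is random.

```python
from collections import deque


def select_few_shot(samples, task, k):
    """Pick up to k examples by cycling through classes in insertion order."""
    by_class = {}  # dicts preserve insertion order in Python 3.7+
    for sample in samples:
        label = sample.get(task)
        if label and sample.get("composition"):  # valid label, non-empty recipe
            by_class.setdefault(label, deque()).append(sample)

    rotation = list(by_class)
    selected = []
    while len(selected) < k and rotation:
        for label in list(rotation):
            if len(selected) == k:
                break
            pool = by_class[label]
            selected.append(pool.popleft())
            if not pool:  # class exhausted: drop it from the rotation
                rotation.remove(label)
    return selected


# Hypothetical samples; "x" is skipped because its composition is empty.
samples = [
    {"id": "g1", "surface": "Glossy", "composition": "SiO2: 45.20%"},
    {"id": "g2", "surface": "Glossy", "composition": "SiO2: 50.10%"},
    {"id": "m1", "surface": "Matte", "composition": "SiO2: 38.00%"},
    {"id": "x", "surface": "Satin", "composition": ""},
    {"id": "s1", "surface": "Satin", "composition": "SiO2: 41.70%"},
    {"id": "g3", "surface": "Glossy", "composition": "SiO2: 47.30%"},
]
picked = select_few_shot(samples, "surface", k=4)
print([s["id"] for s in picked])
```

With K = 4 and three classes, all three classes appear once before the majority class is revisited, which is the min(K, |C|) coverage property described in the text.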
[58] Strip leading/trailing whitespace and quotation characters, then extract the first line.
[59] Iterate through the ordered list of valid labels and return the first label whose lowercase form appears as a substring of the lowercase response line.
[60] For multi-word labels (e.g., Semi-opaque, Satin-matte, Smooth Matte), we accept both hyphenated and space-separated variants.
[61] Outputs that match none of the valid labels are recorded as parsing failures and excluded from metric computation.

Appendix D: Specifications of Image-Generation Baselines

This appendix reports the technical specifications of two baseline models for the conditional glaze image generation task (Task D), including the problem formulation, mo...
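The response-parsing steps in entries [58] to [61] translate fairly directly into code. The sketch below is one possible reading; the label list and its ordering are hypothetical, and `None` stands in for a recorded parsing failure.

```python
def parse_label(response, valid_labels):
    """Map a raw model response to one valid label, or None on failure."""
    cleaned = response.strip().strip("\"'“”‘’").strip()
    if not cleaned:
        return None
    first_line = cleaned.splitlines()[0].lower()

    for label in valid_labels:  # order matters: first substring match wins
        lowered = label.lower()
        variants = {lowered, lowered.replace("-", " "), lowered.replace(" ", "-")}
        if any(variant in first_line for variant in variants):
            return label
    return None  # parsing failure, excluded from metrics


# Hypothetical ordered label set. "Semi-opaque" comes before "Opaque"
# because "opaque" is a substring of "semi-opaque".
labels = ["Semi-opaque", "Opaque", "Translucent", "Transparent"]

print(parse_label('"Semi opaque"', labels))
print(parse_label("  Opaque\nbecause of the high tin content", labels))
print(parse_label("I am not sure", labels))
```

Because matching is by substring, the order of `valid_labels` decides ties between overlapping names, so longer labels are best listed before the shorter labels they contain.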
[62] Resize to 128×128 using Lanczos resampling.
[63] Normalize pixel values to [−1, 1] via (x/255 − 0.5)/0.5.
[64] Apply random horizontal flipping (probability 0.5) to training images only.
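The normalization and flipping steps in entries [63] and [64] are sketched below using only the standard library, with an image represented as nested lists of RGB triples. The Lanczos resize in entry [62] is not shown and would normally be delegated to an image library; the function names and the tiny test image are illustrative assumptions.

```python
import random


def normalize_pixel(x):
    """Map an 8-bit channel value in [0, 255] to [-1, 1]."""
    return (x / 255 - 0.5) / 0.5


def normalize_image(image):
    """Apply normalize_pixel to every channel of every pixel."""
    return [[[normalize_pixel(c) for c in pixel] for pixel in row] for row in image]


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def preprocess(image, training, rng=random):
    """Normalize always; flip only when preparing training images."""
    out = normalize_image(image)
    if training:
        out = random_horizontal_flip(out, p=0.5, rng=rng)
    return out


# Hypothetical 1x2 RGB image: a black pixel next to a white pixel.
tiny = [[[0, 0, 0], [255, 255, 255]]]
print(preprocess(tiny, training=False))
```

Restricting the flip to training images keeps evaluation deterministic, which is consistent with the "training images only" wording.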

[1] [1]

Ceramics international47(6), 7946–7956 (2021)

Ahmmad, S.K., Jabeen, N., Ahmed, S.T.U., Ahmed, S.A., Rahman, S.: Artificial intelligence density model for oxide glasses. Ceramics international47(6), 7946–7956 (2021)

2021

[2] [2]

Molecules30(8), 1745 (2025)

Belciu, M.I., Velea, A.: Ensemble machine learning for the prediction and understanding of the refractive index in chalcogenide glasses. Molecules30(8), 1745 (2025)

2025

[3] [3]

Tunnelling and underground space technology124, 104448 (2022)

Bo, Y ., Liu, Q., Huang, X., Pan, Y .: Real-time hard-rock tunnel prediction model for rock mass classification using catboost integrated with sequential model-based optimization. Tunnelling and underground space technology124, 104448 (2022)

2022

[4] [4]

Journal of the european ceramic society26(3), 311–316 (2006)

Bondioli, F., Manfredini, T., Romagnoli, M.: Color matching algorithms in ceramic tile production. Journal of the european ceramic society26(3), 311–316 (2006)

2006

[5] [5]

Journal of the European Ceramic Society30(12), 2451–2455 (2010)

Castela, A., Fonseca, A., Mantas, P.: Development of coloured glazes for tile applications using taguchi’s method. Journal of the European Ceramic Society30(12), 2451–2455 (2010)

2010

[6] [6]

Computer Science Review59, 100845 (2026)

Chakraborty, S., Björk, J., Dahlqvist, M., Rosen, J., Heintz, F.: A survey of ai-supported materials informatics. Computer Science Review59, 100845 (2026)

2026

[7] [7]

IEEE Computer Graphics and Applications40(5), 100–107 (2020) 10 GlazyBench

Chen, S.S.C., Cui, H., Tan, P., Sun, X., Ji, Y ., Duh, H.: Cantonese porcelain image generation using user-guided generative adversarial networks. IEEE Computer Graphics and Applications40(5), 100–107 (2020) 10 GlazyBench

2020

[8] [8]

In: International conference on learning representations (2021)

Ding, X., Wang, Y ., Xu, Z., Welch, W.J., Wang, Z.J.: Ccgan: Continuous conditional generative adversarial networks for image generation. In: International conference on learning representations (2021)

2021

[9] [9]

Progress in Geophysics40(1), 230–242 (2025)

FENG, H., ZHANG, G., CAO, J., REN, H., WAN, W., LIU, D.: Application of woa optimized lightgbm in lithology identification of igneous logging. Progress in Geophysics40(1), 230–242 (2025)

2025

[10] [10]

Journal of the European Ceramic Society43(14), 6581–6589 (2023)

Feng, L., Wang, F., Luo, H., Zhu, J., Wang, M., Yang, C., Sun, J., Wang, T.: Phase-separated tenmoku “blue” glaze: Microstructure and coloring mechanism. Journal of the European Ceramic Society43(14), 6581–6589 (2023)

2023

[11] [11]

Journal of Computational Methods in Sciences and Engineering p

Fu, Z.: Digital color enhancement in ceramic imagery using graph-guided residual learning and adaptive scattering models. Journal of Computational Methods in Sciences and Engineering p. 14727978251391297 (2025)

2025

[12] [12]

Communications Materials3(1), 59 (2022)

Fujinuma, N., DeCost, B., Hattrick-Simpers, J., Lofland, S.E.: Why big data and compute are not necessarily the path to big materials science. Communications Materials3(1), 59 (2022)

2022

[13] [13]

In: 2010 3rd International conference on computer science and information technology

Gao, W., Zhang, X., Yang, L., Liu, H.: An improved sobel edge detection. In: 2010 3rd International conference on computer science and information technology. vol. 5, pp. 67–71. IEEE (2010)

2010

[14] Glazy Contributors: Glazy. https://glazy.org/ (2026), accessed: 2026-02-01

[15] Imer, C., Günay, E., Öveçoğlu, M.: Effects of firing temperatures and compositions on the formation of nano particles in lustre layers on a lead-alkali glaze. Ceramics International 42(15), 17222–17228 (2016)

[16] Jin, Q., Luo, X., Shi, Y., Kita, K.: Image generation method based on improved condition gan. In: 2019 6th international conference on systems and informatics (ICSAI). pp. 1290–1294. IEEE (2019)

[17] Li, Y., Fu, R., Meng, X., Jin, W., Shao, F.: A sar-to-optical image translation method based on conditional generation adversarial network (cgan). IEEE Access 8, 60338–60343 (2020)

[18] Liu, H., Fu, Z., Yang, K., Xu, X., Bauchy, M.: Machine learning for glass science and engineering: A review. Journal of Non-Crystalline Solids 557, 119419 (2021)

[19] Mao, L.x., He, F., Li, L., Xu, W., Wang, Y., Liu, Q.f.: A quantitative study of phase assemblage in cement-fly ash-slag ternary systems using machine learning-assisted bse-eds image analysis. Construction and Building Materials 498, 143712 (2025)

[20] Mues, M., Kraemer, D., Styn, D.M.E.: Using machine learning classifiers together with discrimination diagrams for validation of rock classification labels. Applied Computing and Geosciences p. 100288 (2025)

[21] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on pattern analysis and machine intelligence 24(7), 971–987 (2002)

[22] Riedel, H., Mokdad, S., Schulz, I., Kocer, C., Rosendahl, P.L., Schneider, J., Kraus, M.A., Drass, M.: Automated quality control of vacuum insulated glazing by convolutional neural network image classification. Automation in Construction 135, 104144 (2022)

[23] Romagnoli, M., Bondioli, F., Barattini, M., et al.: Neural network approach for color matching of ceramic glazes. In: International Congress of Ceramic Materiali. vol. 1, pp. xx–xx. ECERS (2008)

[24] Rother, C., Kolmogorov, V., Blake, A.: "grabcut" interactive foreground extraction using iterated graph cuts. ACM transactions on graphics (TOG) 23(3), 309–314 (2004)

[25] Ruiyi, H., Zhuwen, W., Wenhua, W., Fanghui, X., Xinghua, Q., Yitong, C.: Lithology identification of igneous rocks based on xgboost and conventional logging curves, a case study of the eastern depression of liaohe basin. Journal of Applied Geophysics 195, 104480 (2021)

[26] Rumble Jr, J.R.: Accessing materials data: challenges and directions in the digital era. Integrating Materials and Manufacturing Innovation 6(2), 172–186 (2017)

[27] Santos, T., Hennetier, L., Costa, V.A., Costa, L.C.: Temperature assessment through decal color in microwave-fired porcelain. Journal of Manufacturing and Materials Processing 9(7), 213 (2025)

[28] Schabbach, L., Bondioli, F., Fredel, M.: Colouring of opaque ceramic glaze with zircon pigments: Formulation with simplified kubelka–munk model. Journal of the European Ceramic Society 31(5), 659–664 (2011)

[29] Schabbach, L., Bondioli, F., Fredel, M.: Color prediction with simplified kubelka–munk model in glazes containing fe2o3–zrsio4 coral pink pigments. Dyes and pigments 99(3), 1029–1035 (2013)

[30] Trott, M., Leybourne, M., Hall, L., Layton-Matthews, D.: Random forest rock type classification with integration of geochemical and photographic data. Applied Computing and Geosciences 15, 100090 (2022)

[31] Ueki, K., Hino, H., Kuwatani, T.: Geochemical discrimination and characteristics of magmatic tectonic settings: A machine-learning-based approach. Geochemistry, Geophysics, Geosystems 19(4), 1327–1347 (2018)

[32] Vasić, M.V., Awoyera, P.O., Fadugba, O.G., Barišić, I., Grubeša, I.N.: Advanced machine learning models for the prediction of ceramic tiles’ properties during the firing stage. Scientific reports 15(1), 31397 (2025)

[33] Wang, Y., Zhang, G.: Lightweight text-to-image generation model based on contrastive language-image pre-training embeddings and conditional variational autoencoders. Electronics 14(11), 2185 (2025)

[34] Wei, J., Hao, Y., Fu, Y., Yang, L., Gan, J., Li, H.: Experimental study on glaze icing detection of 110 kv composite insulators using fiber bragg gratings. Sensors 20(7), 1834 (2020)

[35] Wu, B., Zhao, W., Ren, X., Liu, X., Li, B., Feng, S., Feng, X., Zhao, H.: Firing process and colouring mechanism of black glaze and brown glaze porcelains from the yuan and ming dynasties from the qingliang temple kiln in baofeng, henan, china. Ceramics International 47(23), 32817–32827 (2021)

[36] Xie, Y., Wang, X.: Prediction of thermal and optical properties of oxyfluoride glasses based on interpretable machine learning. Nanomaterials 15(11), 860 (2025)

[37] Yamagiwa, A., Goto, M., et al.: An analytical model using cvae-based image generation from product descriptions and image data. Industrial Engineering & Management Systems 24(4), 650–662 (2025)

[38] Yan, X., Yang, J., Sohn, K., Lee, H.: Attribute2image: Conditional image generation from visual attributes. In: European conference on computer vision. pp. 776–791. Springer (2016)

[39] Zhang, C., Neu, R.W.: Understanding the role of glaze layer with aligned images from multiple surface characterization techniques. Wear 477, 203837 (2021)

[40] Zhang, P., Xi, X., Wang, B.C.: Geochemical signatures and element interactions of volcanic-hosted agates: Insights from interpretable machine learning. Minerals 15(9), 923 (2025)

[41] Zhang, Y., Li, A., Deng, B., Hughes, K.K.: Data-driven predictive models for chemical durability of oxide glass under different chemical conditions. npj Materials Degradation 4(1), 14 (2020)

[42] Zhao, L., Zhang, Y.: Revealing the individual effects of firing temperature and chemical composition on raman parameters of celadon glaze. Ceramics 6(2), 1263–1276 (2023)

A Appendix A: Data Preprocessing Details

A.1 Color Annotation Methodology

Transparency and surface-texture labels are obtained directly from structured dropdown menus on th...

Reference model (ensemble construction). We train and compare four machine-learning models to learn the recipe-to-color mapping from the manually labeled data. The two best-performing models, Random Forest and XGBoost, are retained and combined into an ensemble for downstream color selection.

RGB-based agreement and selection. The two models independently predict an RGB color. Let the two predicted candidates be $c_1, c_2 \in \mathbb{R}^3$, and let $\bar{c}_{\text{pred}}$ denote their centroid. We compute Euclidean distances $d_k = \lVert c_k - \bar{c}_{\text{pred}} \rVert_2$, $k \in \{1, 2\}$, and select $\arg\min_k d_k$. Intuitively, this step prefers the candidate closer to the consensus of the two predictors.

Ambiguity filtering. If $|d_1 - d_2| < 10$, the two candidates are considered equally plausible and the sample is marked as ambiguous and discarded. After filtering, 12,175 training samples remain with validated color annotations.

Sanity check. All 3,097 samples previously marked as uncertain during manual curation are removed by the above filtering pipeline, supp...
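The distance-based selection and the ambiguity filter can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_candidate` and its argument names are hypothetical, and the reference centroid is passed in as an explicit argument because this excerpt does not fully specify how the candidates and the centroid are constructed. Only the Euclidean distance, the arg-min choice, and the threshold of 10 come from the text.

```python
import math
from typing import Optional, Sequence, Tuple


def select_candidate(
    c1: Sequence[float],
    c2: Sequence[float],
    c_pred: Sequence[float],
    tau: float = 10.0,
) -> Optional[Tuple[int, Sequence[float]]]:
    """Pick the RGB candidate closer to the reference point c_pred.

    Returns (index, color) for the selected candidate, or None when the
    two distances differ by less than tau (sample treated as ambiguous).
    All names here are illustrative assumptions, not the paper's API.
    """
    d1 = math.dist(c1, c_pred)  # Euclidean distance in RGB space
    d2 = math.dist(c2, c_pred)
    if abs(d1 - d2) < tau:
        return None  # ambiguous: would be discarded by the filter
    return (0, c1) if d1 < d2 else (1, c2)


# Hypothetical values: the first candidate lies much closer to the reference.
choice = select_candidate((200, 30, 30), (20, 20, 200), (190, 40, 35))
```

Returning `None` for ambiguous samples keeps the filtering decision visible to the caller, which can then drop the sample or log it for inspection.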

Chemical composition (wt.% oxides). All oxide weight percentages larger than 0.01% are listed in the format Oxide: value% (comma-separated), e.g., SiO2: 45.20%, Al2O3: 12.80%, CaO: 8.50%,

UMF formula. All UMF entries larger than 0.01 are listed as Oxide: value and prefixed by UMF Formula:

Firing parameters. If available, we include cone information (Cone: N or Cone Range: N–M) and atmosphere (Oxidation or Reduction). Otherwise, the field is set to No additional firing parameters available.

C.3 Prompt Design

For each task, we use a unified prompt template that supports both zero-shot and few-shot evaluation. The template consists of:

1. a role declaration and task instruction
2. an explicit, enumerated label set with short descriptions
3. domain rules connecting oxides/firing conditions to visual properties
4. an optional few-shot block {few_shot_examples}
5. the query sample (three input blocks as above)
6. a strict output constraint: output exactly one label from the allowed set

For zero-shot evaluation ($K = 0$), the few-shot block is omitted. For $K$-shot evaluation, the block is populated as described in Section C.4.

Task-specific instantiations. The three tasks share the same structure but differ in label sets and domain rules:

• Transparency (4 classes). Lab...

1. Collect training samples that (i) have valid labels for the target task and (ii) contain non-empty chemical composition data. Group them by class.
2. Iterate classes in insertion order and draw one example per class in sequence until $K$ examples are obtained. Classes with no remaining samples are removed from the rotation.
3. Serialize each selected example using the same three-block format as the query, followed by Answer: {label}.

This procedure encourages class coverage in-context, ensuring up to $\min(K, |C|)$ distinct classes appear in the prompt. This is particularly relevant for imbalanced tasks (e.g., surface texture, where Glossy accounts for 49% of samples). Few-shot block...
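The class-rotating selection described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' code: samples are assumed to be dictionaries with hypothetical `label` and `composition` keys, the first remaining example of each class is drawn (the text does not say whether the draw is random), and the three-block serialization is reduced to a single text field.

```python
from collections import deque
from typing import Dict, List


def select_few_shot(samples: List[dict], k: int) -> List[dict]:
    """Round-robin over classes, one example per class per pass, up to k."""
    # Step 1: keep samples with a label and non-empty composition, grouped
    # by class. A dict preserves the insertion order of the classes.
    by_class: Dict[str, deque] = {}
    for sample in samples:
        if sample.get("label") and sample.get("composition"):
            by_class.setdefault(sample["label"], deque()).append(sample)

    # Step 2: rotate through classes, dropping a class once it is exhausted.
    rotation = list(by_class)
    selected: List[dict] = []
    while len(selected) < k and rotation:
        for label in list(rotation):
            if len(selected) == k:
                break
            selected.append(by_class[label].popleft())
            if not by_class[label]:
                rotation.remove(label)
    return selected


def format_example(sample: dict) -> str:
    """Step 3 (simplified): serialized input followed by the answer line."""
    return f"{sample['composition']}\nAnswer: {sample['label']}"


# Hypothetical toy data with an imbalanced label distribution.
pool = [
    {"label": "Glossy", "composition": "SiO2: 45.20%"},
    {"label": "Glossy", "composition": "SiO2: 50.00%"},
    {"label": "Matte", "composition": "SiO2: 40.00%"},
    {"label": "Glossy", "composition": ""},  # skipped: empty composition
]
shots = select_few_shot(pool, k=3)
```

With `k=3` on this toy pool, the rotation yields Glossy, Matte, then Glossy again, since Matte is exhausted after one draw.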

1. Strip leading/trailing whitespace and quotation characters, then extract the first line.
2. Iterate through the ordered list of valid labels and return the first label whose lowercase form appears as a substring of the lowercase response line.
3. For multi-word labels (e.g., Semi-opaque, Satin-matte, Smooth Matte), we accept both hyphenated and space-separated variants.
4. Outputs that match none of the valid labels are recorded as parsing failures and excluded from metric computation.

D Appendix D: Specifications of Image-Generation Baselines

This appendix reports the technical specifications of two baseline models for the conditional glaze image generation task (Task D), including the problem formulation, mo...
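As a brief aside on the response-parsing rules that close Appendix C, the sketch below shows one way the strip, first-line, ordered substring match, and hyphen/space handling could fit together. The function names and the label list are hypothetical; treating hyphens and whitespace as interchangeable via a regular expression is an assumption about how "both variants" are accepted.

```python
import re
from typing import List, Optional

QUOTE_CHARS = "\"'`\u201c\u201d\u2018\u2019"  # straight and curly quotes


def _normalize(text: str) -> str:
    """Lowercase and collapse hyphens/whitespace to single spaces."""
    return re.sub(r"[-\s]+", " ", text.lower()).strip()


def parse_label(response: str, valid_labels: List[str]) -> Optional[str]:
    """Return the first valid label found in the first response line.

    Returns None on a parsing failure (no label matched), so the caller
    can exclude the sample from metric computation.
    """
    stripped = response.strip().strip(QUOTE_CHARS).strip()
    if not stripped:
        return None
    first_line = _normalize(stripped.splitlines()[0])
    for label in valid_labels:  # order matters for overlapping labels
        if _normalize(label) in first_line:
            return label
    return None


# Illustrative label list (not the paper's full label set). "Semi-opaque"
# is placed before "Opaque" because the latter is a substring of the former.
labels = ["Semi-opaque", "Opaque", "Smooth Matte"]
parsed = parse_label('  "Semi opaque"\nThe glaze contains tin oxide.', labels)
```

Because matching is by substring in list order, overlapping labels need the longer one listed first; otherwise a response of "Semi-opaque" would be read as "Opaque".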

1. Resize to 128×128 using Lanczos resampling.
2. Normalize pixel values to $[-1, 1]$ via $(x/255 - 0.5)/0.5$.
3. Apply random horizontal flipping (probability 0.5) to training images only.
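The normalization and augmentation steps above can be sketched with NumPy. This is a minimal illustration, not the authors' pipeline: it assumes the image has already been resized to 128×128 with Lanczos resampling by an imaging library (that step is not shown), and that images are stored as height × width × channel `uint8` arrays. The function name `preprocess` and its arguments are hypothetical.

```python
import numpy as np


def preprocess(image_u8: np.ndarray, train: bool, rng: np.random.Generator) -> np.ndarray:
    """Map uint8 pixels to [-1, 1]; randomly flip training images.

    image_u8: array of shape (H, W, C), assumed already resized to 128x128.
    """
    x = image_u8.astype(np.float32)
    x = (x / 255.0 - 0.5) / 0.5  # 0 -> -1.0, 255 -> 1.0
    if train and rng.random() < 0.5:
        x = x[:, ::-1, :]  # horizontal flip: reverse the width axis
    return np.ascontiguousarray(x)


# Toy image: left half white, right half black.
img = np.zeros((128, 128, 3), dtype=np.uint8)
img[:, :64] = 255
eval_out = preprocess(img, train=False, rng=np.random.default_rng(0))
```

Passing the random generator explicitly keeps the augmentation reproducible, and the `train` flag ensures validation and test images are never flipped.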