Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case
Pith reviewed 2026-05-24 13:03 UTC · model grok-4.3
The pith
Chinese Youth Subculture colors carry distinct aesthetic and semantic traits that support a multi-modal system for generating matching palettes and colorizing images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct a CYS color dataset that reveals special aesthetic and semantic characteristics different from generic color theory, then develop an interactive multi-modal generative framework to create CYS-styled color palettes that an automatic colorization model applies to images, all demonstrated through a human-in-the-loop demo system and evaluated via user studies.
What carries the argument
The interactive multi-modal generative framework that learns CYS color distributions to produce palettes and the paired automatic colorization model that transfers those palettes onto input images.
Load-bearing premise
The collected CYS color dataset accurately represents the unique aesthetic and semantic characteristics of the subculture and the model can learn to generate culturally appropriate outputs from it.
What would settle it
Blind preference tests in which CYS community members rate the system's palettes and colorized images no higher than those produced by generic color-theory baselines or random sampling.
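One way to read such a blind preference test quantitatively is as a one-sided sign test. The sketch below assumes each rater's judgment is reduced to a paired preference (system output versus a baseline output), that ties are dropped, and that the counts shown are purely hypothetical. None of this comes from the paper. A large p-value would be consistent with the "no higher than baseline" outcome described above.

```python
from math import comb


def sign_test_p_value(wins: int, trials: int) -> float:
    """One-sided exact binomial p-value.

    Probability of seeing at least `wins` preferences for the system
    out of `trials` paired comparisons if raters were indifferent (p = 0.5).
    """
    if trials <= 0 or not 0 <= wins <= trials:
        raise ValueError("need 0 <= wins <= trials and trials > 0")
    # Count the outcomes in the upper tail, then divide by all 2**trials outcomes.
    upper_tail = sum(comb(trials, k) for k in range(wins, trials + 1))
    return upper_tail / 2 ** trials


# Hypothetical counts for illustration only: 14 of 20 raters prefer the system.
p_value = sign_test_p_value(wins=14, trials=20)
print(f"one-sided p-value: {p_value:.4f}")
```

A sign test is only one option. It discards rating magnitudes, so a study that collects Likert-style scores might prefer a paired rank-based test instead.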
Original abstract
Color is an essential component of graphic design, acting not only as a visual factor but also carrying cultural implications. However, existing research on algorithmic color palette generation and colorization largely ignores the cultural aspect. In this paper, we contribute to this line of research by first constructing a unique color dataset inspired by a specific culture, i.e., Chinese Youth Subculture (CYS), which is a vibrant and trending cultural group, especially among the Gen Z population. We show that the colors used in CYS have special aesthetic and semantic characteristics that are different from generic color theory. We then develop an interactive multi-modal generative framework to create CYS-styled color palettes, which can be used to put a CYS twist on images using our automatic colorization model. Our framework is illustrated via a demo system designed with the human-in-the-loop principle that constantly provides feedback to our algorithms. User studies are also conducted to evaluate our generation results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs a color dataset from Chinese Youth Subculture (CYS) sources, claims these colors exhibit unique aesthetic and semantic properties distinct from generic color theory, develops an interactive multi-modal generative framework for CYS-styled palettes and an automatic colorization model, presents a human-in-the-loop demo system, and evaluates via user studies.
Significance. If the central claims hold, the work would contribute to culturally-aware generative models in computer vision and graphic design by addressing a gap in culture-specific color handling. The multi-modal interactive framework and human-in-the-loop design offer practical value for subculture-targeted applications, with user studies providing qualitative grounding.
major comments (2)
- [Dataset construction and analysis] The core claim that CYS colors have 'special aesthetic and semantic characteristics that are different from generic color theory' is not supported by any quantitative comparison (e.g., statistical tests on HSV distributions, color harmony metrics, or semantic association scores) against generic datasets or standard palettes. This distinction is load-bearing for the motivation and for isolating the framework's cultural appropriateness from generic palette generation.
- [Framework development and evaluation] No details are provided on data collection methodology, model architecture, training procedures, loss functions, or quantitative metrics (e.g., generation quality scores, colorization error, or ablation studies) for the multi-modal generative framework and colorization model. This prevents assessment of whether the framework effectively learns and reproduces the claimed CYS characteristics.
minor comments (1)
- The abstract and high-level description would benefit from explicit section references or a methods overview to clarify where quantitative validation (if present) appears.
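The quantitative comparison requested in the first major comment could be sketched roughly as follows. This is an illustrative assumption, not the authors' method: it bins hues with the standard-library `colorsys` module and compares two palette collections using Jensen-Shannon divergence, which is just one of several reasonable distance measures. The two color lists are hypothetical placeholders, not data from the paper.

```python
import colorsys
from math import log2


def hue_histogram(colors, bins=12):
    """Normalized hue histogram for a list of (R, G, B) tuples in 0..255.

    Note: achromatic colors (grays) report hue 0 and fall into the first bin.
    """
    if not colors:
        raise ValueError("colors must not be empty")
    counts = [0] * bins
    for r, g, b in colors:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(hue * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), ranging from 0 (identical) to 1 (disjoint)."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, mix) + kl(q, mix)) / 2


# Hypothetical placeholder palettes, for illustration only.
subculture_colors = [(255, 0, 128), (0, 255, 200), (255, 230, 0)]
generic_colors = [(200, 60, 60), (60, 60, 200), (60, 200, 60)]

score = js_divergence(hue_histogram(subculture_colors), hue_histogram(generic_colors))
print(f"hue-distribution divergence: {score:.3f}")
```

A divergence score on its own does not establish significance. In practice it would need a permutation test or a comparable procedure, and saturation and value would need the same treatment as hue.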
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight important areas where additional rigor can strengthen the manuscript's claims about cultural specificity and the technical reproducibility of the framework. We address each major comment below and will incorporate revisions to provide the requested quantitative support and methodological details.
Point-by-point responses
- Referee: [Dataset construction and analysis] The core claim that CYS colors have 'special aesthetic and semantic characteristics that are different from generic color theory' is not supported by any quantitative comparison (e.g., statistical tests on HSV distributions, color harmony metrics, or semantic association scores) against generic datasets or standard palettes. This distinction is load-bearing for the motivation and for isolating the framework's cultural appropriateness from generic palette generation.
  Authors: We agree that the current presentation relies primarily on qualitative examples and user-study feedback to illustrate the distinct aesthetic and semantic properties of CYS colors. While these elements support the motivation, we acknowledge the value of quantitative backing. In the revised manuscript we will add direct statistical comparisons (HSV distribution statistics, color harmony metrics, and semantic association scores) against standard generic palettes and datasets to more rigorously substantiate the claimed cultural distinction.
  Revision: yes
- Referee: [Framework development and evaluation] No details are provided on data collection methodology, model architecture, training procedures, loss functions, or quantitative metrics (e.g., generation quality scores, colorization error, or ablation studies) for the multi-modal generative framework and colorization model. This prevents assessment of whether the framework effectively learns and reproduces the claimed CYS characteristics.
  Authors: The manuscript emphasizes the overall interactive multi-modal system and human-in-the-loop demo, with user studies serving as the primary evaluation. We recognize that expanded technical specifications are necessary for reproducibility and assessment. The revision will include explicit descriptions of data collection methodology, model architectures, training procedures, loss functions, quantitative metrics (generation quality, colorization error), and ablation studies to demonstrate how the framework captures and reproduces CYS-specific characteristics.
  Revision: yes
Circularity Check
No significant circularity; derivation is self-contained via new data and standard models
The paper collects a new CYS-specific color dataset, performs analysis to identify aesthetic/semantic traits (not defined by construction from the model), and trains a multi-modal generative framework plus colorization model using standard techniques. No equations, fitted parameters renamed as predictions, self-citation load-bearing claims, or ansatz smuggling appear in the provided text. The central claim rests on empirical dataset properties and user studies rather than reducing to prior fitted values or self-referential definitions. This matches the default case of an independent, non-circular contribution.