
arxiv: 2102.05231 · v1 · submitted 2021-02-10 · 💻 cs.CV · cs.AI

Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case

Pith reviewed 2026-05-24 13:03 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords Chinese Youth Subculture · color palette generation · image colorization · multi-modal generation · cultural color · interactive framework · Gen Z aesthetics · subculture dataset

The pith

Chinese Youth Subculture colors carry distinct aesthetic and semantic traits that support a multi-modal system for generating matching palettes and colorizing images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first builds a color dataset drawn from Chinese Youth Subculture imagery and shows that the colors exhibit aesthetic and semantic patterns unlike those in standard color theory. It then presents an interactive multi-modal generative framework that produces CYS-styled palettes and feeds them into an automatic colorization model to apply the style to new images. A human-in-the-loop demo system gathers ongoing feedback while user studies assess the cultural fit of the outputs. This work matters because most algorithmic color tools treat color as culturally neutral, yet the authors argue that subcultural context can be learned and reproduced directly from data.

Core claim

The authors construct a CYS color dataset that reveals special aesthetic and semantic characteristics different from generic color theory, then develop an interactive multi-modal generative framework to create CYS-styled color palettes that an automatic colorization model applies to images, all demonstrated through a human-in-the-loop demo system and evaluated via user studies.

What carries the argument

The interactive multi-modal generative framework that learns CYS color distributions to produce palettes and the paired automatic colorization model that transfers those palettes onto input images.

Load-bearing premise

The collected CYS color dataset accurately represents the unique aesthetic and semantic characteristics of the subculture and the model can learn to generate culturally appropriate outputs from it.

What would settle it

Blind preference tests in which CYS community members rate the system's palettes and colorized images no higher than those produced by generic color-theory baselines or random sampling.
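
A blind preference test of this kind is often analysed with a sign test on paired choices. The sketch below is an assumption about how one might score such a study, not the procedure used in the paper: each rater sees a CYS-generated output next to a baseline output and picks one, ties are dropped, and the function `sign_test_p_value` (a hypothetical helper name) returns the exact two-sided binomial p-value under the null hypothesis that both are chosen equally often.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test under H0: P(prefer system) = 0.5.

    `wins` counts raters who preferred the system output, `losses` counts
    raters who preferred the baseline. Ties must be dropped beforehand.
    """
    n = wins + losses
    if n == 0:
        # No informative comparisons: nothing speaks against H0.
        return 1.0
    k = max(wins, losses)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled for a
    # two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    print(sign_test_p_value(wins=14, losses=6))
```

A large p-value would be consistent with the falsifying outcome described above, in which raters show no reliable preference for the system over the baseline.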

Figures

Figures reproduced from arXiv: 2102.05231 by Harry Jiannan Wang, Jinggang Zhuo, Ling Fan, Yufan Li.

Figure 1: Examples of ‘red with green’ designs in Chinese youth subculture.

Figure 2: CYS dataset examples.
Adjacent text from the paper: text in the image also has effect on the color, e.g., the text of the first poster is “flower blossoms on a river bank”, which implies the flower is a blossoming bright red flower on a fresh green bank. There is also additional text description of the poster that influences the CYS style design as we discuss in our dataset later. In addition, the context of the image also correlate wit…

Figure 3: Comparison between CYS dataset and PAT dataset.

Figure 4: The training structure of our framework.
Adjacent text from the paper: resulting in 2535 unique Chinese adjectives, nouns, and verbs. We also collected the category data from the web pages, which include 14 categories, such as punk, hiphop, techno, etc. For each image, we first extract 10 colors using an existing clustering-based algorithm and then let a designer select 5 colors to form the color palette according to the following rules: 1) rem…

Figure 5: The demo system.

Figure 6: Evaluation of multi-modality: (a) different text with controlled image.

Figure 7: User study examples.
Adjacent text from the paper: We design a user study to evaluate from the second perspective. We choose 20 keywords and search the Adobe Color website to get culturally neutral color palettes. Then, we generate our color palette using the keywords, 20 culturally neutral images that are from a graphic design website and category “indie” to form the color palette pairs. Then, we use the generated color palettes to co…
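
The text accompanying Figure 4 mentions extracting 10 colors per image with an existing clustering-based algorithm before a designer picks 5 of them. The paper excerpt does not name the algorithm, so the following is only a minimal sketch that assumes plain k-means over RGB pixels. The function name `extract_palette` and its parameters are hypothetical, and the designer's manual selection step is not modelled.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extract_palette(pixels, k=10, iters=20, seed=0):
    """Return up to k cluster centers from a list of (r, g, b) tuples.

    Assumption: plain k-means in RGB space. Initial centers are sampled
    from the distinct pixel colors, so fewer than k centers are returned
    when the image has fewer than k distinct colors.
    """
    rng = random.Random(seed)
    unique = sorted(set(pixels))
    centers = rng.sample(unique, min(k, len(unique)))
    for _ in range(iters):
        # Assignment step: attach every pixel to its nearest center.
        clusters = [[] for _ in centers]
        for p in pixels:
            nearest = min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members,
        # keeping the old center if a cluster ended up empty.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(ch) / len(members) for ch in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(center)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Toy "image" with two flat color regions (illustrative data only).
    toy_pixels = [(255, 0, 0)] * 5 + [(0, 255, 0)] * 5
    print(extract_palette(toy_pixels, k=10))
```

Plain k-means can settle in a poor local optimum depending on initialization, which may be one reason to keep a human selection step after the automatic extraction.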
Original abstract

Color is an essential component of graphic design, acting not only as a visual factor but also carrying cultural implications. However, existing research on algorithmic color palette generation and colorization largely ignores the cultural aspect. In this paper, we contribute to this line of research by first constructing a unique color dataset inspired by a specific culture, i.e., Chinese Youth Subculture (CYS), which is a vibrant and trending cultural group especially for the Gen Z population. We show that the colors used in CYS have special aesthetic and semantic characteristics that are different from generic color theory. We then develop an interactive multi-modal generative framework to create CYS-styled color palettes, which can be used to put a CYS twist on images using our automatic colorization model. Our framework is illustrated via a demo system designed with the human-in-the-loop principle that constantly provides feedback to our algorithms. User studies are also conducted to evaluate our generation results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper constructs a color dataset from Chinese Youth Subculture (CYS) sources, claims these colors exhibit unique aesthetic and semantic properties distinct from generic color theory, develops an interactive multi-modal generative framework for CYS-styled palettes and an automatic colorization model, presents a human-in-the-loop demo system, and evaluates via user studies.

Significance. If the central claims hold, the work would contribute to culturally aware generative models in computer vision and graphic design by addressing a gap in culture-specific color handling. The multi-modal interactive framework and human-in-the-loop design offer practical value for subculture-targeted applications, with user studies providing qualitative grounding.

major comments (2)
  1. [Dataset construction and analysis] The core claim that CYS colors have 'special aesthetic and semantic characteristics that are different from generic color theory' is not supported by any quantitative comparison (e.g., statistical tests on HSV distributions, color harmony metrics, or semantic association scores) against generic datasets or standard palettes. This distinction is load-bearing for the motivation and for isolating the framework's cultural appropriateness from generic palette generation.
  2. [Framework development and evaluation] No details are provided on data collection methodology, model architecture, training procedures, loss functions, or quantitative metrics (e.g., generation quality scores, colorization error, or ablation studies) for the multi-modal generative framework and colorization model. This prevents assessment of whether the framework effectively learns and reproduces the claimed CYS characteristics.
minor comments (1)
  1. The abstract and high-level description would benefit from explicit section references or a methods overview to clarify where quantitative validation (if present) appears.
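
The quantitative comparison requested in major comment 1 could start from something as simple as comparing hue distributions between two palette collections. The sketch below is one possible starting point under stated assumptions, not a method from the paper: `hue_histogram` and `js_divergence` are hypothetical helper names, colors are RGB triples in 0..255, near-gray colors are skipped because their hue is not meaningful, and the 0.05 thresholds are arbitrary. The Jensen-Shannon divergence is a descriptive distance, so a significance test (for example a permutation test) would still be needed on top of it.

```python
import colorsys
from math import log2


def hue_histogram(colors, bins=12, min_sat=0.05, min_val=0.05):
    """Normalized hue histogram over (r, g, b) colors with channels in 0..255."""
    counts = [0] * bins
    for r, g, b in colors:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < min_sat or v < min_val:
            # Hue is undefined or unstable for grays and near-black colors.
            continue
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no chromatic colors to build a hue histogram from")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Illustrative palettes only; these are not samples from the CYS dataset.
    palette_a = [(255, 0, 0), (0, 200, 0), (250, 20, 20)]
    palette_b = [(0, 0, 255), (30, 30, 220), (0, 200, 0)]
    print(js_divergence(hue_histogram(palette_a), hue_histogram(palette_b)))
```

The same pattern extends to saturation and value histograms, which would cover the remaining HSV channels the referee mentions.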

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important areas where additional rigor can strengthen the manuscript's claims about cultural specificity and the technical reproducibility of the framework. We address each major comment below and will incorporate revisions to provide the requested quantitative support and methodological details.

Point-by-point responses
  1. Referee: [Dataset construction and analysis] The core claim that CYS colors have 'special aesthetic and semantic characteristics that are different from generic color theory' is not supported by any quantitative comparison (e.g., statistical tests on HSV distributions, color harmony metrics, or semantic association scores) against generic datasets or standard palettes. This distinction is load-bearing for the motivation and for isolating the framework's cultural appropriateness from generic palette generation.

    Authors: We agree that the current presentation relies primarily on qualitative examples and user-study feedback to illustrate the distinct aesthetic and semantic properties of CYS colors. While these elements support the motivation, we acknowledge the value of quantitative backing. In the revised manuscript we will add direct statistical comparisons (HSV distribution statistics, color harmony metrics, and semantic association scores) against standard generic palettes and datasets to more rigorously substantiate the claimed cultural distinction. revision: yes

  2. Referee: [Framework development and evaluation] No details are provided on data collection methodology, model architecture, training procedures, loss functions, or quantitative metrics (e.g., generation quality scores, colorization error, or ablation studies) for the multi-modal generative framework and colorization model. This prevents assessment of whether the framework effectively learns and reproduces the claimed CYS characteristics.

    Authors: The manuscript emphasizes the overall interactive multi-modal system and human-in-the-loop demo, with user studies serving as the primary evaluation. We recognize that expanded technical specifications are necessary for reproducibility and assessment. The revision will include explicit descriptions of data collection methodology, model architectures, training procedures, loss functions, quantitative metrics (generation quality, colorization error), and ablation studies to demonstrate how the framework captures and reproduces CYS-specific characteristics. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via new data and standard models

Full rationale

The paper collects a new CYS-specific color dataset, performs analysis to identify aesthetic/semantic traits (not defined by construction from the model), and trains a multi-modal generative framework plus colorization model using standard techniques. No equations, fitted parameters renamed as predictions, self-citation load-bearing claims, or ansatz smuggling appear in the provided text. The central claim rests on empirical dataset properties and user studies rather than reducing to prior fitted values or self-referential definitions. This matches the default case of an independent, non-circular contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No specific free parameters, axioms, or invented entities can be identified from the abstract alone; the full manuscript would be required to audit modeling choices or data assumptions.

pith-pipeline@v0.9.0 · 5698 in / 1206 out tokens · 77213 ms · 2026-05-24T13:03:04.114904+00:00 · methodology

