
arxiv: 2602.23010 · v3 · submitted 2026-02-26 · 💻 cs.GR · cs.CV

Helmlab: A Two-Space Family of Analytical, Data-Driven Color Spaces for UI Design Systems

Pith reviewed 2026-05-15 19:21 UTC · model grok-4.3

classification 💻 cs.GR cs.CV
keywords color space, color difference, UI design, perceptual uniformity, gradient generation, palette generation, CIE XYZ transform

The pith

MetricSpace, a 72-parameter analytical color space, cuts STRESS by 23 percent versus CIEDE2000 on the 3,813-pair COMBVD set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Helmlab, a two-space family built on an 11-stage pipeline that maps CIE XYZ into perceptually ordered coordinates using learned matrices, power compression, Fourier hue correction, and Helmholtz-Kohlrausch adjustment. MetricSpace is tuned for distance prediction and records a 23 percent STRESS reduction on the 3,813-pair COMBVD set while remaining invertible to machine precision. GenSpace trades some distance fidelity for stronger performance on gradient and palette generation across sRGB, P3, and Rec.2020. The transforms keep the gray axis neutral to 1e-5 chroma and allow rigid chromatic-plane rotation without changing the metric. Production libraries already ship the spaces for immediate use in design systems.

Core claim

Helmlab supplies two purpose-built color spaces that share the same analytical forward transform from CIE XYZ: MetricSpace (72 parameters) optimized for color-difference prediction and GenSpace (44 parameters) optimized for generation tasks. On the COMBVD benchmark MetricSpace reaches STRESS 22.48 against CIEDE2000’s 29.20; averaged across three primary datasets it scores 21.75 versus the next-best baseline at 35.98. The pipeline includes per-channel compression, Fourier hue correction, embedded lightness adjustment, neutral-axis correction, and an isometry that preserves distances while aligning hue angles. Both spaces are exactly invertible with round-trip error below 1e-13.
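
The scores quoted above are STRESS values. As a minimal sketch, assuming the standard STRESS definition used in the color-difference literature (scale-fitted residual between predicted and observed differences, expressed on a 0 to 100 scale), the index can be computed as follows. The toy arrays are hypothetical and are not taken from COMBVD.

```python
import numpy as np

def stress(predicted, observed):
    """STRESS index: 0 means perfect agreement up to a scale factor, 100 is the worst case."""
    de = np.asarray(predicted, dtype=float)   # model distances (delta E)
    dv = np.asarray(observed, dtype=float)    # visual differences (delta V)
    # Scale factor that best aligns the observed differences with the predictions.
    f = np.sum(de * de) / np.sum(de * dv)
    return 100.0 * np.sqrt(np.sum((de - f * dv) ** 2) / np.sum((f * dv) ** 2))

# Hypothetical toy data for illustration only.
observed = np.array([1.0, 2.0, 3.0, 4.0])
perfect = 2.5 * observed                    # same ratios, different overall scale
noisy = np.array([1.2, 1.7, 3.4, 3.9])      # roughly right, with scatter

stress_perfect = stress(perfect, observed)
stress_noisy = stress(noisy, observed)
stress_noisy_rescaled = stress(3.0 * noisy, observed)
```

Because the scale factor is fitted inside the metric, multiplying every predicted distance by a constant leaves STRESS unchanged, which is why spaces with different numeric ranges can be compared on the same footing.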

What carries the argument

An 11-stage analytical transform from CIE XYZ that chains learned linear matrices, per-channel power compression, Fourier-series hue correction, Helmholtz-Kohlrausch lightness adjustment, neutral-axis correction, and a rigid chromatic-plane rotation.
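
To make the invertibility argument concrete, here is a toy sketch of a few of the named stage types (linear matrix, signed per-channel power compression, a second matrix, and a hue-dependent chroma power with a 2-harmonic Fourier exponent). Every matrix, exponent, and coefficient below is a hypothetical placeholder, not a published Helmlab parameter, and the real pipeline has more stages. The point is only that each stage has a closed-form inverse, so the chain can be undone step by step.

```python
import numpy as np

# Hypothetical parameters for illustration only; NOT the published Helmlab values.
M1 = np.array([[0.9, 0.3, -0.1],
               [0.2, 0.8, 0.1],
               [0.0, 0.1, 0.9]])
M2 = np.array([[0.2, 0.8, 0.0],
               [2.0, -2.4, 0.4],
               [0.0, 1.8, -1.8]])
GAMMA = np.array([0.42, 0.40, 0.45])          # per-channel compression exponents
A1, B1, A2, B2 = 0.03, -0.02, 0.01, 0.015     # 2-harmonic Fourier coefficients

def eps(h):
    """Small hue-dependent exponent offset; |eps| stays well below 1."""
    return A1 * np.cos(h) + B1 * np.sin(h) + A2 * np.cos(2 * h) + B2 * np.sin(2 * h)

def forward(xyz):
    lms = M1 @ np.asarray(xyz, dtype=float)
    lms = np.sign(lms) * np.abs(lms) ** GAMMA          # signed power compression
    L, a, b = M2 @ lms
    c, h = np.hypot(a, b), np.arctan2(b, a)
    c = c ** (1.0 + eps(h))                            # chroma power; hue is unchanged
    return np.array([L, c * np.cos(h), c * np.sin(h)])

def inverse(lab):
    L, a, b = lab
    c, h = np.hypot(a, b), np.arctan2(b, a)
    c = c ** (1.0 / (1.0 + eps(h)))                    # same h, so eps(h) is recoverable
    lms = np.linalg.solve(M2, np.array([L, c * np.cos(h), c * np.sin(h)]))
    lms = np.sign(lms) * np.abs(lms) ** (1.0 / GAMMA)
    return np.linalg.solve(M1, lms)

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(1000, 3))
max_err = max(np.max(np.abs(inverse(forward(x)) - x)) for x in samples)
```

The chroma-power stage is invertible because it changes only the radius in the chromatic plane: the hue angle survives the forward pass, so the exponent can be recomputed exactly on the way back.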

If this is right

  • Design tools can replace CIEDE2000 with MetricSpace for more accurate contrast and harmony checks without changing existing code paths.
  • Palette and gradient generators can switch to GenSpace to improve smoothness and uniformity across wide-gamut spaces.
  • The shared invertible pipeline allows round-tripping between CIE XYZ, MetricSpace, and GenSpace with negligible error.
  • A single rotation of the chromatic plane can be applied to align hue angles for specific brand palettes while leaving the distance metric unchanged.
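
The last point can be checked numerically. The sketch below assumes the distance is plain Euclidean distance on (L, a, b) coordinates, which is an assumption here rather than a statement of the paper's exact formula; the sample points and the rotation angle are arbitrary. Rotating the a-b plane shifts every hue angle by the same amount while leaving all pairwise distances unchanged.

```python
import numpy as np

def rotate_chroma(lab, theta):
    """Rotate the (a, b) plane by theta radians; L is left untouched."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, c, -s],
                    [0.0, s, c]])
    return np.asarray(lab, dtype=float) @ rot.T

rng = np.random.default_rng(0)
p = rng.normal(size=(100, 3))   # arbitrary Lab-like points
q = rng.normal(size=(100, 3))
theta = 0.7                     # arbitrary rotation angle in radians

d_before = np.linalg.norm(p - q, axis=1)
d_after = np.linalg.norm(rotate_chroma(p, theta) - rotate_chroma(q, theta), axis=1)
max_change = float(np.max(np.abs(d_before - d_after)))

# Hue angles move by exactly theta (difference wrapped into (-pi, pi]).
rp = rotate_chroma(p, theta)
hue_before = np.arctan2(p[:, 2], p[:, 1])
hue_after = np.arctan2(rp[:, 2], rp[:, 1])
hue_shift = np.angle(np.exp(1j * (hue_after - hue_before)))
```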

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the pipeline is fully analytical, new parameter sets could be derived for other domains such as medical imaging or material appearance without retraining the entire structure.
  • The neutral-axis correction and Fourier hue term may generalize to other perceptually motivated transforms that currently rely on ad-hoc fixes.
  • Exact invertibility opens the possibility of using these spaces as intermediate representations inside color-managed rendering pipelines.

Load-bearing premise

The optimized parameters will continue to perform well on new UI color pairs and screen conditions that were not seen during tuning.

What would settle it

Collect a fresh set of several thousand color-pair judgments under typical UI viewing conditions and measure whether MetricSpace STRESS remains below 24 while CIEDE2000 stays above 29.

Figures

Figures reproduced from arXiv: 2602.23010 by Gorkem Yildiz.

Figure 1
Figure 1: The HELMLAB MetricSpace forward transform pipeline. Blue: linear operations; yellow: nonlinear corrections; green: structural guarantees (NC, rotation). The 72 jointly-optimized parameters are distributed across the eleven stages as shown. (b) nonlinear chroma power (4 params): C → C^(1+ε(h)), where ε(h) is a 2-harmonic Fourier series, separating high-chroma from low-chroma discrimination; (c) L-dependent sca…
Figure 3
Figure 3: Predicted vs observed color differences on …
Figure 4
Figure 4: Cross-dataset validation on held-out data.
Figure 5
Figure 5: Left: neutral ramp step uniformity. Right: …
Figure 6
Figure 6: GenSpace v0.11.1 vs OKLab: per-category …
Figure 7
Figure 7: sRGB gamut in the HELMLAB a–b plane at three L levels. Points are colored by their sRGB values.
… of ∼1×10^−14 across 1,000 random sRGB samples. Integration into Color.js: a pull request adding all four HELMLAB spaces (helmlab-metric, helmgen, helmgenlch as the cylindrical GenSpace, and deltaEHelmlab as a registered distance method) was merged into Color.js (PR #722, May 2026). Color.js is the reference JS color library…
Original abstract

We present Helmlab, a family of two purpose-built color spaces for UI design systems sharing a common 11-stage analytical structure: MetricSpace, a 72-parameter space optimized for color-difference prediction, and GenSpace, a 44-parameter space optimized for gradient and palette generation. The forward transform maps CIE XYZ to a perceptually-organized Lab representation through learned matrices, per-channel power compression, Fourier hue correction, and embedded Helmholtz-Kohlrausch lightness adjustment. A post-pipeline neutral correction holds gray-axis chroma below 1e-5 on a 21-step ramp, and a rigid rotation of the chromatic plane improves hue-angle alignment without affecting the distance metric (which is invariant under isometries). On COMBVD (3,813 color pairs), MetricSpace v21 achieves STRESS 22.48, a 23 percent reduction from CIEDE2000 (29.20). On the held-out MacAdam 1974 dataset it scores 19.51 (CIEDE2000: 22.13; CAM16-UCS leads at 18.71). On a self-collected 3,552-judgement screen-condition set it scores 23.26 vs 62.54 for CIEDE2000. On academic He et al. 2022 (82 3D-printed pairs) MetricSpace scores 35.9 vs CIEDE2000 32.6, a regression we own. Averaging the three primary datasets, MetricSpace scores 21.75 vs the next-best baseline CIECAM02-UCS at 35.98. GenSpace v0.11.1 trades distance accuracy for generation quality: on a 90-metric, 3,038-pair gradient/palette benchmark across sRGB, P3, and Rec.2020, it wins 65 of 90 vs OKLab. The transform is invertible with round-trip errors below 1e-13. Production implementations ship on PyPI, npm, Color.js (PR 722, merged), and as a PostCSS plugin.
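
The headline percentages and the three-dataset average can be reproduced from the scores quoted in the abstract. The short check below uses only those quoted numbers. Which datasets count as the "three primary" ones is inferred here: the arithmetic matches when they are taken to be COMBVD, MacAdam 1974, and the self-collected screen set.

```python
# (MetricSpace STRESS, CIEDE2000 STRESS) as quoted in the abstract.
scores = {
    "COMBVD": (22.48, 29.20),
    "MacAdam 1974": (19.51, 22.13),
    "screen set": (23.26, 62.54),
    "He et al. 2022": (35.9, 32.6),
}

def relative_reduction(metric, baseline):
    """Percent STRESS reduction relative to the baseline (negative means a regression)."""
    return 100.0 * (baseline - metric) / baseline

reductions = {name: round(relative_reduction(m, b), 1) for name, (m, b) in scores.items()}

# Assumed membership of the "three primary datasets" (an inference, see above).
primary = ["COMBVD", "MacAdam 1974", "screen set"]
mean_primary = round(sum(scores[name][0] for name in primary) / len(primary), 2)

for name, pct in reductions.items():
    print(f"{name}: {pct:+.1f}% vs CIEDE2000")
print("mean over assumed primary sets:", mean_primary)
```

The gain ranges from about 12 percent on MacAdam 1974 to about 63 percent on the screen set, and turns into a regression of about 10 percent on He et al. 2022, so the 23 percent figure describes COMBVD only.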

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Helmlab, a two-space family of analytical color spaces for UI design: MetricSpace (72 parameters) optimized for color-difference prediction via an 11-stage forward transform from CIE XYZ using learned matrices, per-channel powers, Fourier hue terms, and embedded Helmholtz-Kohlrausch adjustment; and GenSpace (44 parameters) optimized for gradient/palette generation. It reports STRESS of 22.48 on COMBVD (23% below CIEDE2000), 19.51 on held-out MacAdam 1974, 23.26 on a self-collected screen set, but 35.9 (worse than CIEDE2000's 32.6) on He et al. 2022; GenSpace wins 65/90 metrics on a 3,038-pair generation benchmark. The transforms are invertible to 1e-13 error with production implementations provided.

Significance. If the gains hold out-of-sample, the work offers practical, invertible color spaces that combine analytical structure with data-driven tuning for UI workflows, supported by shipped code in PyPI, npm, Color.js, and PostCSS. The explicit handling of neutral-axis correction and isometry-invariant distances is a constructive contribution to perceptually organized spaces.

major comments (3)
  1. [Abstract, optimization section] Abstract and optimization section: MetricSpace's 72 parameters (learned matrices, per-channel powers, Fourier hue terms, HK adjustment) are optimized directly on COMBVD (3,813 pairs), the same dataset used to claim the primary 23% STRESS reduction (22.48 vs CIEDE2000 29.20); this in-sample fitting makes the reported superiority dependent on the training distribution rather than demonstrated generalization.
  2. [Results section] Results on held-out and secondary sets: On MacAdam 1974 the gain shrinks to ~12% (19.51 vs 22.13) with CAM16-UCS better at 18.71; on He et al. 2022 MetricSpace regresses to 35.9 vs CIEDE2000 32.6. These outcomes indicate the fitting does not reliably outperform established models on unseen data.
  3. [Methods / optimization section] Parameter fitting protocol: No cross-validation, regularization, or sensitivity analysis is described for the 72/44 parameters; without these, the claim that the analytical structure plus fitting yields robust UI-specific spaces cannot be evaluated.
minor comments (2)
  1. [Transform description] The 11-stage pipeline description would benefit from an explicit equation or diagram numbering each stage (matrix, power, Fourier, HK, neutral correction, rotation) to clarify data flow.
  2. [Generation results] Table or figure presenting the 90-metric generation benchmark should list the exact metrics and per-dataset breakdowns rather than aggregate win count.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, acknowledging the in-sample nature of the primary optimization while pointing to the held-out evaluations already present in the manuscript. We commit to revisions that clarify these points and strengthen the methods description.

Point-by-point responses
  1. Referee: [Abstract, optimization section] Abstract and optimization section: MetricSpace's 72 parameters (learned matrices, per-channel powers, Fourier hue terms, HK adjustment) are optimized directly on COMBVD (3,813 pairs), the same dataset used to claim the primary 23% STRESS reduction (22.48 vs CIEDE2000 29.20); this in-sample fitting makes the reported superiority dependent on the training distribution rather than demonstrated generalization.

    Authors: We agree that the primary STRESS reduction of 23% is reported on the COMBVD dataset used for optimization. The manuscript already reports performance on the held-out MacAdam 1974 set (19.51 vs CIEDE2000 22.13) and the self-collected screen set (23.26 vs 62.54), which provide evidence of generalization to other conditions. The regression on He et al. 2022 is explicitly noted in the paper. In revision we will add explicit language in the abstract and optimization section distinguishing the training set from the held-out evaluations and discuss the implications of in-sample optimization for UI-tuned spaces. revision: partial

  2. Referee: [Results section] Results on held-out and secondary sets: On MacAdam 1974 the gain shrinks to ~12% (19.51 vs 22.13) with CAM16-UCS better at 18.71; on He et al. 2022 MetricSpace regresses to 35.9 vs CIEDE2000 32.6. These outcomes indicate the fitting does not reliably outperform established models on unseen data.

    Authors: The manuscript already presents these exact results transparently, including the regression on He et al. 2022 which we own. On MacAdam 1974 we still improve over CIEDE2000, and the large gain on the screen-condition set supports the UI-specific tuning. We will expand the results discussion to explain dataset differences (e.g., 3D-printed vs. screen stimuli and viewing conditions) and why MetricSpace prioritizes screen/UI performance over uniform outperformance on all academic sets. revision: partial

  3. Referee: [Methods / optimization section] Parameter fitting protocol: No cross-validation, regularization, or sensitivity analysis is described for the 72/44 parameters; without these, the claim that the analytical structure plus fitting yields robust UI-specific spaces cannot be evaluated.

    Authors: We accept this criticism. The revised manuscript will include a dedicated subsection on the fitting protocol, detailing the optimization procedure, any regularization applied, and a sensitivity analysis on key parameters (e.g., matrix entries and power exponents). We will also report cross-validation results on COMBVD splits to quantify robustness. revision: yes
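
The cross-validation promised in the last response could take roughly the following shape. This is a sketch under stated assumptions, not the authors' protocol: the data are synthetic, the model is a hypothetical 3-parameter weighted Euclidean distance fitted by linear least squares (the paper's 72-parameter model and its optimizer are not reproduced), and the fold count is arbitrary. What it illustrates is the separation of fitting folds from the fold on which STRESS is reported.

```python
import numpy as np

def stress(de, dv):
    """STRESS between predicted (de) and observed (dv) differences."""
    f = np.sum(de * de) / np.sum(de * dv)
    return 100.0 * np.sqrt(np.sum((de - f * dv) ** 2) / np.sum((f * dv) ** 2))

def fit_weights(delta_sq, dv):
    """Least-squares fit of dv^2 ~ delta_sq @ w (toy 3-parameter model)."""
    w, *_ = np.linalg.lstsq(delta_sq, dv ** 2, rcond=None)
    return w

def predict(delta_sq, w):
    return np.sqrt(np.clip(delta_sq @ w, 0.0, None))

def kfold_stress(delta_sq, dv, k=5, seed=0):
    """Fit on k-1 folds, report STRESS on the held-out fold, for each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(dv)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit_weights(delta_sq[train], dv[train])
        scores.append(stress(predict(delta_sq[test], w), dv[test]))
    return np.array(scores)

# Synthetic stand-in for a color-pair dataset (hypothetical, not COMBVD).
rng = np.random.default_rng(1)
delta_sq = rng.uniform(0.0, 1.0, size=(500, 3)) ** 2     # squared per-axis differences
true_w = np.array([1.0, 0.5, 2.0])
dv = np.sqrt(delta_sq @ true_w) * rng.lognormal(0.0, 0.1, size=500)

fold_scores = kfold_stress(delta_sq, dv)
print("held-out STRESS per fold:", np.round(fold_scores, 2))
```

Reporting the spread of the per-fold scores next to the full-data score is one simple way to show how much of a headline number depends on in-sample fitting.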

Circularity Check

2 steps flagged

72-parameter optimization on COMBVD and related benchmarks ties primary STRESS claims to in-sample fitting

specific steps
  1. fitted input called prediction [Abstract, paragraph 2]
    "a 72-parameter space optimized for color-difference prediction... On COMBVD (3,813 color pairs), MetricSpace v21 achieves STRESS 22.48, a 23 percent reduction from CIEDE2000 (29.20)."

    The optimization target is color-difference prediction; the primary reported metric is STRESS on COMBVD, which participates in that optimization. The reduction is therefore a measure of how well the fitted parameters reproduce the fitting data rather than an independent prediction on unseen conditions.

  2. fitted input called prediction [Abstract, paragraph 3]
    "GenSpace v0.11.1 trades distance accuracy for generation quality: on a 90-metric, 3,038-pair gradient/palette benchmark across sRGB, P3, and Rec.2020, it wins 65 of 90 vs OKLab."

    The 44 parameters are optimized for gradient and palette generation; the win count is reported on the identical benchmark class, rendering the superiority a direct consequence of the fitting objective.

full rationale

The paper explicitly optimizes MetricSpace's 72 parameters (learned matrices, per-channel powers, Fourier terms, HK adjustment) for color-difference prediction and reports the headline 23% STRESS reduction on COMBVD, the same class of data used in fitting. While MacAdam 1974 is labeled held-out and shows smaller gains, the central result and three-dataset average remain dependent on the fitting target without referenced cross-validation or regularization. GenSpace's 44 parameters follow the same pattern on its gradient/palette benchmark. This matches the fitted_input_called_prediction pattern but does not collapse the entire derivation to a tautology, as the analytical structure (11-stage transform, invertibility) retains independent content.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Central claims depend on extensive data-driven fitting of dozens of parameters plus standard color science transforms; no new physical entities are postulated.

free parameters (2)
  • 72 parameters in MetricSpace
    Learned matrices, per-channel exponents, Fourier hue coefficients, and Helmholtz-Kohlrausch factors fitted to color-difference data.
  • 44 parameters in GenSpace
    Parameters tuned specifically for gradient and palette quality metrics on the 90-metric benchmark.
axioms (2)
  • [standard math] CIE XYZ tristimulus values serve as the input representation
    Standard starting point from prior colorimetry literature.
  • [domain assumption] The sequence of matrix, power, Fourier, and rotation steps produces a perceptually organized space
    Assumed to hold after optimization; no independent derivation is provided in the abstract.

pith-pipeline@v0.9.0 · 5692 in / 1390 out tokens · 26350 ms · 2026-05-15T19:21:56.226122+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1] CIE, “Colorimetry,” CIE Publication 15, Vienna, 1976. Latest revision: CIE 015:2018, ISBN 978-3-902842-13-8. https://cie.co.at/publications/colorimetry-4th-edition

  2. [2] M. R. Luo, G. Cui, and B. Rigg, “The development of the CIE 2000 colour-difference formula: CIEDE2000,” Color Res. Appl., vol. 26, no. 5, pp. 340–350, 2001. doi:10.1002/col.1049

  3. [3] B. Ottosson, “A perceptual color space for image processing,” blog post, December 2020. https://bottosson.github.io/posts/oklab/. Adopted in CSS Color Module 4 (W3C Candidate Recommendation Draft, 2026).

  4. [4] C. Li, Z. Li, Z. Wang, et al., “Comprehensive color solutions: CAM16, CAT16, and CAM16-UCS,” Color Res. Appl., vol. 42, no. 6, pp. 703–718, 2017. doi:10.1002/col.22131

  5. [5] F. Ebner and M. D. Fairchild, “Development and testing of a color space (IPT) with improved hue uniformity,” in Proc. IS&T 6th Color Imaging Conference (CIC), Scottsdale, AZ, 1998, pp. 8–13.

  6. [6] M. Safdar, G. Cui, Y. J. Kim, and M. R. Luo, “Perceptually uniform color space for image signals including high dynamic range and wide gamut,” Opt. Express, vol. 25, no. 13, pp. 15131–15151, 2017. doi:10.1364/OE.25.015131

  7. [7] M. R. Luo, G. Cui, and C. Li, “Uniform colour spaces based on CIECAM02 colour appearance model,” Color Res. Appl., vol. 31, no. 4, pp. 320–330, 2006. doi:10.1002/col.20227

  8. [8] R. He, G. Cui, T. Zhu, and M. R. Luo, “Colour difference evaluation using large colour differences,” in Proc. 30th CIE Session, Ljubljana, 2022.

  9. [9] D. L. MacAdam, “Uniform color scales,” J. Opt. Soc. Am., vol. 64, no. 12, pp. 1691–1702, 1974. doi:10.1364/JOSA.64.001691

  10. [10] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, “A limited memory algorithm for bound constrained optimization,” SIAM J. Sci. Comput., vol. 16, no. 5, pp. 1190–1208, 1995. doi:10.1137/0916069

  11. [11] Y. Gao, M. R. Luo, M. R. Pointer, and C. Li, “Evaluation of color difference prediction with CIECAM16 using CIE 2- and 10-degree observers,” J. Imaging Sci. Technol., vol. 67, no. 2, art. no. 020401, 2023.

  12. [12] L. Verou and C. Lilley, Color.js: A library for color conversions, manipulation, and difference computation, version 0.5.x, 2026. Project page: https://colorjs.io/; source repository: https://github.com/color-js/color.js; HELMLAB integration merged in PR #722, https://github.com/color-js/color.js/pull/722, 2026-05-04.

  13. [13] G. Yıldız, helmlab: Data-driven analytical color space for perceptual color difference, software package version 0.12.2, 2026. PyPI: https://pypi.org/project/helmlab/; npm: https://www.npmjs.com/package/helmlab; source: https://github.com/Grkmyldz148/helmlab

  14. [14] G. Yıldız, postcss-helmlab: PostCSS plugin for HELMLAB CSS color functions, software package version 0.1.x, 2026. https://www.npmjs.com/package/postcss-helmlab; source: https://github.com/Grkmyldz148/helmlab/tree/main/packages/postcss-helmlab

  15. [15] T. Mansencal et al., Colour: Science software for the colour processing community, version 0.4.x, 2026. Project page: https://www.colour-science.org/; source repository: https://github.com/colour-science/colour; reference implementations of CAM16-UCS, CIECAM02-UCS, Jzazbz, DIN99 and IPT delta-E used as canonical baselines in this paper.

  16. [16] B. Ottosson, “OkLCh gamut clipping,” technical note, May 2021. https://bottosson.github.io/posts/gamutclipping/; also see https://bottosson.github.io/posts/colorpicker/ (November 2021) and the public discussion thread at https://github.com/color-js/color.js/issues/81 (2022–2024).

A Parameter Table (MetricSpace v21)
Table 6 lists all 72 trained MetricSpace ...