
arxiv: 2602.02513 · v2 · pith:NP4KZSR3new · submitted 2026-01-23 · 💻 cs.LG · cond-mat.mtrl-sci

Learning ORDER-Aware Multimodal Representations for Composite Materials Design

Pith reviewed 2026-05-21 14:32 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.mtrl-sci
keywords multimodal learning · composite materials · ordinal representations · latent space alignment · materials design · pretraining · image-tabular data

The pith

ORDER uses ordinal-aware multimodal alignment to preserve continuity and enable interpolation in composite material design spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ORDER, a multimodal pretraining framework that makes ordinality central to learning representations from images and tabular data for composite materials. Existing alignment methods work for discrete graph-structured materials like crystals but break down on the continuous, nonlinear design spaces of composites when data is scarce. By pulling materials with similar target properties close together in the latent space, ORDER maintains the smooth nature of those properties and supports interpolation between observed designs. It is tested on nanofiber-reinforced and carbon fiber T700 datasets, where it beats standard alignment and property-aware contrastive baselines on prediction, retrieval, and generation tasks. The framework also adds physics-based surrogate signals so full property labels are not required during pretraining.

Core claim

ORDER establishes ordinality as a core principle for material representations. It ensures that materials with similar target properties occupy nearby regions in the latent space, which preserves the continuous nature of composite properties and enables meaningful interpolation between sparsely observed designs.

What carries the argument

The ordinal-aware image-tabular alignment that enforces proximity in latent space according to similarity in target properties.
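
The idea of pairing cross-modal alignment with an ordinal constraint can be illustrated with a small sketch. This is not the paper's objective (its Eq. (7) is not reproduced on this page); it assumes a symmetric InfoNCE term for image-tabular alignment and swaps in a Rank-N-Contrast-style ranking term for the ordinal part. The function names, the weight `lam`, and the temperature `tau` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def cross_modal_infonce(img, tab, tau=0.1):
    """Symmetric InfoNCE: img[i] and tab[i] form the positive pair."""
    n = len(img)
    loss = 0.0
    for i in range(n):
        img_to_tab = [cosine(img[i], tab[j]) / tau for j in range(n)]
        tab_to_img = [cosine(tab[i], img[j]) / tau for j in range(n)]
        loss += log_sum_exp(img_to_tab) - img_to_tab[i]
        loss += log_sum_exp(tab_to_img) - tab_to_img[i]
    return loss / (2 * n)

def ordinal_rank_loss(z, y, tau=0.1):
    """Rank-N-Contrast-style term (an assumed stand-in for the ordinal loss).

    For anchor i and sample j, the contrast set holds every k whose property
    y[k] is at least as far from y[i] as y[j] is, so closer-in-property
    samples are pushed to be closer in the latent space.
    """
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_ij = abs(y[i] - y[j])
            contrast = [cosine(z[i], z[k]) / tau
                        for k in range(n)
                        if k != i and abs(y[i] - y[k]) >= d_ij]
            total += log_sum_exp(contrast) - cosine(z[i], z[j]) / tau
            count += 1
    return total / count

def combined_loss(img, tab, y, lam=0.5, tau=0.1):
    """Alignment term plus a weighted ordinal term; lam is a hypothetical weight."""
    align = cross_modal_infonce(img, tab, tau)
    ordinal = 0.5 * (ordinal_rank_loss(img, y, tau) + ordinal_rank_loss(tab, y, tau))
    return align + lam * ordinal

# Toy usage: embeddings placed on an arc in the same order as the property values.
y = [0.0, 1.0, 2.0, 3.0]
z = [[math.cos(0.3 * v), math.sin(0.3 * v)] for v in y]
print(combined_loss(z, z, y))
```

In this toy setup the ordinal term is smaller when the embedding order matches the property order than when the property labels are shuffled, which is the behavior the ordinal constraint is meant to reward.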

If this is right

  • More accurate property prediction on composite datasets with limited labels.
  • Better cross-modal retrieval between microstructure images and tabular descriptors.
  • More coherent microstructure generation by interpolating in the learned space.
  • Pretraining possible with physics-based surrogate signals instead of full annotations.
  • A step toward data-efficient universal multimodal systems for materials design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ordinal constraint might reduce discontinuities when applying latent-space methods to other continuous regression tasks outside materials.
  • Explicit ordinal signals could be tested as a regularizer in design optimization loops for fields that rely on composites.
  • Scaling the surrogate signals to new physics domains would show whether the approach generalizes beyond the two evaluated datasets.

Load-bearing premise

Enforcing ordinality through multimodal alignment will reliably preserve continuous property relationships and enable interpolation without the ordinal signals introducing fitting artifacts under extreme data scarcity.

What would settle it

Generating microstructures from interpolated latent points and finding that the resulting fiber distributions or measured properties deviate systematically from physical expectations on held-out continuous variations.
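
A minimal sketch of how such a check might be set up, assuming latent vectors are plain lists of floats. The property head `toy_property_head` and the "held-out" values are hypothetical stand-ins; a real test would decode microstructures and use measured or simulated properties.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors for t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Evenly spaced latent points from z_a to z_b, endpoints included."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]

def deviation_summary(predicted, expected):
    """Return (mean signed error, mean absolute error).

    A signed error close in magnitude to the absolute error suggests a
    systematic offset rather than scatter around the expectation.
    """
    errors = [p - e for p, e in zip(predicted, expected)]
    bias = sum(errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return bias, mae

def toy_property_head(z):
    """Hypothetical stand-in for a property predictor on latent points."""
    return z[0] + 0.5 * z[1]

# Toy usage: compare predictions along the path with hypothetical held-out values.
path = interpolation_path([0.0, 0.0], [1.0, 2.0], steps=5)
predicted = [toy_property_head(z) for z in path]
held_out = [p + 0.1 for p in predicted]  # made-up values with a constant offset
bias, mae = deviation_summary(predicted, held_out)
print(bias, mae)
```

Here the bias and the mean absolute error have the same magnitude because the made-up offset is constant, which is the signature of a systematic deviation.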

Figures

Figures reproduced from arXiv: 2602.02513 by Hangwei Qian, Ivor Tsang, Jingjing Li, Lei Zhu, Xinyao Li.

Figure 1: (a) The pretraining pipeline of ORDER. Step 1, raw composite data images and target properties are obtained by simulating on various descriptors, and organized as pairs. Step 2, paired tabular descriptors and microstructure images are encoded into a shared latent space via dedicated encoders. Step 3, we apply cross-modal contrastive learning between modalities to enforce image-tabular alignment, and ordinal…
Figure 2: (a) Cross-modal retrieval results w.r.t. accuracy with varying number of retrieved candidates (k). ORDER variants consistently outperform vanilla cross-modal contrastive learning (CMCL) and MatMCL. (b) Examples on top-5 retrieved images given tabular descriptors. The left panel shows query descriptors and the corresponding ground-truth image. The middle panel shows top-5 retrieved examples, with their target p…
Figure 3: Target property prediction performance on Composite and Nanofiber datasets. For multimodal pretraining methods (ORDER, MatMCL, …
Figure 4: Descriptor-conditioned microstructure generation.
Figure 5: Visualization of the pretrained multimodal representations using target property ‘Elongation’.
Original abstract

Artificial intelligence has shown remarkable success in materials discovery and property prediction, particularly for crystalline and polymer systems where material properties and structures are dominated by discrete graph representations. Such a graph-centric paradigm breaks down on composite materials, which possess continuous and nonlinear design spaces. General composite descriptors, e.g., fiber volume and misalignment angle, cannot fully capture the fiber distributions that determine microstructural characteristics, necessitating the integration of heterogeneous data sources through multimodal learning. Existing alignment-oriented frameworks have proven effective on abundant crystal or polymer data under discrete, unique graph-property mapping assumptions, but fail to address the highly continuous composite design space under extreme data scarcity. In this work, we introduce ORDinal-aware imagE-tabulaR alignment (ORDER), a multimodal pretraining framework that establishes ordinality as a core principle for material representations. ORDER ensures that materials with similar target properties occupy nearby regions in the latent space, which effectively preserves the continuous nature of composite properties and enables meaningful interpolation between sparsely observed designs. We evaluate ORDER on a Nanofiber-reinforced composite dataset and a carbon fiber T700 dataset. ORDER and its variants outperform both alignment-oriented and customized property-aware contrastive baselines across property prediction, cross-modal retrieval, and microstructure generation tasks. We further introduce physics-based ordinal surrogate signals that avoid the need for full property annotation during pretraining. Our work demonstrates that learning continuous multimodal features is fundamental for composite materials, and provides a reliable pathway toward data-efficient universal multimodal intelligent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces ORDinal-aware imagE-tabulaR alignment (ORDER), a multimodal pretraining framework for composite materials. It aligns image and tabular modalities while enforcing ordinal relationships via physics-based surrogate signals (e.g., fiber volume fraction and misalignment angle) to ensure that materials with similar target properties lie nearby in latent space. The central claim is that this ordinality preserves the continuous and nonlinear nature of composite design spaces, enabling meaningful interpolation under extreme data scarcity. ORDER is evaluated on a nanofiber-reinforced composite dataset and a carbon fiber T700 dataset, where it and its variants outperform alignment-oriented and property-aware contrastive baselines on property prediction, cross-modal retrieval, and microstructure generation tasks.

Significance. If the central claims hold, the work addresses a genuine gap in multimodal representation learning for composites, whose continuous design spaces differ from the discrete graph structures common in crystals and polymers. The introduction of physics-based ordinal surrogate signals that avoid full property annotation during pretraining is a concrete strength, as is the explicit focus on interpolation in sparsely observed regimes. This could support more data-efficient universal multimodal systems for materials design.

major comments (3)
  1. [§3.3, Eq. (7)] The combined contrastive-plus-ordinal objective is presented without a derivation or ablation showing that the ordinal ranking term remains subordinate to the multimodal alignment term in the low-data regime; if the ordinal loss dominates, it risks creating artificial plateaus that contradict the claim of smooth interpolation across the continuous composite manifold.
  2. [§4.1, Table 2] The reported gains on the retrieval task (e.g., Recall@10) are given without error bars, statistical significance tests, or controls for dataset size variation; this weakens the assertion that ORDER reliably outperforms baselines under the extreme scarcity conditions emphasized in the introduction.
  3. [§5.2] The microstructure generation results claim faithful interpolation between sparsely observed designs, yet no quantitative check (e.g., property-gradient consistency or Lipschitz continuity of the decoded outputs with respect to the surrogate signals) is supplied to rule out surrogate-induced artifacts.
minor comments (2)
  1. [§1] The acronym ORDER is expanded only in the abstract; repeating the full expansion at first use in §1 would improve readability.
  2. [Figure 5] The Figure 5 caption describes a visualization of the pretrained representations but does not specify the dimensionality reduction method (t-SNE, UMAP, or PCA) or the color scale for property values.
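
Major comment 2 asks for error bars and significance tests on retrieval. The following sketch shows one way such numbers could be computed, assuming a similarity matrix in which query i's ground-truth match is candidate i. The matrix and the per-seed scores are made-up illustrations, not results from the paper, and the p-value lookup is omitted because the Python standard library has no t-distribution CDF.

```python
import math
import statistics

def recall_at_k(sim, k):
    """Fraction of queries whose true match (same index) is in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    """
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

def mean_and_std(values):
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(a, b):
    """Paired t statistic for per-seed scores of two methods (same seeds)."""
    diffs = [x - y for x, y in zip(a, b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err

# Toy similarity matrix: queries 0 and 2 rank their match first, query 1 ranks it last.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, 1), recall_at_k(sim, 3))

# Hypothetical per-seed Recall@10 values for two methods over five seeds.
method_a = [0.80, 0.82, 0.78, 0.81, 0.79]
method_b = [0.70, 0.74, 0.69, 0.70, 0.72]
print(mean_and_std(method_a), paired_t_statistic(method_a, method_b))
```

With five seeds the paired statistic has four degrees of freedom; converting it to a p-value would need a t-distribution table or a statistics library.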

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of our results without altering the core claims.

  1. Referee: [§3.3, Eq. (7)] The combined contrastive-plus-ordinal objective is presented without a derivation or ablation showing that the ordinal ranking term remains subordinate to the multimodal alignment term in the low-data regime; if the ordinal loss dominates, it risks creating artificial plateaus that contradict the claim of smooth interpolation across the continuous composite manifold.

    Authors: We appreciate the referee's emphasis on the balance between loss terms. The weights in Eq. (7) were chosen via grid search on a small validation split to ensure the contrastive alignment term dominates while the ordinal term provides a soft constraint; this choice was motivated by the physics-based surrogates being lower-dimensional and less noisy than full properties. In the revised manuscript we will add an explicit derivation of the combined objective in §3.3 and include an ablation study (new Table in supplementary material) that varies the ordinal weight across low-data regimes, reporting both retrieval metrics and a simple plateau-detection metric (variance of decoded property gradients along linear interpolations). These additions will demonstrate that the ordinal term remains subordinate and does not induce artificial plateaus.

    Revision: yes

  2. Referee: [§4.1, Table 2] The reported gains on the retrieval task (e.g., Recall@10) are given without error bars, statistical significance tests, or controls for dataset size variation; this weakens the assertion that ORDER reliably outperforms baselines under the extreme scarcity conditions emphasized in the introduction.

    Authors: We agree that statistical rigor is necessary to support claims under data scarcity. The original Table 2 reported single-run results for brevity. In the revision we will recompute all retrieval metrics over five independent runs with different random seeds, report mean ± standard deviation, add paired t-test p-values against each baseline, and introduce a controlled experiment that subsamples both datasets to identical sizes before training to isolate the effect of scarcity from dataset-size variation.

    Revision: yes

  3. Referee: [§5.2] The microstructure generation results claim faithful interpolation between sparsely observed designs, yet no quantitative check (e.g., property-gradient consistency or Lipschitz continuity of the decoded outputs with respect to the surrogate signals) is supplied to rule out surrogate-induced artifacts.

    Authors: The referee correctly notes that qualitative visual inspection alone is insufficient to fully rule out artifacts. While the property-prediction accuracy and cross-modal retrieval results already provide indirect evidence of continuity, we will add two quantitative checks in the revised §5.2: (1) property-gradient consistency measured as the correlation between finite-difference gradients of predicted fiber volume fraction along latent-space interpolations and the corresponding surrogate-signal gradients, and (2) an approximate Lipschitz constant estimated via finite differences on decoded microstructures with respect to the surrogate signals. These metrics will be reported for both ORDER and the strongest baseline to demonstrate that surrogate-induced artifacts are not present.

    Revision: yes
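
The quantitative checks named in the responses above (gradient variance along interpolations, gradient correlation, and a finite-difference Lipschitz estimate) can be sketched as follows. This is an illustrative reading of those descriptions, not the authors' implementation; all function names are hypothetical and the inputs are assumed to be plain lists of floats.

```python
import math
import statistics

def finite_differences(values, step):
    """Forward differences of a sequence sampled at a uniform step."""
    return [(b - a) / step for a, b in zip(values, values[1:])]

def gradient_variance(values, step):
    """Plateau-style indicator: variance of gradients along an interpolation.

    A staircase-like path (flat runs and jumps) gives a larger variance
    than a path that changes at a uniform rate.
    """
    return statistics.pvariance(finite_differences(values, step))

def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def lipschitz_estimate(outputs, signals):
    """Largest ratio ||out_i - out_j|| / |s_i - s_j| over all sample pairs.

    outputs are decoded vectors, signals are scalar surrogate values; this
    is an empirical lower bound on the true Lipschitz constant.
    """
    best = 0.0
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            gap = abs(signals[i] - signals[j])
            if gap == 0:
                continue  # skip pairs with identical surrogate values
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(outputs[i], outputs[j])))
            best = max(best, dist / gap)
    return best

# Toy usage with made-up property values along two interpolation paths.
smooth = [0.0, 0.25, 0.5, 0.75, 1.0]
staircase = [0.0, 0.0, 0.5, 0.5, 1.0]
print(gradient_variance(smooth, 0.25), gradient_variance(staircase, 0.25))
print(pearson(finite_differences(smooth, 0.25), finite_differences(staircase, 0.25)))
print(lipschitz_estimate([[0.0, 0.0], [3.0, 4.0]], [0.0, 1.0]))
```

Because the estimate takes a maximum over observed pairs, it can only understate the true constant, so a small value on sparse samples is weak evidence of smoothness.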

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained


The ORDER framework is introduced as a multimodal pretraining approach that incorporates ordinality as a design principle using physics-based surrogate signals (fiber volume, misalignment) to address continuous composite spaces under data scarcity. The central claim that similar-property materials occupy nearby latent regions (thereby preserving continuity and enabling interpolation) follows directly from the stated loss construction rather than reducing to a fitted parameter renamed as prediction or a self-citation chain. No equations or sections in the abstract or described full text show a load-bearing step where a 'prediction' equals its input by construction, nor does the paper invoke a uniqueness theorem from prior self-work to forbid alternatives. The surrogates are presented as external physics-based signals avoiding full annotation, providing independent grounding. Evaluation on Nanofiber and T700 datasets with outperformance over baselines further indicates the result is not forced by internal redefinition. This is the common honest non-finding for a methods paper whose core contribution is an architectural choice rather than a derived equality.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that ordinal proximity in latent space directly corresponds to property similarity in continuous composite spaces; no free parameters or invented entities are named in the abstract.

axioms (1)
  • Domain assumption: Ordinality in the latent space preserves the continuous nature of composite properties and enables interpolation.
    Stated as the core principle of the ORDER framework in the abstract.

pith-pipeline@v0.9.0 · 5796 in / 1137 out tokens · 65678 ms · 2026-05-21T14:32:06.944703+00:00 · methodology


