
arxiv: 1907.09754 · v1 · pith:IRQRVVYJnew · submitted 2019-07-23 · 💻 cs.CV

Controlling biases and diversity in diverse image-to-image translation

Pith reviewed 2026-05-24 17:42 UTC · model grok-4.3

classification 💻 cs.CV
keywords: image-to-image translation · diverse translation · bias · semantic constraints · unpaired learning · face translation · object translation

The pith

Semantic constraints reduce unwanted biases in diverse image-to-image translation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies bias in diverse unpaired image-to-image translation, where models trained on skewed datasets introduce extra unwanted alterations such as shifts in gender or race when translating faces. It proposes semantic constraints that force the model to keep specific image properties fixed during the translation. These constraints are intended to cut the undesired changes while leaving the intended domain shift and the range of possible outputs intact. Tests on several heavily biased collections covering faces, objects and scenes indicate that the constraints achieve this separation of wanted and unwanted effects.

Core claim

By imposing semantic constraints that enforce the preservation of desired image properties, the model produces translations with fewer unwanted changes while still performing the wanted transformation, serving as a step towards unbiased diverse image-to-image translation.

What carries the argument

Semantic constraints that enforce preservation of desired image properties while sampling different style codes in a disentangled content-style latent space.
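The idea of a semantic constraint can be made concrete with a small sketch. The summary above does not give the paper's exact loss, so the form below is an assumption: an L2 distance between semantic features of the input and of its translation, added to the translation objective with a weight. The names `semantic_constraint_loss`, `total_loss`, `toy_features` and the `weight` parameter are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def semantic_constraint_loss(
    semantic_features: Callable[[Vector], Vector],
    source_image: Vector,
    translated_image: Vector,
) -> float:
    """L2 distance between semantic features of an input and its translation.

    Assumption: the constraint penalises feature distance; the paper may use
    a different distance or feature extractor.
    """
    f_src = semantic_features(source_image)
    f_out = semantic_features(translated_image)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_src, f_out)))


def total_loss(translation_loss: float, constraint_loss: float, weight: float = 1.0) -> float:
    """Translation objective plus the weighted semantic constraint (hypothetical weighting)."""
    return translation_loss + weight * constraint_loss


def toy_features(image: Vector) -> list:
    """Toy stand-in for a semantic feature extractor: mean intensity and intensity range."""
    return [sum(image) / len(image), max(image) - min(image)]


source = [1.0, 2.0, 3.0, 4.0]
same_semantics = [4.0, 3.0, 2.0, 1.0]  # pixels permuted, toy features unchanged
shifted = [2.0, 3.0, 4.0, 5.0]         # mean shifted by 1, range unchanged

loss_same = semantic_constraint_loss(toy_features, source, same_semantics)
loss_shifted = semantic_constraint_loss(toy_features, source, shifted)
```

In this toy setting the permuted image incurs no penalty because its features match the source, while the shifted image is penalised in proportion to how far its features moved. A real system would replace `toy_features` with a pretrained network such as a face-recognition model.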

Load-bearing premise

That the observed biases arise primarily from the visual distribution of the target domain and that semantic constraints can be imposed without degrading translation quality or diversity.

What would settle it

A quantitative test on the face datasets in which the constrained model still produces measurable gender or race shifts at rates comparable to the unconstrained baseline would falsify the claim that the constraints successfully limit unwanted changes.
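A test of this kind could be sketched as follows. The figure captions name "misclassification rate" and "drop in confidence" as the measures; the definitions used here are assumptions (fraction of translated images whose predicted attribute differs from the source attribute, and mean decrease in classifier confidence for the source attribute). All labels and confidences in the example are invented for illustration and are not results from the paper.

```python
from typing import Sequence


def misclassification_rate(source_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of translations whose predicted attribute differs from the source attribute."""
    if not source_labels or len(source_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    changed = sum(1 for s, p in zip(source_labels, predicted_labels) if s != p)
    return changed / len(source_labels)


def mean_confidence_drop(conf_before: Sequence[float], conf_after: Sequence[float]) -> float:
    """Mean decrease in classifier confidence for the source attribute after translation."""
    if not conf_before or len(conf_before) != len(conf_after):
        raise ValueError("confidence sequences must be non-empty and of equal length")
    return sum(b - a for b, a in zip(conf_before, conf_after)) / len(conf_before)


# Invented example: gender labels of four input faces and the gender predicted
# on their translations by an unconstrained and a constrained model.
source_gender = ["M", "M", "F", "M"]
baseline_pred = ["F", "F", "F", "M"]
constrained_pred = ["M", "F", "F", "M"]

baseline_rate = misclassification_rate(source_gender, baseline_pred)
constrained_rate = misclassification_rate(source_gender, constrained_pred)

# Invented classifier confidences for the source gender, before and after translation.
conf_input = [0.90, 0.80, 0.95, 0.85]
baseline_drop = mean_confidence_drop(conf_input, [0.30, 0.40, 0.90, 0.80])
constrained_drop = mean_confidence_drop(conf_input, [0.85, 0.45, 0.90, 0.80])
```

Under these definitions the claim would be falsified if `constrained_rate` and `constrained_drop` were not meaningfully lower than their baseline counterparts on the real face datasets.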

Figures

Figures reproduced from arXiv: 1907.09754 by Abel Gonzalez-Garcia, Joost Van De Weijer, Luis Herranz, Yaxing Wang.

Figure 1: Diverse image-to-image translation in a very biased setting (domain A: mostly white males without makeup, domain B: white females with makeup): (a) biased translations, (b) with semantic constraint to alleviate bias while keeping relevant diversity.
Figure 2: Examples of training sets for image translation: (a) paired edge-photo, (b) unpaired young-old (well-aligned biases), and (c) unpaired without-with makeup (misaligned in gender).
Figure 3: Geometric interpretation of the semantic constraint unbiasing
Figure 4: Diverse image-to-image translation (DIT): (a) biased, (b) …
Figure 5: Robustness to bias on Biased makeup: (left) misclassification rate, (middle) drop in confidence, (right) ID distance.

Input  Direction  MUNIT  +PI    DRIT   UDIT   UDIT+PI
M      Makeup     0.268  0.267  0.263  0.192  0.151
F      Makeup     0.212  0.199  0.193  0.154  0.133
F      Demakeup   0.297  0.293  0.253  0.208  0.203
Figure 6: Example translations for Biased makeup when applying makeup to a male. UDIT uses identity as semantic constraint.
Figure 7: Example translations on MORPH by biased DIT methods (MUNIT/DRIT) and our UDIT with semantic constraint on identity.
Figure 8: Robustness to bias on MORPH: (a) young to old and (b) old to young: (left) misclassification rate, (middle) drop in confidence, and (right) ID distance
Figure 9: Robustness to bias in terms of misclassification rate and drop in confidence.
Figure 7: 6.6. Cityscapes → Synthia-night. Semantic constraint. We train a binary classifier for daytime classification based on VGG16 [38] using both real and synthetic images. We use 6000 realistic images from BDD-100K [44] with a 50/50 daytime distribution. As synthetic images we use 6000 images from a disjoint subset of Synthia [36], also with a balanced class distribution. We consider two semantic constraints. …
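The data setup described in this passage (6000 real and 6000 synthetic images, each split 50/50 between classes) can be sketched as a balanced sampling step. This is only an illustration of assembling such a training set: it does not train the VGG16 classifier, the pools and image identifiers are invented placeholders, and `balanced_sample` is a hypothetical helper, not code from the paper.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[str, str]  # (image identifier, label), label is "day" or "night"


def balanced_sample(pool: List[Sample], n_total: int, seed: int = 0) -> List[Sample]:
    """Draw n_total samples with an exact 50/50 split between 'day' and 'night'."""
    if n_total % 2:
        raise ValueError("n_total must be even for a 50/50 split")
    rng = random.Random(seed)
    per_class = n_total // 2
    by_label = {}
    for image_id, label in pool:
        by_label.setdefault(label, []).append(image_id)
    sample = []
    for label in ("day", "night"):
        candidates = by_label.get(label, [])
        if len(candidates) < per_class:
            raise ValueError(f"not enough '{label}' images in the pool")
        sample.extend((image_id, label) for image_id in rng.sample(candidates, per_class))
    rng.shuffle(sample)
    return sample


# Placeholder pools standing in for labelled real and synthetic image lists.
real_pool = [(f"real_{i}", "day" if i % 3 else "night") for i in range(30000)]
synthetic_pool = [(f"synthetic_{i}", "day" if i % 2 else "night") for i in range(9000)]

train = balanced_sample(real_pool, 6000, seed=1) + balanced_sample(synthetic_pool, 6000, seed=2)
label_counts = Counter(label for _, label in train)
```

Balancing both sources separately keeps the classifier from associating daytime with either the real or the synthetic domain, which is presumably why the passage stresses the class distribution of each subset.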
Figure 10: Results on Cityscapes → Synthia-night. Example translations by MUNIT and UDIT with two variants of the semantic constraint.
Figure 11: Robustness to bias on Biased handbags
Figure 12: Example translations for Handbags-texture (left) and Handbags-color (right). Better viewed electronically, zoom might be necessary to appreciate the changes in texture.
Abstract

The task of unpaired image-to-image translation is highly challenging due to the lack of explicit cross-domain pairs of instances. We consider here diverse image translation (DIT), an even more challenging setting in which an image can have multiple plausible translations. This is normally achieved by explicitly disentangling content and style in the latent representation and sampling different style codes while maintaining the image content. Despite the success of current DIT models, they are prone to suffer from bias. In this paper, we study the problem of bias in image-to-image translation. Biased datasets may add undesired changes (e.g. change gender or race in face images) to the output translations as a consequence of the particular underlying visual distribution in the target domain. In order to alleviate the effects of this problem we propose the use of semantic constraints that enforce the preservation of desired image properties. Our proposed model is a step towards unbiased diverse image-to-image translation (UDIT), and results in fewer unwanted changes in the translated images while still performing the wanted transformation. Experiments on several heavily biased datasets show the effectiveness of the proposed techniques in different domains such as faces, objects, and scenes.
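The mechanism the abstract describes, holding the content code fixed while sampling several style codes, can be illustrated with a toy sketch. Everything here is a stand-in: real DIT models use learned encoders and decoders, whereas this example treats an image as a list of pixel values, takes the centred pixels as "content", and a (gain, offset) pair as "style". The function names and the style prior are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Style = Tuple[float, float]  # (gain, offset), a toy stand-in for a style code


def encode_content(image: List[float]) -> List[float]:
    """Toy content code: pixel values centred on their mean (keeps the structure)."""
    mean = sum(image) / len(image)
    return [p - mean for p in image]


def sample_style(rng: random.Random) -> Style:
    """Sample a toy style code: positive gain and a Gaussian offset (assumed prior)."""
    return (rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0))


def decode(content: List[float], style: Style) -> List[float]:
    """Toy decoder: rescale and shift the content according to the style code."""
    gain, offset = style
    return [gain * c + offset for c in content]


def diverse_translations(image: List[float], n: int, seed: int = 0) -> List[List[float]]:
    """Encode content once, then decode it with n independently sampled styles."""
    rng = random.Random(seed)
    content = encode_content(image)
    return [decode(content, sample_style(rng)) for _ in range(n)]


def pixel_ranking(image: List[float]) -> List[int]:
    """Indices of pixels sorted by value, used here as a crude proxy for 'content'."""
    return sorted(range(len(image)), key=image.__getitem__)


source = [1.0, 3.0, 2.0, 5.0]
outputs = diverse_translations(source, n=3, seed=0)
```

Each output differs in gain and offset, yet the relative ordering of pixels is unchanged, which mirrors the intent that style varies while content is preserved. Bias enters real models when the learned style space also encodes attributes such as gender, so that sampling a style changes more than intended.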

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper addresses the problem of bias in diverse unpaired image-to-image translation (DIT), where target-domain statistics can induce unwanted attribute changes (e.g., gender or race in face images). It proposes adding semantic constraints to existing disentanglement-based DIT models to enforce preservation of desired properties, yielding a model for unbiased diverse image-to-image translation (UDIT) that reduces such changes while retaining the intended translation and output diversity. The abstract states that experiments on multiple heavily biased datasets across faces, objects, and scenes demonstrate the effectiveness of the approach.

Significance. If the semantic-constraint mechanism can be shown to measurably reduce unwanted attribute shifts without loss of translation fidelity or diversity, the work would provide a practical extension of disentanglement techniques that directly targets fairness issues in generative vision models. The absence of any reported metrics, however, prevents evaluation of whether the central claim holds.

major comments (1)
  1. [Abstract] Abstract: the claim that 'experiments on several heavily biased datasets show the effectiveness of the proposed techniques' supplies no quantitative results, ablation studies, error analysis, or even example metrics (e.g., attribute classification accuracy before/after, diversity scores such as LPIPS, or FID). Without such evidence the data-to-claim link cannot be assessed and the central assertion that unwanted changes are reduced while wanted transformations and diversity are preserved remains unevaluated.
minor comments (1)
  1. [Abstract] Abstract: the acronym 'UDIT' is introduced before its expansion ('unbiased diverse image-to-image translation') is given, which is a minor clarity issue.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.

  1. Referee: [Abstract] Abstract: the claim that 'experiments on several heavily biased datasets show the effectiveness of the proposed techniques' supplies no quantitative results, ablation studies, error analysis, or even example metrics (e.g., attribute classification accuracy before/after, diversity scores such as LPIPS, or FID). Without such evidence the data-to-claim link cannot be assessed and the central assertion that unwanted changes are reduced while wanted transformations and diversity are preserved remains unevaluated.

    Authors: We agree that the abstract would be strengthened by including key quantitative metrics. The full manuscript reports attribute classification accuracies (to quantify reduction in unwanted bias-induced changes), LPIPS scores (for diversity), and FID (for translation quality) in the experiments section across the evaluated datasets. We will revise the abstract to explicitly cite representative results (e.g., X% reduction in attribute shift with no loss in LPIPS or FID) while preserving the overall length and readability. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The paper proposes extending existing diverse I2I translation frameworks (e.g., via disentanglement of content/style) with an added semantic constraint term to reduce unwanted attribute shifts induced by target-domain statistics. The central claim rests on experimental validation across biased datasets rather than any closed-form derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes are described that reduce the method to its inputs by construction; the approach is presented as an empirical extension with external constraints, consistent with the reader's assessment of score 1.0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are identifiable from the provided text. The approach implicitly relies on standard assumptions of disentangled latent representations in GAN-based translation models.


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 2 internal anchors

  1. [1]

    Augmented cyclegan: Learning many-to-many mappings from unpaired data, in: International Conference on Machine Learning

    Almahairi, A., Rajeswar, S., Sordoni, A., Bachman, P., Courville, A., 2018. Augmented cyclegan: Learning many-to-many mappings from unpaired data, in: International Conference on Machine Learning

  2. [2]

    Segnet: A deep convolutional encoder-decoder architecture for image segmentation

    Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence

  3. [3]

    Representation learning: A review and new perspectives

    Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence

  4. [4]

    Pros and cons of gan evaluation measures

    Borji, A., 2019. Pros and cons of gan evaluation measures. Computer Vision and Image Understanding 179, 41–65.

  5. [5]

    Unsupervised pixel-level domain adaptation with generative adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D., 2017. Unsupervised pixel-level domain adaptation with generative adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  6. [6]

    Domain separation networks, in: Advances in Neural Information Processing Systems

    Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., Erhan, D., 2016. Domain separation networks, in: Advances in Neural Information Processing Systems

  7. [7]

    Learn to synthesize and synthesize to learn

    Bozorgtabar, B., Rad, M.S., Ekenel, H.K., Thiran, J.P., 2019. Learn to synthesize and synthesize to learn. Computer Vision and Image Understanding

  8. [8]

    Gender shades: Intersectional accuracy disparities in commercial gender classification, in: Conference on Fairness, Accountability and Transparency, pp

    Buolamwini, J., Gebru, T., 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification, in: Conference on Fairness, Accountability and Transparency, pp. 77–91

  9. [9]

    Infogan: Interpretable representation learning by information maximizing generative adversarial nets, in: Advances in Neural Information Processing Systems

    Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P., 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets, in: Advances in Neural Information Processing Systems

  10. [10]

    The cityscapes dataset for semantic urban scene understanding, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B., 2016. The cityscapes dataset for semantic urban scene understanding, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  11. [11]

    Frustratingly easy domain adaptation

    Daumé III, H., 2007. Frustratingly easy domain adaptation. Proceedings of the Annual Meeting of the Association of Computational Linguistics

  12. [12]

    Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias, in: Proceedings of the International Conference on Computer Vision, pp

    Fang, C., Xu, Y., Rockmore, D.N., 2013. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias, in: Proceedings of the International Conference on Computer Vision, pp. 1657–1664

  13. [13]

    Unsupervised domain adaptation by backpropagation, in: International Conference on Machine Learning

    Ganin, Y., Lempitsky, V., 2015. Unsupervised domain adaptation by backpropagation, in: International Conference on Machine Learning

  14. [14]

    Image-to-image translation for cross-domain disentanglement, in: Advances in Neural Information Processing Systems

    Gonzalez-Garcia, A., van de Weijer, J., Bengio, Y., 2018. Image-to-image translation for cross-domain disentanglement, in: Advances in Neural Information Processing Systems

  15. [15]

    Generative adversarial nets, in: Advances in Neural Information Processing Systems

    Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets, in: Advances in Neural Information Processing Systems

  16. [16]

    Women also snowboard: Overcoming bias in captioning models, in: Proceedings of the European Conference on Computer Vision, Springer

    Hendricks, L.A., Burns, K., Saenko, K., Darrell, T., Rohrbach, A., 2018. Women also snowboard: Overcoming bias in captioning models, in: Proceedings of the European Conference on Computer Vision, Springer. pp. 793–811

  17. [17]

    Scene recognition with cnns: objects, scales and dataset bias, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

    Herranz, L., Jiang, S., Li, X., 2016. Scene recognition with cnns: objects, scales and dataset bias, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 571–579

  18. [18]

    Howard, A., Zhang, C., Horvitz, E., 2017. Addressing bias in machine learning algorithms: A pilot study on emotion recognition for intelligent systems, in: 2017 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), IEEE. pp. 1–7

  19. [19]

    Multimodal unsupervised image-to-image translation

    Huang, X., Liu, M.Y., Belongie, S., Kautz, J., 2018. Multimodal unsupervised image-to-image translation. Proceedings of the European Conference on Computer Vision

  20. [20]

    Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  21. [21]

    Identifying and Correcting Label Bias in Machine Learning

    Jiang, H., Nachum, O., 2019. Identifying and correcting label bias in machine learning. arXiv preprint arXiv:1901.04966

  22. [22]

    Undoing the damage of dataset bias, in: Proceedings of the European Conference on Computer Vision, Springer

    Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A., 2012. Undoing the damage of dataset bias, in: Proceedings of the European Conference on Computer Vision, Springer. pp. 158–171

  23. [23]

    Learning to discover cross-domain relations with generative adversarial networks

    Kim, T., Cha, M., Kim, H., Lee, J.K., Kim, J., 2017. Learning to discover cross-domain relations with generative adversarial networks. International Conference on Machine Learning

  24. [24]

    Diverse image-to-image translation via disentangled representations

    Lee, H.Y., Tseng, H.Y., Huang, J.B., Singh, M., Yang, M.H., 2018. Diverse image-to-image translation via disentangled representations. Proceedings of the European Conference on Computer Vision

  26. [26]

    Automotive radar and camera fusion using generative adversarial networks

    Lekic, V., Babic, Z., 2019. Automotive radar and camera fusion using generative adversarial networks. Computer Vision and Image Understanding doi: 10.1016/j.cviu.2019.04.002

  27. [27]

    Age and gender classification using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp

    Levi, G., Hassner, T., 2015. Age and gender classification using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 34–42

  28. [28]

    Unsupervised image-to-image translation networks, in: Advances in Neural Information Processing Systems, pp

    Liu, M.Y., Breuel, T., Kautz, J., 2017. Unsupervised image-to-image translation networks, in: Advances in Neural Information Processing Systems, pp. 700–708

  29. [29]

    Exploiting unlabeled data in cnns by self-supervised learning to rank

    Liu, X., Van De Weijer, J., Bagdanov, A.D., 2019. Exploiting unlabeled data in cnns by self-supervised learning to rank. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 1862–1878

  30. [30]

    Detach and adapt: Learning cross-domain disentangled deep representation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Liu, Y.C., Yeh, Y.Y., Fu, T.C., Wang, S.D., Chiu, W.C., Wang, Y.C.F., 2018. Detach and adapt: Learning cross-domain disentangled deep representation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  31. [31]

    Disentangling factors of variation in deep representation using adversarial training, in: Advances in Neural Information Processing Systems

    Mathieu, M.F., Zhao, J.J., Zhao, J., Ramesh, A., Sprechmann, P., LeCun, Y., 2016. Disentangling factors of variation in deep representation using adversarial training, in: Advances in Neural Information Processing Systems

  32. [32]

    Deep face recognition, in: Proceedings of the British Machine Vision Conference

    Parkhi, O.M., Vedaldi, A., Zisserman, A., 2015. Deep face recognition, in: Proceedings of the British Machine Vision Conference

  33. [33]

    Visual domain adaptation: A survey of recent advances

    Patel, V.M., Gopalan, R., Li, R., Chellappa, R., 2015. Visual domain adaptation: A survey of recent advances. IEEE signal processing magazine 32, 53–69

  34. [34]

    Learning to disentangle factors of variation with manifold interaction, in: International Conference on Machine Learning

    Reed, S., Sohn, K., Zhang, Y., Lee, H., 2014. Learning to disentangle factors of variation with manifold interaction, in: International Conference on Machine Learning

  35. [35]

    Deep visual analogy-making, in: Advances in Neural Information Processing Systems

    Reed, S.E., Zhang, Y., Zhang, Y., Lee, H., 2015. Deep visual analogy-making, in: Advances in Neural Information Processing Systems

  36. [36]

    Morph: A longitudinal image database of normal adult age-progression, in: Automatic Face and Gesture Recognition, 2006

    Ricanek, K., Tesafaye, T., 2006. Morph: A longitudinal image database of normal adult age-progression, in: Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference on, IEEE. pp. 341–345

  37. [37]

    The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes

    Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M., 2016. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3234–3243

  39. [39]

    Imagenet large scale visual recognition challenge

    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al., 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 211–252

  40. [40]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  41. [41]

    Unsupervised cross-domain image generation, in: International Conference on Learning Representations

    Taigman, Y., Polyak, A., Wolf, L., 2017. Unsupervised cross-domain image generation, in: International Conference on Learning Representations

  42. [42]

    Deepface: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

    Taigman, Y., Yang, M., Ranzato, M., Wolf, L., 2014. Deepface: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708

  43. [43]

    Unbiased look at dataset bias, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE

    Torralba, A., Efros, A.A., 2011. Unbiased look at dataset bias, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE. pp. 1521–1528

  44. [44]

    Mix and match networks: encoder-decoder alignment for zero-pair image translation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Wang, Y., van de Weijer, J., Herranz, L., 2018. Mix and match networks: encoder-decoder alignment for zero-pair image translation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  45. [45]

    Dualgan: Unsupervised dual learning for image-to-image translation, in: Proceedings of the International Conference on Computer Vision, pp

    Yi, Z., Zhang, H.R., Tan, P., Gong, M., 2017. Dualgan: Unsupervised dual learning for image-to-image translation, in: Proceedings of the International Conference on Computer Vision, pp. 2868–2876

  46. [46]

    Bdd100k: A diverse driving video database with scalable annotation tooling

    Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., Darrell, T., 2018a. Bdd100k: A diverse driving video database with scalable annotation tooling. Proceedings of the European Conference on Computer Vision

  47. [47]

    Weakly supervised domain-specific color naming based on attention, in: Proceedings of the International Conference on Pattern Recognition, IEEE

    Yu, L., Cheng, Y., van de Weijer, J., 2018b. Weakly supervised domain-specific color naming based on attention, in: Proceedings of the International Conference on Pattern Recognition, IEEE. pp. 3019–3024

  48. [48]

    Synthetic data generation for end-to-end thermal infrared tracking

    Zhang, L., Gonzalez-Garcia, A., van de Weijer, J., Danelljan, M., Khan, F.S., 2018a. Synthetic data generation for end-to-end thermal infrared tracking. IEEE Transactions on Image Processing 28, 1837–1850

  49. [49]

    The unreasonable effectiveness of deep networks as a perceptual metric, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O., 2018b. The unreasonable effectiveness of deep networks as a perceptual metric, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  50. [50]

    Bias and generalization in deep generative models: An empirical study, in: Advances in Neural Information Processing Systems, pp

    Zhao, S., Ren, H., Yuan, A., Song, J., Goodman, N., Ermon, S., 2018. Bias and generalization in deep generative models: An empirical study, in: Advances in Neural Information Processing Systems, pp. 10815–10824

  51. [51]

    Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    Zhu, J.Y., Park, T., Isola, P., Efros, A.A., 2017a. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  52. [52]

    Toward multimodal image-to-image translation, in: Advances in Neural Information Processing Systems, pp

    Zhu, J.Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E., 2017b. Toward multimodal image-to-image translation, in: Advances in Neural Information Processing Systems, pp. 465–476

  53. [53]

    Ai can be sexist and racist - it's time to make it fair

    Zou, J., Schiebinger, L., 2018. Ai can be sexist and racist - it's time to make it fair.

Appendix

Tables 6-10 show the architectures of the content encoder, style encoder, image decoder and discriminator used in the cross-modal experiment. The abbreviations used are shown in Table 11.

Layer | Input → Output | Kernel, stride, pad
conv1 | [4,128,128,3] → [4,128, 12...