arXiv: 2204.13096 · v2 · submitted 2022-04-27 · 💻 cs.CV

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

Pith reviewed 2026-05-24 12:08 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · self-supervised learning · clothing · causal model · non-rigid objects · single image · fashion images · expectation maximization

The pith

A self-supervised method uses a structural causal map to reconstruct 3D clothing from single 2D images without 3D annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to recover both the 3D shape and texture of clothing from one photograph. Existing approaches struggle because 3D labels are hard to obtain, templates fail on non-rigid items, and camera distance confuses shape estimates. By following an explicit causal structure among camera, shape, texture, and illumination, and using two expectation-maximization loops to separate these factors while refining a template, the method trains without 3D data. Tests on fashion image sets produce detailed 3D models, and the same pipeline works on bird photographs.
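
The camera-distance ambiguity mentioned above can be made concrete with a minimal sketch. It assumes a simple pinhole camera looking along the z axis; the function names `project` and `scale`, the focal length, and the three sample points are illustrative choices, not taken from the paper.

```python
def project(points, distance, focal=1.0):
    """Pinhole projection of (x, y, z) points seen by a camera `distance` away along z."""
    return [(focal * x / (z + distance), focal * y / (z + distance))
            for x, y, z in points]


def scale(points, k):
    """Uniformly scale a point set about the origin."""
    return [(k * x, k * y, k * z) for x, y, z in points]


# Hypothetical toy shape: three points of an arbitrary object.
shape = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.1), (0.0, 1.0, -0.1)]

# A small shape with a close camera ...
near = project(shape, distance=2.0)
# ... and the same shape three times larger, with the camera three times farther away.
far = project(scale(shape, 3.0), distance=6.0)

# The two images agree up to floating-point error.
same = all(abs(a - b) < 1e-12 for p, q in zip(near, far) for a, b in zip(p, q))
print(same)
```

Because the two configurations give the same 2D image, a reprojection loss alone cannot tell them apart, which is why some additional constraint on camera and shape is needed.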

Core claim

The causality-aware self-supervised learning method can adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations by following an explainable structural causal map and embedding two expectation-maximization loops to disentangle four encoders and facilitate the prior template.

What carries the argument

The structural causal map (SCM) that explicitly models the relationships among camera position, shape, texture, and illumination, with two embedded expectation-maximization loops that disentangle four encoders and refine the prior template.
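
As a rough illustration of what such a causal map looks like as a data structure, the sketch below encodes a directed graph over the variables named here. The edge list is an assumption made for illustration (the four factors jointly produce the image, and the template feeds camera and shape, as the abstract suggests); the exact edges in the paper's Figure 3 may differ, and the helper names are hypothetical.

```python
# Hypothetical edge list (parent, child). An assumption for illustration only;
# it is not copied from the paper's causal map.
EDGES = [
    ("template", "camera"),
    ("template", "shape"),
    ("camera", "image"),
    ("shape", "image"),
    ("texture", "image"),
    ("illumination", "image"),
]


def parents(node, edges=EDGES):
    """Direct causes of `node`."""
    return sorted(p for p, c in edges if c == node)


def colliders(edges=EDGES):
    """Nodes where two or more arrows meet."""
    children = {c for _, c in edges}
    return sorted(c for c in children if len(parents(c, edges)) >= 2)


def do(edges, node):
    """Graph surgery for an intervention do(node): cut every edge into `node`."""
    return [(p, c) for p, c in edges if c != node]


print(parents("image"))                      # the four factors behind an image
print(colliders())                           # the image is the collider
print(parents("shape", do(EDGES, "shape")))  # after do(shape), shape has no causes
```

In this toy graph the image is a collider of the four factors, so observing only the image couples them; an intervention fixes one factor and removes its incoming edges.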

If this is right

  • The method reconstructs non-rigid clothing without access to 3D ground-truth meshes.
  • Disentangling camera, shape, texture, and illumination reduces the ambiguity in single-view reconstruction.
  • High-fidelity 3D results are achieved on the ATR and Market-HQ fashion benchmarks.
  • The same approach scales to fine-grained object datasets such as the CUB bird images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the causal relationships hold, the same disentanglement strategy could extend to other single-image 3D tasks like vehicle reconstruction.
  • Refining the prior template via EM loops may allow the model to adapt to new clothing styles not seen in training.
  • Applying the method to video sequences could test whether the disentangled factors remain consistent across frames.
  • Integrating more detailed lighting models might further improve texture accuracy under varying illumination.

Load-bearing premise

The structural causal map correctly encodes the generative relationships among camera position, shape, texture, and illumination such that the two EM loops can reliably separate the four latent variables and improve reconstruction.

What would settle it

Train the model with and without the causal structure and EM loops on the same fashion datasets, then compare 3D reconstruction quality using the available evaluation metrics. If the causal version shows no consistent advantage, the central claim does not hold.

Figures

Figures reproduced from arXiv: 2204.13096 by Jiayin Zhu, Tat-Seng Chua, Wei Ji, Yi Yang, Zhedong Zheng.

Figure 1: Motivation. Here we compare the proposed approach …
Figure 2: Explanation of the Collider Connection. Here we show a …
Figure 3: The Structural Causal Map (SCM). We compare the proposed method with three typical 3D reconstruction works, including …
Figure 4: Overview. Here we show a “2D→3D→2D” loop. We follow the causal map in …
Figure 6: Optimization Objectives. Here we show three kinds of …
Figure 7: Novel-view 3D clothing generation from single images on the unseen test set of Market-HQ and ATR.
Figure 8: Novel-view 3D clothing generation from single images on the unseen test set of Market-HQ. Here we gradually “Do” / change …
Figure 10: Novel-view 3D bird generation on the test set of CUB.
Figure 11: Histogram of 3D Camera Attributes C on Market-HQ. Here we show the distribution of azimuths, distances, elevations, Offsets-X and Offsets-Y. Besides, we also provide the distribution of the mean shape offset ∆S over the test set.
Original abstract

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image. Compared with existing methods, we observe that three primary challenges remain: (1) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (2) Conventional template-based methods are limited to modeling non-rigid objects, e.g., handbags and dresses, which are common in fashion images; (3) The inherent ambiguity compromises the model training, such as the dilemma between a large shape with a remote camera or a small shape with a close camera. In an attempt to address the above limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to solve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, which explicitly considers the prior template in the camera estimation and shape prediction. When optimization, the causality intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle four encoders and (2) facilitate the prior template. Extensive experiments on two 2D fashion benchmarks (ATR and Market-HQ) show that the proposed method could yield high-fidelity 3D reconstruction. Furthermore, we also verify the scalability of the proposed method on a fine-grained bird dataset, i.e., CUB. The code is available at https://github.com/layumi/3D-Magic-Mirror.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a causality-aware self-supervised method for reconstructing 3D clothing geometry and texture from a single 2D image. It introduces a structural causal map (SCM) to model relationships among camera position, shape, texture, and illumination, then embeds two expectation-maximization loops to disentangle four encoders and incorporate a prior template. The approach targets non-rigid objects without requiring 3D ground-truth annotations and is evaluated on the ATR and Market-HQ fashion datasets plus the CUB bird dataset, with code released publicly.

Significance. If the claimed disentanglement and reconstruction quality hold under quantitative scrutiny, the work would provide a concrete example of using SCM-guided architecture and dual EM loops to address inherent ambiguities in monocular 3D reconstruction of deformable objects. This could reduce dependence on expensive 3D annotations in fashion and fine-grained object modeling. The public code release is a clear strength that supports reproducibility.

major comments (2)
  1. [§4] §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.
  2. [§3.2] §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.
minor comments (2)
  1. The paper should include a clear notation table or diagram legend for the four encoders and the two EM loops to improve readability.
  2. Figure captions for qualitative results should explicitly state the input image, reconstructed mesh, and any texture map shown, rather than relying on visual inspection alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for stronger quantitative evidence and explicit mathematical derivations. We address each major comment below and will revise the manuscript to incorporate additional details and results.

  1. Referee: [§4] §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.

    Authors: We acknowledge the absence of quantitative metrics and ablations in the reviewed version. Because the approach is fully self-supervised without 3D ground-truth meshes, standard 3D metrics such as Chamfer distance cannot be computed directly against ground truth. We will add 2D-based quantitative evaluations (reprojection error, perceptual similarity scores) against baselines, include ablation studies isolating each EM loop, and report statistical significance (e.g., paired t-tests) on the ATR and Market-HQ datasets in the revision. revision: yes

  2. Referee: [§3.2] §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.

    Authors: We agree that the precise update rules and intervention mechanics of the two EM loops require explicit derivation. In the revised manuscript we will provide the full mathematical formulation: the E-step expectations over the four encoders, the M-step maximization that incorporates the template prior, and the explicit intervention operators on the SCM. We will also clarify which parameters are learned versus fixed. revision: yes
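
For orientation, the textbook form of a single expectation-maximization iteration is shown below. This is the generic algorithm only, with observed data x, latent variables z, and parameters θ as placeholder symbols; it is not the paper's two-loop formulation, which the excerpt does not state.

```latex
\begin{align}
\text{E-step:}\quad & Q\big(\theta \mid \theta^{(t)}\big)
  = \mathbb{E}_{z \sim p(z \mid x,\, \theta^{(t)})}\big[\log p(x, z \mid \theta)\big] \\
\text{M-step:}\quad & \theta^{(t+1)} = \arg\max_{\theta}\; Q\big(\theta \mid \theta^{(t)}\big)
\end{align}
```

A derivation in the revised manuscript would need to say what plays the role of z and θ in each of the two loops, and which quantities are held fixed during each step.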

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained by design choices

The paper introduces an SCM as an explanatory modeling tool and embeds two EM loops to enforce disentanglement of camera/shape/texture/illumination variables. These are explicit architectural decisions that follow from the stated causal framing rather than any derivation that reduces a claimed prediction back to fitted inputs or self-citations by construction. No equations, uniqueness theorems, or parameter-renaming steps are exhibited that would make the reconstruction output tautological with the training signals. The self-supervised claim is internally consistent with the absence of 3D annotations and does not rely on load-bearing self-citations for its central mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level mention of an SCM and prior template; all ledger entries are therefore empty.
