3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective
Pith reviewed 2026-05-24 12:08 UTC · model grok-4.3
The pith
A self-supervised method uses a structural causal map to reconstruct 3D clothing from single 2D images without 3D annotations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The causality-aware self-supervised learning method can adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations by following an explainable structural causal map and embedding two expectation-maximization loops that disentangle four encoders and refine the prior template.
What carries the argument
The structural causal map (SCM) that explicitly models the relationships among camera position, shape, texture, and illumination, with two embedded expectation-maximization loops that disentangle four encoders and refine the prior template.
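To make the causal framing concrete, here is a toy structural causal model in Python. It is a sketch under stated assumptions, not the paper's model. The edges template → camera and template → shape follow the abstract's statement that the prior template enters camera estimation and shape prediction. The scalar structural equations, noise levels, and the toy "renderer" are hypothetical stand-ins. An intervention `do` replaces one structural equation with a constant, which cuts that variable off from its parents.

```python
import random
from typing import Callable, Dict, Optional

Values = Dict[str, float]

# Hypothetical scalar structural equations (illustration only). Each one maps
# already-sampled parent values plus exogenous noise to a single variable.
EQUATIONS: Dict[str, Callable[[Values, random.Random], float]] = {
    "template": lambda v, rng: 1.0,                                     # prior template size
    "camera": lambda v, rng: 2.0 * v["template"] + rng.gauss(0, 0.1),   # camera distance
    "shape": lambda v, rng: v["template"] + rng.gauss(0, 0.1),          # object size
    "texture": lambda v, rng: rng.uniform(0.0, 1.0),                    # albedo
    "illumination": lambda v, rng: rng.uniform(0.5, 1.5),               # light intensity
    # Toy "renderer": apparent size times albedo times light.
    "image": lambda v, rng: (v["shape"] / v["camera"]) * v["texture"] * v["illumination"],
}
ORDER = ["template", "camera", "shape", "texture", "illumination", "image"]  # topological order


def sample(rng: random.Random, do: Optional[Values] = None) -> Values:
    """Ancestral sampling; variables listed in `do` are clamped (graph surgery)."""
    do = do or {}
    values: Values = {}
    for name in ORDER:
        values[name] = do[name] if name in do else EQUATIONS[name](values, rng)
    return values


rng = random.Random(0)
observed = sample(rng)                          # observational sample
intervened = sample(rng, do={"camera": 4.0})    # interventional sample, do(camera = 4.0)
```

Under `do(camera=4.0)` the camera no longer tracks the template while the shape still does; contrasting observational and interventional samples in this way is one generic means of probing whether a factor has been isolated.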
If this is right
- The method reconstructs non-rigid clothing without access to 3D ground-truth meshes.
- Disentangling camera, shape, texture, and illumination reduces the ambiguity in single-view reconstruction.
- High-fidelity 3D results are achieved on the ATR and Market-HQ fashion benchmarks.
- The same approach scales to fine-grained object datasets such as the CUB bird images.
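Because these benchmarks lack 3D ground truth, fidelity on ATR and Market-HQ is typically checked in 2D. One common proxy is the intersection-over-union between the silhouette rendered from the predicted mesh and a 2D foreground or parsing mask. The sketch below computes mask IoU on binary masks stored as nested lists. The rendering step that would produce the predicted silhouette is assumed and not shown, and scoring two empty masks as 1.0 is a convention chosen here.

```python
from typing import List

Mask = List[List[int]]  # binary mask: rows of 0/1 values


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of identical shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return 1.0 if union == 0 else inter / union


rendered = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]  # e.g. a silhouette rendered from a predicted mesh
parsing = [[0, 1, 1], [0, 1, 0], [0, 1, 0]]   # e.g. a clothing mask from a parsing dataset
print(mask_iou(rendered, parsing))  # → 0.6 (3 shared pixels / 5 pixels in the union)
```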
Where Pith is reading between the lines
- If the causal relationships hold, the same disentanglement strategy could extend to other single-image 3D tasks like vehicle reconstruction.
- Refining the prior template via EM loops may allow the model to adapt to new clothing styles not seen in training.
- Applying the method to video sequences could test whether the disentangled factors remain consistent across frames.
- Integrating more detailed lighting models might further improve texture accuracy under varying illumination.
Load-bearing premise
The structural causal map correctly encodes the generative relationships among camera position, shape, texture, and illumination such that the two EM loops can reliably separate the four latent variables and improve reconstruction.
What would settle it
Training the model with and without the causal structure and EM loops on the same fashion datasets and comparing the 3D reconstruction quality using available evaluation metrics; if the causal version shows no consistent advantage, the central claim does not hold.
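The test proposed above amounts to an ablation: score each image under the full model and under a variant without the causal structure and EM loops, then ask whether the per-image differences are consistently non-zero. A paired t-statistic is one simple way to do this. The sketch uses only the standard library; the score lists are made-up placeholders, not results from the paper. Turning t into a p-value needs a t-distribution with n - 1 degrees of freedom; SciPy's `scipy.stats.ttest_rel` performs the full test if that library is available.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder per-image scores (higher is better). NOT real results.
full_model = [0.62, 0.70, 0.58, 0.66, 0.71]
ablated = [0.60, 0.65, 0.59, 0.61, 0.66]
t = paired_t_statistic(full_model, ablated)
dof = len(full_model) - 1  # degrees of freedom for the reference t-distribution
```

Pairing by image removes between-image variation (some images are simply harder), which is why a paired test suits an ablation run on the same evaluation set.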
Original abstract
This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single image. Compared with existing methods, we observe that three primary challenges remain: (1) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (2) conventional template-based methods are limited in modeling non-rigid objects, e.g., handbags and dresses, which are common in fashion images; (3) the inherent ambiguity compromises model training, such as the dilemma between a large shape with a remote camera and a small shape with a close camera. To address these limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to resolve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, explicitly considering the prior template in camera estimation and shape prediction. During optimization, the causal intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle the four encoders and (2) facilitate the prior template. Extensive experiments on two 2D fashion benchmarks (ATR and Market-HQ) show that the proposed method can yield high-fidelity 3D reconstruction. Furthermore, we verify the scalability of the proposed method on a fine-grained bird dataset, CUB. The code is available at https://github.com/layumi/3D-Magic-Mirror.
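The "large shape with a remote camera or small shape with a close camera" dilemma follows directly from perspective projection: scaling every vertex and the camera distance by the same factor leaves the image unchanged. The sketch below checks this numerically for a pinhole camera on the optical axis with toy vertices and an assumed focal length. It also shows why a prior of known absolute size helps: once the size is pinned, the pinhole relation fixes the distance. This illustrates the ambiguity only; it is not the paper's camera model.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def project(points: List[Point3], distance: float, focal: float = 1.0) -> List[Tuple[float, float]]:
    """Pinhole projection with the camera at z = -distance looking along +z."""
    out = []
    for x, y, z in points:
        depth = z + distance  # depth of the point in camera coordinates
        out.append((focal * x / depth, focal * y / depth))
    return out


shape = [(-0.5, -1.0, 0.1), (0.5, -1.0, -0.1), (0.0, 1.0, 0.0)]  # toy vertices
scale = 2.0
small_near = project(shape, distance=3.0)
large_far = project([(scale * x, scale * y, scale * z) for x, y, z in shape], distance=scale * 3.0)
# small_near and large_far coincide up to floating-point error: the image alone
# cannot tell the two configurations apart.


def distance_from_known_height(height: float, projected_height: float, focal: float = 1.0) -> float:
    """For a roughly fronto-parallel object, projected = focal * height / distance."""
    return focal * height / projected_height
```

For example, if a prior pins the true height at 2.0 and the image shows a projected height of 0.5 (with focal length 1.0), the distance is 4.0; without such a prior, any height/distance pair with the same ratio fits equally well.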
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a causality-aware self-supervised method for reconstructing 3D clothing geometry and texture from a single 2D image. It introduces a structural causal map (SCM) to model relationships among camera position, shape, texture, and illumination, then embeds two expectation-maximization loops to disentangle four encoders and incorporate a prior template. The approach targets non-rigid objects without requiring 3D ground-truth annotations and is evaluated on the ATR and Market-HQ fashion datasets plus the CUB bird dataset, with code released publicly.
Significance. If the claimed disentanglement and reconstruction quality hold under quantitative scrutiny, the work would provide a concrete example of using SCM-guided architecture and dual EM loops to address inherent ambiguities in monocular 3D reconstruction of deformable objects. This could reduce dependence on expensive 3D annotations in fashion and fine-grained object modeling. The public code release is a clear strength that supports reproducibility.
major comments (2)
- §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.
- §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.
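For readers less familiar with the 3D metrics named in the first comment: the symmetric Chamfer distance between point sets P and Q averages, in each direction, the squared distance from a point to its nearest neighbour in the other set. Conventions differ across papers (squared versus unsquared distances, sum versus mean), so the brute-force O(|P||Q|) sketch below fixes one choice, the sum of the two directional means of squared distances. It requires reference 3D geometry, which 2D benchmarks such as ATR and Market-HQ do not include.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """mean_p min_q |p - q|^2 + mean_q min_p |q - p|^2 (brute force)."""
    if not p or not q:
        raise ValueError("point sets must be non-empty")
    p_to_q = sum(min(_sq_dist(a, b) for b in q) for a in p) / len(p)
    q_to_p = sum(min(_sq_dist(b, a) for a in p) for b in q) / len(q)
    return p_to_q + q_to_p


pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(pred, gt))  # → 1.0 (0.5 in each direction)
```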
minor comments (2)
- The paper should include a clear notation table or diagram legend for the four encoders and the two EM loops to improve readability.
- Figure captions for qualitative results should explicitly state the input image, reconstructed mesh, and any texture map shown, rather than relying on visual inspection alone.
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting the need for stronger quantitative evidence and explicit mathematical derivations. We address each major comment below and will revise the manuscript to incorporate additional details and results.
- Referee: §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.
  Authors: We acknowledge the absence of quantitative metrics and ablations in the reviewed version. Because the approach is fully self-supervised without 3D ground-truth meshes, standard 3D metrics such as Chamfer distance cannot be computed directly against ground truth. We will add 2D-based quantitative evaluations (reprojection error, perceptual similarity scores) against baselines, include ablation studies isolating each EM loop, and report statistical significance (e.g., paired t-tests) on the ATR and Market-HQ datasets in the revision.
  Revision: yes.
- Referee: §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.
  Authors: We agree that the precise update rules and intervention mechanics of the two EM loops require explicit derivation. In the revised manuscript we will provide the full mathematical formulation: the E-step expectations over the four encoders, the M-step maximization that incorporates the template prior, and the explicit intervention operators on the SCM. We will also clarify which parameters are learned versus fixed.
  Revision: yes.
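For reference, the textbook EM iteration that such a derivation would specialize is shown below, with x the observed image, z the latent factors, and θ the model parameters. How the paper maps its four encoders and the prior template onto z and θ is exactly what the referee asks the authors to spell out, so this is the generic template, not the paper's equations. Each iteration of this scheme does not decrease the marginal likelihood log p(x | θ).

```latex
\text{E-step:}\quad Q\big(\theta \mid \theta^{(t)}\big) = \mathbb{E}_{z \sim p(z \mid x,\, \theta^{(t)})}\big[\log p(x, z \mid \theta)\big]

\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\; Q\big(\theta \mid \theta^{(t)}\big)
```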
Circularity Check
No significant circularity; derivation is self-contained by design choices
The paper introduces an SCM as an explanatory modeling tool and embeds two EM loops to enforce disentanglement of camera/shape/texture/illumination variables. These are explicit architectural decisions that follow from the stated causal framing rather than any derivation that reduces a claimed prediction back to fitted inputs or self-citations by construction. No equations, uniqueness theorems, or parameter-renaming steps are exhibited that would make the reconstruction output tautological with the training signals. The self-supervised claim is internally consistent with the absence of 3D annotations and does not rely on load-bearing self-citations for its central mechanism.