3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

Jiayin Zhu; Tat-Seng Chua; Wei Ji; Yi Yang; Zhedong Zheng

arxiv: 2204.13096 · v2 · submitted 2022-04-27 · 💻 cs.CV

Pith reviewed 2026-05-24 12:08 UTC · model grok-4.3

keywords: 3D reconstruction, self-supervised learning, clothing, causal model, non-rigid objects, single image, fashion images, expectation maximization

The pith

A self-supervised method uses a structural causal map to reconstruct 3D clothing from single 2D images without 3D annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to recover both the 3D shape and texture of clothing from one photograph. Existing approaches struggle because 3D labels are hard to obtain, templates fail on non-rigid items, and camera distance confuses shape estimates. By following an explicit causal structure among camera, shape, texture, and illumination, and using two expectation-maximization loops to separate these factors while refining a template, the method trains without 3D data. Tests on fashion image sets produce detailed 3D models, and the same pipeline works on bird photographs.

Core claim

The causality-aware self-supervised learning method can adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations by following an explainable structural causal map and embedding two expectation-maximization loops to disentangle four encoders and facilitate the prior template.

What carries the argument

The structural causal map (SCM) that explicitly models the relationships among camera position, shape, texture, and illumination, with two embedded expectation-maximization loops that disentangle four encoders and refine the prior template.
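The structure described above can be written down as a small directed graph. The sketch below is an illustrative reading of this summary, not the paper's published graph: the node names (`template`, `camera`, `shape`, `texture`, `illumination`, `image`) and the exact edge set are assumptions. It encodes only what the summary states, namely that the prior template informs camera and shape, and that the four factors jointly produce the observed image. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical edge set read off the summary (not copied from the paper).
# Each key maps a node to the set of its direct causes (parents).
PARENTS = {
    "template": set(),
    "texture": set(),
    "illumination": set(),
    "camera": {"template"},
    "shape": {"template"},
    "image": {"camera", "shape", "texture", "illumination"},
}


def colliders(parents):
    """Return nodes with two or more direct causes (common effects)."""
    return sorted(node for node, ps in parents.items() if len(ps) >= 2)


def generation_order(parents):
    """Return one valid order in which the variables can be generated."""
    return list(TopologicalSorter(parents).static_order())


order = generation_order(PARENTS)
print(order)               # causes always appear before their effects
print(colliders(PARENTS))  # nodes that several causes point into
```

Under these assumed edges the observed image is the only common effect. Conditioning on a common effect makes its otherwise independent causes dependent, which is one way to read why shape and camera are hard to separate from a single photograph.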

If this is right

The method reconstructs non-rigid clothing without access to 3D ground-truth meshes.
Disentangling camera, shape, texture, and illumination reduces the ambiguity in single-view reconstruction.
High-fidelity 3D results are achieved on the ATR and Market-HQ fashion benchmarks.
The same approach scales to fine-grained object datasets such as the CUB bird images.
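The ambiguity named in the second consequence above, between a large shape far from the camera and a small shape close to it, can be checked numerically. The following is a minimal sketch assuming an ideal pinhole camera looking down the z axis; the function names, the focal length, and the toy quad are all hypothetical and are not taken from the paper's renderer.

```python
def project(vertices, distance, focal=1.0):
    """Pinhole projection of object-space vertices placed `distance` in front of the camera."""
    return [(focal * x / (z + distance), focal * y / (z + distance))
            for x, y, z in vertices]


def scale(vertices, k):
    """Uniformly scale every vertex by the factor k."""
    return [(k * x, k * y, k * z) for x, y, z in vertices]


# Toy shape: four corners of a tilted quad (hypothetical data).
small = [(-0.5, -1.0, 0.2), (0.5, -1.0, -0.2), (0.5, 1.0, -0.2), (-0.5, 1.0, 0.2)]

near = project(small, distance=3.0)            # small shape, close camera
far = project(scale(small, 2.0), distance=6.0)  # doubled shape, doubled distance

# Largest coordinate difference between the two projections.
max_gap = max(abs(a - b) for p, q in zip(near, far) for a, b in zip(p, q))
print(max_gap)
```

Scaling the shape and the camera distance by the same factor cancels in the ratio `x / (z + distance)`, so the two images coincide. A reconstruction loss defined only on the rendered image therefore cannot tell the two configurations apart without some extra constraint, such as a prior template.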

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the causal relationships hold, the same disentanglement strategy could extend to other single-image 3D tasks like vehicle reconstruction.
Refining the prior template via EM loops may allow the model to adapt to new clothing styles not seen in training.
Applying the method to video sequences could test whether the disentangled factors remain consistent across frames.
Integrating more detailed lighting models might further improve texture accuracy under varying illumination.

Load-bearing premise

The structural causal map correctly encodes the generative relationships among camera position, shape, texture, and illumination such that the two EM loops can reliably separate the four latent variables and improve reconstruction.

What would settle it

Training the model with and without the causal structure and EM loops on the same fashion datasets and comparing the 3D reconstruction quality using available evaluation metrics; if the causal version shows no consistent advantage, the central claim does not hold.
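One way to make this comparison concrete is a paired test on per-image scores from the two model variants. The sketch below computes a paired t statistic with the standard library only. The score lists are invented placeholders, not results from the paper, and the choice of reprojection error as the metric is an assumption.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for the paired differences a - b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical per-image reprojection errors (lower is better); NOT from the paper.
with_scm = [0.21, 0.18, 0.25, 0.19, 0.22, 0.20]
without_scm = [0.24, 0.22, 0.26, 0.23, 0.27, 0.21]

t_stat, dof = paired_t(with_scm, without_scm)
print(t_stat, dof)
```

A negative statistic here means the first variant has lower error on average. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; `scipy.stats.ttest_rel` is one option if SciPy is available.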

Figures

Figures reproduced from arXiv: 2204.13096 by Jiayin Zhu, Tat-Seng Chua, Wei Ji, Yi Yang, Zhedong Zheng.

**Figure 2.** Explanation of the Collider Connection. Here we show a …

**Figure 3.** The Structural Causal Map (SCM). We compare the proposed method with three typical 3D reconstruction works, including …

**Figure 4.** Overview. Here we show a “2D→3D→2D” loop. We follow the causal map in …

**Figure 6.** Optimization Objectives. Here we show three kinds of …

**Figure 7.** Novel-view 3D clothing generation from single images on the unseen test set of Market-HQ and ATR.

**Figure 8.** Novel-view 3D clothing generation from single images on the unseen test set of Market-HQ. Here we gradually “Do” / change …

**Figure 10.** Novel-view 3D bird generation on the test set of CUB.

**Figure 11.** Histogram of 3D Camera Attributes C on Market-HQ. Here we show the distribution of azimuths, distances, elevations, Offsets-X and Offsets-Y. Besides, we also provide the distribution of the mean shape offset ∆S over the test set.

Original abstract

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image. Compared with existing methods, we observe that three primary challenges remain: (1) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (2) Conventional template-based methods are limited to modeling non-rigid objects, e.g., handbags and dresses, which are common in fashion images; (3) The inherent ambiguity compromises the model training, such as the dilemma between a large shape with a remote camera or a small shape with a close camera. In an attempt to address the above limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to solve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, which explicitly considers the prior template in the camera estimation and shape prediction. When optimization, the causality intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle four encoders and (2) facilitate the prior template. Extensive experiments on two 2D fashion benchmarks (ATR and Market-HQ) show that the proposed method could yield high-fidelity 3D reconstruction. Furthermore, we also verify the scalability of the proposed method on a fine-grained bird dataset, i.e., CUB. The code is available at https://github.com/layumi/3D-Magic-Mirror.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a self-supervised 3D clothing reconstruction method that uses an SCM plus dual EM loops to disentangle camera, shape, texture, and illumination without 3D labels.

The main point with this paper is that it gives a practical self-supervised route to 3D clothing models from one photo by using a structural causal map to sort out the mess between camera position, shape, texture, and light. They build the model around that causal structure, putting the template prior into the camera and shape parts, then run two EM loops during training to pull apart the four encoders and help with the template.

This directly targets the three issues they list: no 3D ground truth available, trouble with non-rigid items like dresses, and the shape-versus-distance ambiguity. The experiments cover the ATR and Market-HQ fashion sets plus the CUB bird set to check if it generalizes.

What works here is the self-supervised angle that skips 3D annotations entirely, which is useful for real-world fashion photos where meshes are hard to get. The causal framing makes the disentanglement steps more interpretable than black-box alternatives.

The soft spot is that the whole thing depends on the SCM being a good match for how the images are actually generated. If the causal arrows they drew don't line up with the data, the EM steps could separate the wrong things or leave residual ambiguities. The abstract claims high-fidelity results but doesn't show the numbers or comparisons, so the strength of the improvement over existing template methods isn't clear from the summary alone. I'd want to check the full results section for ablations on the EM loops and the SCM.

This paper is aimed at people doing single-image 3D reconstruction in computer vision, particularly for clothing and other deformable objects. It has a clear technical contribution and some experimental validation, so it deserves to go through peer review even if revisions are needed on the evaluation details. Recommendation: Send it out for review.

Referee Report

2 major / 2 minor

Summary. The paper proposes a causality-aware self-supervised method for reconstructing 3D clothing geometry and texture from a single 2D image. It introduces a structural causal map (SCM) to model relationships among camera position, shape, texture, and illumination, then embeds two expectation-maximization loops to disentangle four encoders and incorporate a prior template. The approach targets non-rigid objects without requiring 3D ground-truth annotations and is evaluated on the ATR and Market-HQ fashion datasets plus the CUB bird dataset, with code released publicly.

Significance. If the claimed disentanglement and reconstruction quality hold under quantitative scrutiny, the work would provide a concrete example of using SCM-guided architecture and dual EM loops to address inherent ambiguities in monocular 3D reconstruction of deformable objects. This could reduce dependence on expensive 3D annotations in fashion and fine-grained object modeling. The public code release is a clear strength that supports reproducibility.

major comments (2)

[§4] §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.
[§3.2] §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.
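For readers unfamiliar with the metrics named in the first major comment, the sketch below shows one common form of two of them. It is a plain-Python illustration under stated assumptions: Chamfer distance is taken here as the sum of the two directional means of nearest-neighbour squared distances (other conventions exist), IoU is computed on 2D binary masks, and all point sets and masks are toy data rather than anything from the paper.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbour squared distance, both directions."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-sized binary masks (rows of 0/1 values)."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b)]
    inter = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return inter / union if union else 1.0  # two empty masks count as a perfect match


# Toy point sets: the third point differs by one unit along z.
pred_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
ref_points = [(0, 0, 0), (1, 0, 0), (0, 1, 1)]
print(chamfer_distance(pred_points, ref_points))

# Toy silhouettes: two pixels overlap out of four covered in total.
pred_mask = [[1, 1, 0], [0, 1, 0]]
ref_mask = [[1, 0, 0], [0, 1, 1]]
print(mask_iou(pred_mask, ref_mask))
```

Chamfer distance needs a reference 3D point set, whereas mask IoU only needs a 2D silhouette, so the latter remains computable when no 3D ground truth exists.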

minor comments (2)

The paper should include a clear notation table or diagram legend for the four encoders and the two EM loops to improve readability.
Figure captions for qualitative results should explicitly state the input image, reconstructed mesh, and any texture map shown, rather than relying on visual inspection alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for stronger quantitative evidence and explicit mathematical derivations. We address each major comment below and will revise the manuscript to incorporate additional details and results.

Referee: [§4] §4 (Experiments): The abstract and method description assert 'high-fidelity 3D reconstruction' on ATR and Market-HQ, yet the provided manuscript excerpt supplies no quantitative metrics (e.g., Chamfer distance, normal consistency, or IoU against any baseline), ablation results on the two EM loops, or statistical significance tests. This evidence gap is load-bearing for the central claim that the SCM + EM design reliably separates the four latent variables.

Authors: We acknowledge the absence of quantitative metrics and ablations in the reviewed version. Because the approach is fully self-supervised without 3D ground-truth meshes, standard 3D metrics such as Chamfer distance cannot be computed directly against ground truth. We will add 2D-based quantitative evaluations (reprojection error, perceptual similarity scores) against baselines, include ablation studies isolating each EM loop, and report statistical significance (e.g., paired t-tests) on the ATR and Market-HQ datasets in the revision.
revision: yes
Referee: [§3.2] §3.2 (SCM and EM loops): The structural causal map is presented as encoding generative relationships among camera, shape, texture, and illumination, but the exact mathematical form of the interventions performed by the two EM loops (e.g., how the expectation step updates the four encoders or how the maximization step enforces the prior template) is not derived or shown to be parameter-free. Without these equations, it is impossible to verify that the loops achieve the claimed disentanglement rather than implicit fitting.

Authors: We agree that the precise update rules and intervention mechanics of the two EM loops require explicit derivation. In the revised manuscript we will provide the full mathematical formulation: the E-step expectations over the four encoders, the M-step maximization that incorporates the template prior, and the explicit intervention operators on the SCM. We will also clarify which parameters are learned versus fixed.
revision: yes
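The general shape of an alternation between per-sample estimates and a shared template can be sketched on toy data. The code below is explicitly not the authors' update rule: it is a simple coordinate-descent scheme in the spirit of EM, assuming each sample's shape is `template + offset`, a ridge penalty `lam` on offsets, and scalar vertex coordinates. Every name and value in it is hypothetical.

```python
def refine_template(targets, lam=1.0, steps=30):
    """Alternate between per-sample offsets and a shared template (toy, scalar vertices)."""
    n_vertices = len(targets[0])
    template = [0.0] * n_vertices
    offsets = [[0.0] * n_vertices for _ in targets]
    for _ in range(steps):
        # Step A (template fixed): ridge-regularised offset for every sample,
        # the minimiser of (template + o - target)^2 + lam * o^2.
        offsets = [[(t - m) / (1.0 + lam) for t, m in zip(target, template)]
                   for target in targets]
        # Step B (offsets fixed): template becomes the mean of what the
        # offsets leave unexplained.
        template = [
            sum(target[v] - off[v] for target, off in zip(targets, offsets)) / len(targets)
            for v in range(n_vertices)
        ]
    return template, offsets


# Hypothetical targets: three samples with two vertices each.
targets = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
template, offsets = refine_template(targets)
print(template)  # approaches the per-vertex mean of the targets
print(offsets)   # per-sample deviations, roughly centred on zero
```

In this toy setting the template absorbs the part shared by all samples, and the mean offset shrinks by a factor of `1 / (1 + lam)` per iteration. Whether the paper's two loops behave analogously depends on the formulation the authors promise to add.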

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained by design choices

The paper introduces an SCM as an explanatory modeling tool and embeds two EM loops to enforce disentanglement of camera/shape/texture/illumination variables. These are explicit architectural decisions that follow from the stated causal framing rather than any derivation that reduces a claimed prediction back to fitted inputs or self-citations by construction. No equations, uniqueness theorems, or parameter-renaming steps are exhibited that would make the reconstruction output tautological with the training signals. The self-supervised claim is internally consistent with the absence of 3D annotations and does not rely on load-bearing self-citations for its central mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level mention of an SCM and prior template; all ledger entries are therefore empty.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 3 internal anchors

[1]

Wasserstein generative adversarial networks

Martin Arjovsky, Soumith Chintala, and L ´eon Bottou. Wasserstein generative adversarial networks. InICML, 2017

work page 2017
[2]

View generalization for single image textured 3d models

Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, and Bryan Catanzaro. View generalization for single image textured 3d models. In CVPR, 2021

work page 2021
[3]

Who left the dogs out: 3D animal reconstruction with expectation maximization in the loop

Benjamin Biggs, Ollie Boyne, James Charles, Andrew Fitzgibbon, and Roberto Cipolla. Who left the dogs out: 3D animal reconstruction with expectation maximization in the loop. In ECCV, 2020

work page 2020
[4]

Emerging applications of bedside 3d printing in plas- tic surgery

Michael P Chae, Warren M Rozen, Paul G McMenamin, Michael W Findlay, Robert T Spychal, and David J Hunter- Smith. Emerging applications of bedside 3d printing in plas- tic surgery. Frontiers in surgery, 2:25, 2015

work page 2015
[5]

Counterfactual samples synthesiz- ing for robust visual question answering

Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, and Yueting Zhuang. Counterfactual samples synthesiz- ing for robust visual question answering. In CVPR, 2020

work page 2020
[6]

Learn- ing to predict 3d objects with an interpolation-based differ- entiable renderer

Wenzheng Chen, Jun Gao, Huan Ling, Edward Smith, Jaakko Lehtinen, Alec Jacobson, and Sanja Fidler. Learn- ing to predict 3d objects with an interpolation-based differ- entiable renderer. In NeurIPS, 2019

work page 2019
[7]

Image search with text feedback by visiolinguistic attention learn- ing

Yanbei Chen, Shaogang Gong, and Loris Bazzani. Image search with text feedback by visiolinguistic attention learn- ing. In CVPR, 2020

work page 2020
[8]

Stylegan-human: A data-centric odyssey of human genera- tion

Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, and Ziwei Liu. Stylegan-human: A data-centric odyssey of human genera- tion. In ECCV, 2022

work page 2022
[9]

Kaolin: A pytorch library for accelerating 3d deep learning re- search

Clement Fuji Tsang, Maria Shugrina, Jean Francois Laﬂeche, Towaki Takikawa, Jiehan Wang, Charles Loop, Wenzheng Chen, Krishna Murthy Jatavallabhula, Edward Smith, Artem Rozantsev, Or Perel, Tianchang Shen, Jun Gao, Sanja Fidler, Gavriel State, Jason Gorski, Tommy Xi- ang, Jianing Li, Michael Li, and Rev Lebaredian. Kaolin: A pytorch library for accelerati...

work page 2022
[10]

3d shape induction from 2d views of multiple objects

Matheus Gadelha, Subhransu Maji, and Rui Wang. 3d shape induction from 2d views of multiple objects. In 3DV, 2017

work page 2017
[11]

Fd-gan: Pose-guided feature distilling gan for robust person re-identiﬁcation

Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, et al. Fd-gan: Pose-guided feature distilling gan for robust person re-identiﬁcation. NeurIPS, 2018

work page 2018
[12]

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Doll ´ar, Ross Girshick, Pieter Noord- huis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv:1706.02677, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[13]

3d shape temporal aggregation for video-based clothing-change person re-identiﬁcation

Ke Han, Yan Huang, Shaogang Gong, Liang Wang, and Tie- niu Tan. 3d shape temporal aggregation for video-based clothing-change person re-identiﬁcation. In ACCV, 2022

work page 2022
[14]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceed- ings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[15]

Gans trained by a two time-scale update rule converge to a local nash equilib- rium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. NeurIPS, 30, 2017

work page 2017
[16]

Eva3d: Compositional 3d human generation from 2d image collections

Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, and Ziwei Liu. Eva3d: Compositional 3d human generation from 2d image collections. arXiv:2210.04888, 2022

work page arXiv 2022
[17]

Self-supervised 3d mesh reconstruction from single images

Tao Hu, Liwei Wang, Xiaogang Xu, Shu Liu, and Jiaya Jia. Self-supervised 3d mesh reconstruction from single images. In CVPR, 2021

work page 2021
[18]

Scops: Self-supervised co-part segmentation

Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, and Jan Kautz. Scops: Self-supervised co-part segmentation. In CVPR, 2019

work page 2019
[19]

Black, David W

Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In CVPR, 2018

work page 2018
[20]

Learning category-speciﬁc mesh reconstruc- tion from image collections

Angjoo Kanazawa, Shubham Tulsiani, Alexei A Efros, and Jitendra Malik. Learning category-speciﬁc mesh reconstruc- tion from image collections. In ECCV, 2018

work page 2018
[21]

Learning view priors for single-view 3d reconstruction

Hiroharu Kato and Tatsuya Harada. Learning view priors for single-view 3d reconstruction. In CVPR, 2019

work page 2019
[22]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[23]

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

Murat Kocaoglu, Christopher Snyder, Alexandros G Di- makis, and Sriram Vishwanath. Causalgan: Learning causal implicit generative models with adversarial training. arXiv:1709.02023, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[24]

Canonical surface mapping via geometric cycle consistency

Nilesh Kulkarni, Abhinav Gupta, and Shubham Tulsiani. Canonical surface mapping via geometric cycle consistency. In ICCV, 2019

work page 2019
[25]

Online adaptation for consistent mesh reconstruction in the wild

Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xi- aolong Wang, Ming-Hsuan Yang, and Jan Kautz. Online adaptation for consistent mesh reconstruction in the wild. NeurIPS, 2020

work page 2020
[26]

Self-supervised single-view 3d reconstruction via semantic consistency

Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, and Jan Kautz. Self-supervised single-view 3d reconstruction via semantic consistency. In ECCV, 2020

work page 2020
[27]

Invariant grounding for video question answering

Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, and Tat-Seng Chua. Invariant grounding for video question answering. In CVPR, 2022

work page 2022
[28]

Deep human parsing with active template regression.Pattern Analysis and Machine Intelligence, IEEE Transactions on , 37(12):2402– 2414, Dec 2015

Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, and Shuicheng Yan. Deep human parsing with active template regression.Pattern Analysis and Machine Intelligence, IEEE Transactions on , 37(12):2402– 2414, Dec 2015

work page 2015
[29]

End-to-end hu- man pose and mesh reconstruction with transformers

Kevin Lin, Lijuan Wang, and Zicheng Liu. End-to-end hu- man pose and mesh reconstruction with transformers. In CVPR, 2021

work page 2021
[30]

Mesh graphormer

Kevin Lin, Lijuan Wang, and Zicheng Liu. Mesh graphormer. ICCV, 2021

work page 2021
[31]

An intriguing failing of convolutional neural networks and the coordconv solution

Rosanne Liu, Joel Lehman, Piero Molino, Felipe Pet- roski Such, Eric Frank, Alex Sergeev, and Jason Yosinski. An intriguing failing of convolutional neural networks and the coordconv solution. NeurIPS, 2018

work page 2018
[32]

Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set

Si Liu, Zheng Song, Guangcan Liu, Changsheng Xu, Han- qing Lu, and Shuicheng Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR, 2012. 9

work page 2012
[33]

Structural causal 3d reconstruction

Weiyang Liu, Zhen Liu, Liam Paull, Adrian Weller, and Bernhard Sch¨olkopf. Structural causal 3d reconstruction. In ECCV, 2022

work page 2022
[34]

Matthew Loper, Naureen Mahmood, Javier Romero, Ger- ard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. 2015

work page 2015
[35]

Macro-micro adversarial network for human parsing

Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Jun- qing Yu, and Yi Yang. Macro-micro adversarial network for human parsing. In ECCV, 2018

work page 2018
[36]

Pose guided person image genera- tion

Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuyte- laars, and Luc Van Gool. Pose guided person image genera- tion. NeurIPS, 2017

work page 2017
[37]

Generative interventions for causal learning

Chengzhi Mao, Augustine Cha, Amogh Gupta, Hao Wang, Junfeng Yang, and Carl V ondrick. Generative interventions for causal learning. In CVPR, 2021

work page 2021
[38]

The expectation-maximization algorithm

Todd K Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996

work page 1996
[39]

Counterfactual vqa: A cause- effect look at language bias

Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian- Sheng Hua, and Ji-Rong Wen. Counterfactual vqa: A cause- effect look at language bias. In CVPR, 2021

work page 2021
[40]

Two at once: Enhancing learning and generalization capacities via ibn-net

Xingang Pan, Ping Luo, Jianping Shi, and Xiaoou Tang. Two at once: Enhancing learning and generalization capacities via ibn-net. In ECCV, pages 464–479, 2018

work page 2018
[41]

Pytorch: An im- perative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai- son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An im- perative style, high-p...

work page 2019
[42]

Causality

Judea Pearl. Causality. Cambridge university press, 2009

work page 2009
[43]

The book of why: the new science of cause and effect

Judea Pearl and Dana Mackenzie. The book of why: the new science of cause and effect. Basic books, 2018

work page 2018
[44]

El- ements of causal inference: foundations and learning algo- rithms

Jonas Peters, Dominik Janzing, and Bernhard Sch ¨olkopf. El- ements of causal inference: foundations and learning algo- rithms. The MIT Press, 2017

work page 2017
[45]

Pose- normalized image generation for person re-identiﬁcation

Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, and Xiangyang Xue. Pose- normalized image generation for person re-identiﬁcation. In ECCV, 2018

work page 2018
[46]

Fine- tuning cnn image retrieval with no human annotation

Filip Radenovi ´c, Giorgos Tolias, and Ond ˇrej Chum. Fine- tuning cnn image retrieval with no human annotation. IEEE transactions on pattern analysis and machine intelligence , 41(7):1655–1668, 2018

work page 2018
[47]

An interactive design for visualizable person re-identiﬁcation

Haolin Ren, Zheng Wang, Zhixiang Wang, Lixiong Chen, Shin’ichi Satoh, and Daning Hu. An interactive design for visualizable person re-identiﬁcation. In ACM MM, 2020

work page 2020
[48]

Dice: Domain-attack invariant causal learning for improved data privacy protection and adversarial robust- ness

Qibing Ren, Yiting Chen, Yichuan Mo, Qitian Wu, and Junchi Yan. Dice: Domain-attack invariant causal learning for improved data privacy protection and adversarial robust- ness. In SIGKDD, 2022

work page 2022
[49]

U- net: Convolutional networks for biomedical image segmen- tation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. In International Conference on Medical image com- puting and computer-assisted intervention , pages 234–241. Springer, 2015

work page 2015
[50]

Devil in the details: Towards accurate single and multiple human parsing

Tao Ruan, Ting Liu, Zilong Huang, Yunchao Wei, Shikui Wei, and Yao Zhao. Devil in the details: Towards accurate single and multiple human parsing. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 33, pages 4814–4821, 2019

work page 2019
[51]

Humangan: A generative model of hu- man images

Kripasindhu Sarkar, Lingjie Liu, Vladislav Golyanik, and Christian Theobalt. Humangan: A generative model of hu- man images. In 3DV, 2021

work page 2021
[52]

Progressive domain adaptation for robot vision person re-identiﬁcation

Zijun Sha, Zelong Zeng, Zheng Wang, Yoichi Natori, Ya- suhiro Taniguchi, and Shin’ichi Satoh. Progressive domain adaptation for robot vision person re-identiﬁcation. In ACM MM, 2020

work page 2020
[53]

Garden: A mixed reality ex- perience combining virtual reality and 3d reconstruction

Keng Hua Sing and Wei Xie. Garden: A mixed reality ex- perience combining virtual reality and 3d reconstruction. In ACM CHI, 2016

work page 2016
[54]

Dissecting person re- identiﬁcation from the viewpoint of viewpoint

Xiaoxiao Sun and Liang Zheng. Dissecting person re- identiﬁcation from the viewpoint of viewpoint. In CVPR, 2019

work page 2019
[55]

Monocular, one-stage, regression of multiple 3d people

Yu Sun, Qian Bao, Wu Liu, Yili Fu, Black Michael J., and Tao Mei. Monocular, one-stage, regression of multiple 3d people. In ICCV, 2021

work page 2021
[56]

Beyond part models: Person retrieval with reﬁned part pooling (and a strong convolutional baseline)

Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang. Beyond part models: Person retrieval with reﬁned part pooling (and a strong convolutional baseline). InECCV, pages 480–496, 2018

work page 2018
[57]

Efﬁcientdet: Scalable and efﬁcient object detection

Mingxing Tan, Ruoming Pang, and Quoc V Le. Efﬁcientdet: Scalable and efﬁcient object detection. In CVPR, 2020

work page 2020
[58]

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS, 2017

work page 2017
[59]

De- biasing nlu models via causal intervention and counterfactual reasoning

Bing Tian, Yixin Cao, Yong Zhang, and Chunxiao Xing. De- biasing nlu models via causal intervention and counterfactual reasoning. In Proceedings of the AAAI Conference on Artiﬁ- cial Intelligence, volume 36, pages 11376–11384, 2022

work page 2022
[60]

Im- plicit mesh reconstruction from unannotated image collec- tions

Shubham Tulsiani, Nilesh Kulkarni, and Abhinav Gupta. Im- plicit mesh reconstruction from unannotated image collec- tions. arXiv:2007.08504, 2020

work page arXiv 2007
[61]

Multi-view supervision for single-view recon- struction via differentiable ray consistency

Shubham Tulsiani, Tinghui Zhou, Alexei A Efros, and Ji- tendra Malik. Multi-view supervision for single-view recon- struction via differentiable ray consistency. In CVPR, 2017

work page 2017
[62]

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Re- port CNS-TR-2011-001, California Institute of Technology, 2011

work page 2011
[63]

Deep high-resolution repre- sentation learning for visual recognition

Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, et al. Deep high-resolution repre- sentation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence , 43(10):3349– 3364, 2020

work page 2020
[64]

Pixel2mesh: Generating 3d mesh models from single rgb images

Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In ECCV, 2018. 10

work page 2018
[65]

Real-esrgan: Training real-world blind super-resolution with pure synthetic data

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In International Conference on Computer Vision Workshops (ICCVW)

[66]

Image quality assessment: from error visibility to structural similarity

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004

[67]

Synthesizing counterfactual samples for effective image-text matching

Hao Wei, Shuhui Wang, Xinzhe Han, Zhe Xue, Bin Ma, Xiaoming Wei, and Xiaolin Wei. Synthesizing counterfactual samples for effective image-text matching. In Proceedings of the 30th ACM International Conference on Multimedia, MM ’22, pages 4355–4364, New York, NY, USA, 2022. Association for Computing Machinery

[68]

Pixel2mesh++: Multi-view 3d mesh generation via deformation

Chao Wen, Yinda Zhang, Zhuwen Li, and Yanwei Fu. Pixel2mesh++: Multi-view 3d mesh generation via deformation. In ICCV, 2019

[69]

Icon: implicit clothed humans obtained from normals

Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J Black. Icon: implicit clothed humans obtained from normals. In CVPR, 2022

[70]

3d human texture estimation from a single image with transformers

Xiangyu Xu and Chen Change Loy. 3d human texture estimation from a single image with transformers. In ICCV, 2021

[71]

Em algorithms of gaussian mixture model and hidden markov model

Guorong Xuan, Wei Zhang, and Peiqi Chai. Em algorithms of gaussian mixture model and hidden markov model. In ICIP, 2001

[72]

Ulip: Learning unified representation of language, image and point cloud for 3d understanding

Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, and Silvio Savarese. Ulip: Learning unified representation of language, image and point cloud for 3d understanding. arXiv:2212.05171, 2022

[73]

Causalvae: Disentangled representation learning via neural structural causal models

Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang. Causalvae: Disentangled representation learning via neural structural causal models. In CVPR, 2021

[74]

Shelf-supervised mesh prediction in the wild

Yufei Ye, Shubham Tulsiani, and Abhinav Gupta. Shelf-supervised mesh prediction in the wild. In CVPR, 2021

[75]

Lite-hrnet: A lightweight high-resolution network

Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, and Jingdong Wang. Lite-hrnet: A lightweight high-resolution network. In CVPR, 2021

[76]

Causal intervention for weakly-supervised semantic segmentation

Dong Zhang, Hanwang Zhang, Jinhui Tang, Xian-Sheng Hua, and Qianru Sun. Causal intervention for weakly-supervised semantic segmentation. NeurIPS, 2020

[77]

Monocular 3d object reconstruction with gan inversion

Junzhe Zhang, Daxuan Ren, Zhongang Cai, Chai Kiat Yeo, Bo Dai, and Chen Change Loy. Monocular 3d object reconstruction with gan inversion. In ECCV, 2022

[78]

Causerec: Counterfactual user sequence synthesis for sequential recommendation

Shengyu Zhang, Dong Yao, Zhou Zhao, Tat-Seng Chua, and Fei Wu. Causerec: Counterfactual user sequence synthesis for sequential recommendation. In SIGIR, 2021

[79]

Scalable person re-identification: A benchmark

Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. Scalable person re-identification: A benchmark. In ICCV, 2015

[80]

Parameter-efficient person re-identification in the 3d space

Zhedong Zheng, Xiaohan Wang, Nenggan Zheng, and Yi Yang. Parameter-efficient person re-identification in the 3d space. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022. doi: 10.1109/TNNLS.2022.3214834


Showing first 80 references.
