UDAPose: Unsupervised Domain Adaptation for Low-Light Human Pose Estimation
Pith reviewed 2026-05-10 14:59 UTC · model grok-4.3
The pith
UDAPose adapts human pose estimators to low light without target-domain annotations by synthesizing realistic low-light training images and dynamically balancing image cues with pose priors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that combining a Direct-Current-based High-Pass Filter (DHF) and a Low-light Characteristics Injection Module (LCIM) for realistic low-light image synthesis with a Dynamic Control of Attention (DCA) module in the transformer enables effective unsupervised adaptation of pose estimators from normal-light to low-light domains.
What carries the argument
The key machinery consists of the DHF and LCIM for detail injection in image synthesis and the DCA module for adaptive cue-prior balancing in attention layers.
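The paper does not spell out the internals of the DHF or the injection step here, but the names suggest a frequency-domain operation: zeroing the DC (zero-frequency) Fourier coefficient, which is algebraically the same as subtracting the image mean, and adding the surviving high-frequency content into the source image. A minimal sketch under those assumptions (`dc_highpass`, `inject_details`, and `strength` are hypothetical names, not the paper's):

```python
import numpy as np

def dc_highpass(img):
    """Remove the DC (zero-frequency) component of a 2D image in the
    Fourier domain; equivalent to subtracting the image mean, so only
    non-DC (high-frequency) structure survives."""
    spectrum = np.fft.fft2(img)
    spectrum[0, 0] = 0.0  # zero out the DC coefficient
    return np.real(np.fft.ifft2(spectrum))

def inject_details(well_lit, low_light, strength=1.0):
    """Add high-frequency structure extracted from a real low-light image
    into a (pre-darkened) well-lit source image."""
    return well_lit + strength * dc_highpass(low_light)
```

The actual LCIM presumably learns where and how strongly to inject; this fixed additive blend only illustrates the frequency-separation idea.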
If this is right
- Improved generalization to real low-light scenes compared to previous synthesis methods.
- Enhanced robustness in the transformer by reducing dependence on degraded image cues.
- Demonstrated effectiveness through superior results on hard low-light test sets and cross-dataset scenarios.
- A practical way to leverage abundant well-lit annotations for challenging lighting conditions.
Where Pith is reading between the lines
- The approach highlights the importance of high-frequency characteristics in low-light domain adaptation, which might apply to other image degradations.
- Dynamic attention balancing could be tested in other pose or detection tasks under varying conditions.
- If the modules are modular, they might integrate into existing pose estimators to extend their usability.
Load-bearing premise
The assumption that the synthesized low-light images are realistic enough to train a model that performs well on actual low-light scenes without picking up on synthesis artifacts.
What would settle it
A direct comparison on real low-light pose estimation benchmarks where the UDAPose model shows no advantage or performs worse than baselines trained with simpler domain adaptation techniques would falsify the central claim.
Original abstract
Low-visibility scenarios, such as low-light conditions, pose significant challenges to human pose estimation due to the scarcity of annotated low-light datasets and the loss of visual information under poor illumination. Recent domain adaptation techniques attempt to utilize well-lit labels by augmenting well-lit images to mimic low-light conditions. But handcrafted augmentations oversimplify noise patterns, while learning-based methods often fail to preserve high-frequency low-light characteristics, producing unrealistic images that lead pose models to generalize poorly to real low-light scenes. Moreover, recent pose estimators rely on image cues through image-to-keypoint cross-attention, but these cues become unreliable under low-light conditions. To address these issues, we propose Unsupervised Domain Adaptation for Pose Estimation (UDAPose), a novel framework that synthesizes low-light images and dynamically fuses visual cues with pose priors for improved pose estimation. Specifically, our synthesis method incorporates a Direct-Current-based High-Pass Filter (DHF) and a Low-light Characteristics Injection Module (LCIM) to inject high-frequency details from input low-light images, overcoming rigidity or the detail loss in existing approaches. Furthermore, we introduce a Dynamic Control of Attention (DCA) module that adaptively balances image cues with learned pose priors in the Transformer architecture. Experiments show that UDAPose outperforms state-of-the-art methods, with notable AP gains of 10.1 (56.4%) on the ExLPose-test hard set (LL-H) and 7.4 (31.4%) in cross-dataset validation on EHPT-XC. Code: https://github.com/Vision-and-Multimodal-Intelligence-Lab/UDAPose
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces UDAPose, an unsupervised domain adaptation framework for human pose estimation under low-light conditions. It proposes a synthesis pipeline consisting of a Direct-Current-based High-Pass Filter (DHF) and Low-light Characteristics Injection Module (LCIM) to generate training images that inject high-frequency details from real low-light inputs into well-lit source data, together with a Dynamic Control of Attention (DCA) module that adaptively balances unreliable image cues against learned pose priors inside a Transformer backbone. Experiments report substantial gains over prior methods, including +10.1 AP (56.4%) on the ExLPose-test hard set (LL-H) and +7.4 AP (31.4%) in cross-dataset evaluation on EHPT-XC.
Significance. If the synthesis modules demonstrably close the domain gap and the DCA module reliably mitigates cue unreliability, the work would constitute a concrete advance for pose estimation in low-visibility settings, with clear downstream relevance to surveillance, robotics, and autonomous systems. The reported absolute gains on public benchmarks are large enough to be practically interesting, and the public code release supports reproducibility.
major comments (3)
- [Section 3.2] Synthesis pipeline (Section 3.2): The central claim that DHF+LCIM overcomes the rigidity and detail-loss problems of prior handcrafted and learning-based augmentations rests on the unverified assumption that the generated images match the high-frequency noise, illumination statistics, and detail loss of real low-light test data. No quantitative distribution-matching evidence (FID, MMD, or perceptual metrics) or statistical comparison against real low-light images is provided, leaving open the possibility that the 10.1 AP gain on LL-H arises from exploitation of synthetic artifacts rather than genuine domain adaptation.
- [Section 4] Experiments and ablations (Section 4): The headline cross-dataset result (+7.4 AP on EHPT-XC) and the per-module contributions of DHF, LCIM, and DCA are not isolated by controlled ablations that hold the backbone, training schedule, and data volume fixed. Without such tables, or an error analysis contrasting failure modes on real versus synthesized images, it remains unclear whether the reported gains actually depend on the proposed components.
- [Section 3.3] DCA module (Section 3.3): The dynamic balancing of image-to-keypoint cross-attention against pose priors is described at a high level, yet the manuscript supplies neither attention-weight visualizations nor quantitative analysis of how the control mechanism behaves when image cues degrade, making it difficult to verify that DCA is the operative factor behind improved generalization on the hard low-light subset.
minor comments (2)
- [Abstract] The abstract states the percentage gains but does not report the absolute baseline AP values; adding these numbers would make the magnitude of improvement immediately interpretable.
- [Figures] Figure captions and axis labels in the qualitative results could be expanded to indicate whether examples are drawn from the hard or easy subsets of the test sets.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below. Where the comments identify gaps in evidence or analysis, we have revised the manuscript to incorporate additional results and clarifications.
Point-by-point responses
Referee: [Section 3.2] Synthesis pipeline (Section 3.2): The central claim that DHF+LCIM overcomes the rigidity and detail-loss problems of prior handcrafted and learning-based augmentations rests on the unverified assumption that the generated images match the high-frequency noise, illumination statistics, and detail loss of real low-light test data. No quantitative distribution-matching evidence (FID, MMD, or perceptual metrics) or statistical comparison against real low-light images is provided, leaving open the possibility that the 10.1 AP gain on LL-H arises from exploitation of synthetic artifacts rather than genuine domain adaptation.
Authors: We acknowledge that the original manuscript did not include quantitative distribution-matching metrics such as FID, MMD, or LPIPS to directly compare the synthesized images against real low-light data. While the substantial gains on real test sets (LL-H and EHPT-XC) provide indirect evidence of effective adaptation, we agree that explicit metrics would strengthen the claim. In the revised manuscript, we will add FID, MMD, and LPIPS scores computed between our DHF+LCIM outputs and real low-light images, along with additional visual comparisons and statistical summaries of high-frequency content and illumination statistics. These additions will help rule out reliance on synthetic artifacts. revision: yes
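Of the distribution-matching metrics the referee requests, MMD is the simplest to state precisely. A minimal sketch of a (biased) squared-MMD estimator under an RBF kernel, which the authors could compute between features of synthesized and real low-light images (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimator of squared maximum mean discrepancy between two
    feature sets (rows are samples) under an RBF kernel; near zero when
    x and y come from the same distribution."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

In practice x and y would be deep features (e.g. from a pretrained backbone), and FID or LPIPS would complement this with perceptual comparisons.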
Referee: [Section 4] Experiments and ablations (Section 4): The headline cross-dataset result (+7.4 AP on EHPT-XC) and the per-module contributions of DHF, LCIM, and DCA are not isolated by controlled ablations that hold the backbone, training schedule, and data volume fixed. Without such tables or error analysis contrasting failure modes on real versus synthesized images, it remains unclear whether the reported outperformance is load-bearing on the proposed components.
Authors: We recognize the value of more tightly controlled ablations. The original experiments include module-level ablations, but they do not explicitly fix every variable as suggested. In the revision, we will add new tables that hold the backbone architecture, training schedule, optimizer, and total data volume constant while isolating DHF, LCIM, and DCA (individually and in combinations). We will also include a failure-mode analysis section that contrasts error patterns on real low-light test images versus synthesized images to demonstrate the specific contributions of each component to the reported gains. revision: yes
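The controlled ablation the authors promise amounts to enumerating every on/off combination of the three modules while holding all other training factors fixed. A minimal sketch of that grid (names are illustrative; the paper's actual ablation protocol may differ):

```python
from itertools import product

def ablation_grid(modules=("DHF", "LCIM", "DCA")):
    """All on/off combinations of the proposed modules. Backbone,
    training schedule, optimizer, and data volume are held constant
    outside this grid, so any AP difference is attributable to the
    enabled modules."""
    return [dict(zip(modules, flags))
            for flags in product((False, True), repeat=len(modules))]
```

Three modules give 2^3 = 8 runs, from the all-off baseline to the full UDAPose configuration.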
Referee: [Section 3.3] DCA module (Section 3.3): The dynamic balancing of image-to-keypoint cross-attention against pose priors is described at a high level, yet the manuscript supplies neither attention-weight visualizations nor quantitative analysis of how the control mechanism behaves when image cues degrade, making it difficult to verify that DCA is the operative factor behind improved generalization on the hard low-light subset.
Authors: We agree that interpretability evidence for DCA would help confirm its role. The revised manuscript will include attention-weight visualizations showing the relative contributions of image-to-keypoint cross-attention and pose-prior attention across varying illumination levels. In addition, we will provide quantitative plots and statistics correlating the learned control weights with image-quality indicators (such as local contrast and estimated noise) to demonstrate that DCA increasingly favors pose priors as image cues degrade, particularly on the hard low-light subset. revision: yes
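The behavior the authors promise to visualize, control weights shifting toward pose priors as image cues degrade, can be stated as a simple gated blend of attention logits. A minimal sketch, assuming a scalar per-image quality score in [0, 1]; the real DCA module is presumably learned and per-head, and all names here are hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_keypoint_attention(cue_logits, prior_logits, quality):
    """Blend image-cue attention logits with pose-prior logits.
    `quality` in [0, 1] is a cue-reliability score (e.g. from local
    contrast or estimated noise); low quality shifts the resulting
    attention distribution toward the learned pose prior."""
    g = float(np.clip(quality, 0.0, 1.0))
    return softmax(g * cue_logits + (1.0 - g) * prior_logits)
```

Plotting `g` (or its learned analogue) against illumination level is exactly the correlation analysis the rebuttal describes.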
Circularity Check
No circularity: method and gains rest on external benchmarks and independent synthesis assumptions
Full rationale
The paper introduces DHF, LCIM, and DCA modules for low-light synthesis and attention control, then reports AP gains on public external test sets (ExLPose-test hard, EHPT-XC). No equations, fitted parameters, or self-citations would make the claimed gains follow by construction from the paper's own inputs. The derivation chain (synthesis → training → evaluation) is validated against external benchmarks, with no self-referential tautologies or load-bearing internal citations.
Axiom & Free-Parameter Ledger
free parameters (1)
- DCA balance hyperparameters
axioms (2)
- Domain assumption: High-frequency details extracted from real low-light images can be injected to create training data that matches the target domain distribution.
- Domain assumption: Pose priors remain reliable when image cues degrade under low illumination.
invented entities (3)
- Direct-Current-based High-Pass Filter (DHF): no independent evidence
- Low-light Characteristics Injection Module (LCIM): no independent evidence
- Dynamic Control of Attention (DCA): no independent evidence
Reference graph
Works this paper leans on
-
[1]
Domain-adaptive 2D human pose estimation via dual teachers in extremely low-light conditions
Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, and Robby T Tan. Domain-adaptive 2D human pose estimation via dual teachers in extremely low-light conditions. InEuro- pean Conference on Computer Vision, pages 221–239, 2024. 2, 3, 6, 7, 8
2024
-
[2]
Rethinking the paradigm of content constraints in un- paired image-to-image translation
Xiuding Cai, Yaoyao Zhu, Dong Miao, Linjie Fu, and Yu Yao. Rethinking the paradigm of content constraints in un- paired image-to-image translation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 891–899,
-
[3]
Retinexformer: One-stage Retinex- based transformer for low-light image enhancement
Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Tim- ofte, and Yulun Zhang. Retinexformer: One-stage Retinex- based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12504–12513, 2023. 2, 3, 6, 7
2023
-
[4]
Cross-domain adaptation for animal pose estimation
Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9498– 9507, 2019. 3
2019
-
[5]
End-to- end object detection with transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to- end object detection with transformers. InEuropean Confer- ence on Computer Vision, pages 213–229, 2020. 2, 5
2020
-
[6]
Contextual and variational contrast enhancement.IEEE Transactions on Image Process- ing, 20(12):3431–3441, 2011
Turgay Celik and Tardi Tjahjadi. Contextual and variational contrast enhancement.IEEE Transactions on Image Process- ing, 20(12):3431–3441, 2011. 3
2011
-
[7]
A simple and effective his- togram equalization approach to image enhancement.Digi- tal Signal Processing, 14(2):158–170, 2004
Heng-Da Cheng and XJ Shi. A simple and effective his- togram equalization approach to image enhancement.Digi- tal Signal Processing, 14(2):158–170, 2004. 3
2004
-
[8]
A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions
Hoonhee Cho, Taewoo Kim, Yuhwan Jeong, and Kuk-Jin Yoon. A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions. InAdvances in Neural Information Processing Systems, pages 134826– 134840, 2024. 2, 3, 6, 7, 1
2024
-
[9]
Style injec- tion in diffusion: A training-free approach for adapting large- scale diffusion models for style transfer
Jiwoo Chung, Sangeek Hyun, and Jae-Pil Heo. Style injec- tion in diffusion: A training-free approach for adapting large- scale diffusion models for style transfer. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8795–8805, 2024. 2, 3, 8, 6, 9, 10, 11
2024
-
[10]
Where are we with human pose estimation in real- world surveillance? InIEEE/CVF Winter Conference on Applications of Computer Vision Workshops, pages 591–601,
Mickael Cormier, Aris Clepe, Andreas Specker, and J ¨urgen Beyerer. Where are we with human pose estimation in real- world surveillance? InIEEE/CVF Winter Conference on Applications of Computer Vision Workshops, pages 591–601,
-
[11]
ImageNet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. InProceedings of the IEEE Conference on Com- puter Vision and Pattern Recognition, pages 248–255, 2009. 6, 2, 4
2009
-
[12]
A mathemati- cal framework for transformer circuits.Transformer Circuits Thread, 2021.https://transformer-circuits
Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Ka- mal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah....
2021
-
[13]
DarkIR: Robust low-light image restoration
Daniel Feijoo, Juan C Benito, Alvaro Garcia, and Marcos V Conde. DarkIR: Robust low-light image restoration. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10879–10889, 2025. 2, 3, 6, 7
2025
-
[14]
Bottom-up human pose estimation via disentan- gled keypoint regression
Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, and Jing- dong Wang. Bottom-up human pose estimation via disentan- gled keypoint regression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14676–14686, 2021. 3, 4
2021
-
[15]
Learning transferable parameters for unsupervised domain adaptation
Zhongyi Han, Haoliang Sun, and Yilong Yin. Learning transferable parameters for unsupervised domain adaptation. IEEE Transactions on Image Processing, 31:6424–6439,
-
[16]
Squeeze-and-excitation networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8):2011– 2023, 2020
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8):2011– 2023, 2020. 5
2011
-
[17]
Arbitrary style transfer in real-time with adaptive instance normalization
Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. InProceed- ings of the IEEE International Conference on Computer Vi- sion, pages 1501–1510, 2017. 3
2017
-
[18]
En- hancing hurdles athletes’ performance analysis: A compara- tive study of CNN-based pose estimation frameworks.Mul- timedia Tools and Applications, 84(28):34573–34591, 2025
Pouya Jafarzadeh, Luca Zelioli, Petra Virjonen, Fahimeh Farahnakian, Paavo Nevalainen, and Jukka Heikkonen. En- hancing hurdles athletes’ performance analysis: A compara- tive study of CNN-based pose estimation frameworks.Mul- timedia Tools and Applications, 84(28):34573–34591, 2025. 1
2025
-
[19]
LightenDiffusion: Unsupervised low-light image enhancement with latent-retinex diffusion models
Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, and Shuaicheng Liu. LightenDiffusion: Unsupervised low-light image enhancement with latent-retinex diffusion models. In European Conference on Computer Vision, pages 161–179,
-
[20]
Regressive domain adapta- tion for unsupervised keypoint detection
Junguang Jiang, Yifei Ji, Ximei Wang, Yufeng Liu, Jianmin Wang, and Mingsheng Long. Regressive domain adapta- tion for unsupervised keypoint detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6780–6789, 2021. 3
2021
-
[21]
EnlightenGAN: Deep light enhancement without paired supervision.IEEE Transactions on Image Process- ing, 30:2340–2349, 2021
Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. EnlightenGAN: Deep light enhancement without paired supervision.IEEE Transactions on Image Process- ing, 30:2340–2349, 2021. 3
2021
-
[22]
Multi- branch adversarial regression for domain adaptative hand 9 pose estimation.IEEE Transactions on Circuits and Systems for Video Technology, 32(9):6125–6136, 2022
Rui Jin, Jing Zhang, Jianyu Yang, and Dacheng Tao. Multi- branch adversarial regression for domain adaptative hand 9 pose estimation.IEEE Transactions on Circuits and Systems for Video Technology, 32(9):6125–6136, 2022. 3
2022
-
[23]
Unsupervised night image enhancement: When layer decomposition meets light-effects suppression
Yeying Jin, Wenhan Yang, and Robby T Tan. Unsupervised night image enhancement: When layer decomposition meets light-effects suppression. InEuropean Conference on Com- puter Vision, pages 404–421, 2022. 3
2022
-
[24]
Clustered pose and nonlinear appearance models for human pose estimation
Sam Johnson and Mark Everingham. Clustered pose and nonlinear appearance models for human pose estimation. In Proceedings of the British Machine Vision Conference, pages 12.1–12.11, 2010. 3
2010
-
[25]
Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, and Shunsuke Saito
Rawal Khirodkar, Timur M. Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, and Shunsuke Saito. Sapiens: Foundation for human vision models. InEuropean Conference on Computer Vision, pages 206–228, 2024. 3
2024
-
[26]
Unpaired image-to-image translation via neural schr ¨odinger bridge
Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, and Jong Chul Ye. Unpaired image-to-image translation via neural schr ¨odinger bridge. InInternational Conference on Learning Representations, 2024. 1, 2, 3, 6, 7, 8, 9, 10, 11
2024
-
[27]
A unified framework for domain adaptive pose estimation
Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, and Stan Sclaroff. A unified framework for domain adaptive pose estimation. InEuropean Conference on Com- puter Vision, pages 603–620, 2022. 2, 3, 6, 7
2022
-
[28]
Peri- ln: Revisiting normalization layer in the transformer archi- tecture
Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeon- taek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, and Kang Min Yoo. Peri- ln: Revisiting normalization layer in the transformer archi- tecture. InInternational Conference on Machine Learning, pages 30400–30436, 2025. 5
2025
-
[29]
Adam: A method for stochastic optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInternational Conference on Learning Representations, 2015. 6, 2
2015
-
[30]
Human pose estimation in extremely low-light condi- tions
Sohyun Lee, Jaesung Rim, Boseung Jeong, Geonu Kim, ByungJu Woo, Haechan Lee, Sunghyun Cho, and Suha Kwak. Human pose estimation in extremely low-light condi- tions. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 704–714, 2023. 2, 3, 6, 8, 1
2023
-
[31]
From synthetic to real: Unsu- pervised domain adaptation for animal pose estimation
Chen Li and Gim Hee Lee. From synthetic to real: Unsu- pervised domain adaptation for animal pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 1482–1491, 2021. 3
2021
-
[32]
Learning to enhance low-light image via zero-reference deep curve esti- mation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4225–4238, 2022
Chongyi Li, Chunle Guo, and Chen Change Loy. Learning to enhance low-light image via zero-reference deep curve esti- mation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4225–4238, 2022. 2
2022
-
[33]
DN-DETR: Accelerate DETR training by in- troducing query denoising
Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M Ni, and Lei Zhang. DN-DETR: Accelerate DETR training by in- troducing query denoising. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13619–13627, 2022. 2
2022
-
[34]
CrowdPose: Efficient crowded scenes pose estimation and a new benchmark
Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, and Cewu Lu. CrowdPose: Efficient crowded scenes pose estimation and a new benchmark. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10863–10872, 2019. 1, 3, 8
2019
-
[35]
Structure-revealing low-light image en- hancement via robust Retinex model.IEEE Transactions on Image Processing, 27(6):2828–2841, 2018
Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. Structure-revealing low-light image en- hancement via robust Retinex model.IEEE Transactions on Image Processing, 27(6):2828–2841, 2018. 3
2018
-
[36]
Microsoft COCO: Common objects in context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755,
-
[37]
Focal loss for dense object detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. InPro- ceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017. 1
2017
-
[38]
Group pose: A sim- ple baseline for end-to-end multi-person pose estimation
Huan Liu, Qiang Chen, Zichang Tan, Jiang-Jiang Liu, Jian Wang, Xiangbo Su, Xiaolong Li, Kun Yao, Junyu Han, Errui Ding, Yao Zhao, and Jingdong Wang. Group pose: A sim- ple baseline for end-to-end multi-person pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15029–15038, 2023. 1, 2, 3, 5
2023
-
[39]
Unsupervised image-to-image translation networks
Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. InAdvances in Neural Information Processing Systems, pages 700–708, 2017. 6, 7, 2, 8, 9, 10, 11
2017
-
[40]
Swin transformer: Hierarchical vision transformer using shifted windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. 6, 2
2021
-
[41]
Decoupled weight de- cay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InInternational Conference on Learning Representations, 2019. 2
2019
-
[42]
Rethinking the heatmap regres- sion for bottom-up human pose estimation
Zhengxiong Luo, Zhicheng Wang, Yan Huang, Liang Wang, Tieniu Tan, and Erjin Zhou. Rethinking the heatmap regres- sion for bottom-up human pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 13264–13273, 2021. 1, 3
2021
-
[43]
FCPose: Fully convolutional multi-person pose estimation with dynamic instance-aware convolutions
Weian Mao, Zhi Tian, Xinlong Wang, and Chunhua Shen. FCPose: Fully convolutional multi-person pose estimation with dynamic instance-aware convolutions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 9034–9043, 2021. 3
2021
-
[44]
Poseur: Di- rect human pose regression with transformers
Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, and Anton van den Hengel. Poseur: Di- rect human pose regression with transformers. InEuropean Conference on Computer Vision, pages 72–88, 2022. 1, 3
2022
-
[45]
Pose estimation for augmented reality: A hands-on survey
Eric Marchand, Hideaki Uchiyama, and Fabien Spindler. Pose estimation for augmented reality: A hands-on survey. IEEE Transactions on Visualization and Computer Graph- ics, 22(12):2633–2651, 2016. 1
2016
-
[46]
DeepLPF: Deep local para- metric filters for image enhancement
Sean Moran, Pierre Marza, Steven McDonagh, Sarah Parisot, and Gregory Slabaugh. DeepLPF: Deep local para- metric filters for image enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12826–12835, 2020. 3
2020
-
[47]
Learning from synthetic animals
Jiteng Mu, Weichao Qiu, Gregory D Hager, and Alan L Yuille. Learning from synthetic animals. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12386–12395, 2020. 3 10
2020
-
[48]
FSD collisions in reduced roadway visibility conditions.https://www.nhtsa.gov/?nhtsaId= PE24031, 2024
NHTSA. FSD collisions in reduced roadway visibility conditions.https://www.nhtsa.gov/?nhtsaId= PE24031, 2024. [Accessed 31-01-2025]. 2
2024
-
[49]
Source-free do- main adaptive human pose estimation
Qucheng Peng, Ce Zheng, and Chen Chen. Source-free do- main adaptive human pose estimation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4826–4836, 2023. 3
2023
-
[50]
ProbPose: A probabilis- tic approach to 2D human pose estimation
Miroslav Purkrabek and Jiri Matas. ProbPose: A probabilis- tic approach to 2D human pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27124–27133, 2025. 1, 3
2025
-
[51]
Prior-guided source-free domain adaptation for human pose estimation
Dripta S Raychaudhuri, Calvin-Khang Ta, Arindam Dutta, Rohit Lal, and Amit K Roy-Chowdhury. Prior-guided source-free domain adaptation for human pose estimation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 14996–15006, 2023. 3
2023
-
[52]
Generalized in- tersection over union: A metric and a loss for bounding box regression
Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized in- tersection over union: A metric and a loss for bounding box regression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 658–666,
-
[53]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10674–10685, 2022. 2
2022
-
[54]
Nighttime visibility en- hancement by increasing the dynamic range and suppression of light effects
Aashish Sharma and Robby T Tan. Nighttime visibility en- hancement by increasing the dynamic range and suppression of light effects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11977– 11986, 2021. 3
2021
-
[55]
InsPose: Instance-aware networks for single-stage multi-person pose estimation
Dahu Shi, Xing Wei, Xiaodong Yu, Wenming Tan, Ye Ren, and Shiliang Pu. InsPose: Instance-aware networks for single-stage multi-person pose estimation. InACM Multi- media Conference, pages 3079–3087, 2021. 3
2021
-
[56]
End-to-end multi-person pose estimation with transformers
Dahu Shi, Xing Wei, Liangqi Li, Ye Ren, and Wenming Tan. End-to-end multi-person pose estimation with transformers. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11069–11078, 2022. 3, 6, 1, 2
2022
-
[57]
Denois- ing diffusion implicit models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denois- ing diffusion implicit models. InInternational Conference on Learning Representations, 2021. 6
2021
-
[58]
Human pose estimation and its application to action recogni- tion: A survey.Journal of Visual Communication and Image Representation, 76:103055, 2021
Liangchen Song, Gang Yu, Junsong Yuan, and Zicheng Liu. Human pose estimation and its application to action recogni- tion: A survey.Journal of Visual Communication and Image Representation, 76:103055, 2021. 1
2021
-
[59]
Applications of pose estimation in human health and performance across the lifespan.Sensors, 21(21):7315,
Jan Stenum, Kendra M Cherry-Allen, Connor O Pyles, Rachel D Reetzke, Michael F Vignos, and Ryan T Roem- mich. Applications of pose estimation in human health and performance across the lifespan.Sensors, 21(21):7315,
-
[60]
Diffu- sionRegPose: Enhancing multi-person pose estimation using a diffusion-based end-to-end regression approach
Dayi Tan, Hansheng Chen, Wei Tian, and Lu Xiong. Diffu- sionRegPose: Enhancing multi-person pose estimation using a diffusion-based end-to-end regression approach. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2230–2239, 2024. 1, 2, 3, 5
2024
-
[61] Zhi Tian, Hao Chen, and Chunhua Shen. DirectPose: Direct end-to-end multi-person pose estimation. arXiv preprint arXiv:1911.07451, 2019. 3
[62] Dongkai Wang, Shiyu Xuan, and Shiliang Zhang. LocLLM: Exploiting generalizable human keypoint localization via large language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 614–623, 2024. 3
[63] Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, and Jiaya Jia. Underexposed photo enhancement using deep illumination estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6849–6857, 2019. 3
[64] Shuhang Wang, Jin Zheng, Hai-Miao Hu, and Bo Li. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing, 22(9):3538–3548, 2013. 3
[65] Wenjing Wang, Huan Yang, Jianlong Fu, and Jiaying Liu. Zero-reference low-light enhancement via physical quadruple priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26057–26066, 2024. 1, 2, 3, 6, 7, 8, 9, 10, 11
[66] Yufei Wang, Renjie Wan, Wenhan Yang, Haoliang Li, Lap-Pui Chau, and Alex C. Kot. Low-light image enhancement with normalizing flow. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2604–2612, 2022. 3
[67] Kaixuan Wei, Ying Fu, Yinqiang Zheng, and Jiaolong Yang. Physics-based noise modeling for extreme low-light photography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):8520–8537, 2022. 2
[68] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. CBAM: Convolutional block attention module. In European Conference on Computer Vision, pages 3–19, 2018.
[69] Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, and Luc Van Gool. DiffI2I: Efficient diffusion model for image-to-image translation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):1578–1593, 2025. 3
[70] Mengfei Xia, Yu Zhou, Ran Yi, Yong-Jin Liu, and Wenping Wang. A diffusion model translator for efficient image-to-image translation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):10272–10283, 2024. 3
[71] Yabo Xiao, Kai Su, Xiaojuan Wang, Dongdong Yu, Lei Jin, Mingshu He, and Zehuan Yuan. QueryPose: Sparse multi-person pose regression via spatial-aware part-level query. In Advances in Neural Information Processing Systems, pages 12464–12477, 2022. 3
[72] Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. ViTPose: Simple vision transformer baselines for human pose estimation. In Advances in Neural Information Processing Systems, 2022. 3
[73] Nan Xue, Tianfu Wu, Gui-Song Xia, and Liangpei Zhang. Learning local-global contextual adaptation for multi-person pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13055–13064, 2022. 1, 3
[74] Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, and Lei Zhang. Explicit box detection unifies end-to-end multi-person pose estimation. In International Conference on Learning Representations, 2023. 2, 3, 5, 6
[75] Shuzhou Yang, Moxuan Ding, Yanmin Wu, Zihan Li, and Jian Zhang. Implicit neural representation for cooperative low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12918–12927, 2023. 2, 3
[76] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.
[77] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3813–3824, 2023. 5
[78] Shile Zhang, Mohamed Abdel-Aty, Yina Wu, and Ou Zheng. Pedestrian crossing intention prediction at red-light using pose estimation. IEEE Transactions on Intelligent Transportation Systems, 23(3):2331–2339, 2022. 1
[79] Song-Hai Zhang, Ruilong Li, Xin Dong, Paul Rosin, Zixi Cai, Xi Han, Dingcheng Yang, Haozhi Huang, and Shi-Min Hu. Pose2Seg: Detection free human instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 889–898, 2019. 1, 3
[80] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017. 2, 3, 6, 7, 8, 9, 10, 11