Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval
Venue: SIGIR ’26, June 03–05, 2026, Melbourne, Australia
Pith reviewed 2026-05-10 03:12 UTC · model grok-4.3
The pith
A frozen Stable Diffusion model enhanced with CLIP and BLIP features retrieves 3D shapes from sketches without any category supervision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large-scale pretrained diffusion models exhibit open-vocabulary capability and strong shape bias that suit zero-shot visual retrieval. A frozen Stable Diffusion backbone extracts and aggregates discriminative representations from intermediate U-Net layers for both sketches and rendered 3D views. To bridge the domain gap without retraining, a multimodal feature-enhanced strategy injects global and local visual features from a pretrained CLIP encoder and incorporates enriched textual guidance from learnable soft prompts plus hard textual descriptions generated by BLIP. The Circle-T loss dynamically strengthens positive-pair attraction once negative samples are separated. Experiments on two public benchmarks are reported to show consistent gains over state-of-the-art ZS-SBSR approaches.
What carries the argument
Multimodal feature-enhanced conditioning of a frozen Stable Diffusion U-Net that aggregates intermediate layer representations for sketches and 3D views while injecting CLIP visual features and BLIP text cues.
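The aggregation operator itself is not spelled out in this summary, so the following is only a minimal sketch of one plausible choice: global average pooling per layer, per-layer L2 normalization, then concatenation. The function names, the toy feature maps, and the pooling choice are all assumptions made for illustration, not the paper's implementation.

```python
import math

def global_avg_pool(feature_map):
    """Average a C x H x W feature map (nested lists) over H and W."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length; eps guards against an all-zero vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def aggregate_layers(feature_maps):
    """Pool and normalize each layer, concatenate, then normalize the result."""
    parts = []
    for fmap in feature_maps:
        parts.extend(l2_normalize(global_avg_pool(fmap)))
    return l2_normalize(parts)

# Toy stand-ins for activations from two intermediate layers (hypothetical shapes).
layer_a = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 channels, 2 x 2
layer_b = [[[2.0]], [[0.0]], [[0.0]]]                            # 3 channels, 1 x 1
descriptor = aggregate_layers([layer_a, layer_b])
```

Normalizing each layer before concatenation keeps a layer with large activations from dominating the final descriptor; whether the paper does this is not stated here.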
If this is right
- 3D shape retrieval becomes feasible for object categories never seen during any training phase.
- Injected multimodal cues let the system attend to sketch contours and semantic context despite the high abstraction of sketches.
- Dynamic adjustment of positive-pair attraction adapts alignment to the noise present in hand-drawn sketches.
- Consistent gains appear across multiple standard benchmarks without task-specific fine-tuning of the backbone.
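The Circle-T formulation is not given in this summary. As background for the third point above, here is a minimal sketch of the standard Circle loss (Sun et al., 2020), which is assumed to be its starting point. The `margin` and `gamma` defaults are illustrative placeholders, and nothing here reproduces the paper's dynamic "T" modification.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def circle_loss(sp, sn, margin=0.25, gamma=32.0):
    """Standard Circle loss for one anchor.

    sp: cosine similarities to positives, sn: to negatives (both non-empty).
    """
    o_p, o_n = 1.0 + margin, -margin   # optimum similarity for positives / negatives
    d_p, d_n = 1.0 - margin, margin    # decision margins
    # Each weight shrinks to zero as the pair approaches its optimum, so pairs
    # that are still far from their optimum contribute larger gradients.
    logit_p = [-gamma * max(o_p - s, 0.0) * (s - d_p) for s in sp]
    logit_n = [gamma * max(s - o_n, 0.0) * (s - d_n) for s in sn]
    return softplus(logsumexp(logit_p) + logsumexp(logit_n))

easy = circle_loss(sp=[0.9], sn=[0.1])   # well-separated pair
hard = circle_loss(sp=[0.3], sn=[0.7])   # negative closer than positive
```

The per-pair weighting is what makes the base loss adaptive; a variant that strengthens positive attraction only after negatives are separated would need an extra schedule or gate that this sketch does not attempt.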
Where Pith is reading between the lines
- The same frozen-backbone strategy with multimodal injections could extend to other zero-shot cross-modal tasks such as text queries to 3D shapes.
- If the enhancements prove stable, similar conditioning might reduce the need for full retraining when applying diffusion models to sparse or abstract inputs in related vision problems.
- Direct processing of 3D data instead of rendered views could be tested as a next step if suitable encoders are paired with the same loss and conditioning approach.
Load-bearing premise
That adding CLIP visual features and BLIP text to a frozen diffusion backbone is enough to overcome the extreme domain gap and sparsity of sketches without any retraining of the model.
What would settle it
Running the method on the same two public benchmarks and finding that it fails to outperform existing zero-shot sketch-based 3D retrieval approaches on retrieval metrics for unseen categories.
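Such a check comes down to ranking gallery shapes by similarity to each sketch and scoring the ranking. The sketch below uses cosine similarity and average precision, a common retrieval metric; the exact metrics used by the paper are not listed in this summary, and the embeddings and labels here are toy values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

def average_precision(ranked_labels, query_label):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy 2-D embeddings: one sketch query and four shape descriptors.
query = [1.0, 0.0]
gallery = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
labels = ["chair", "lamp", "chair", "lamp"]

order = rank_gallery(query, gallery)
ap = average_precision([labels[i] for i in order], "chair")
```

Averaging `average_precision` over all queries from unseen categories gives mean average precision for the zero-shot split.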
Original abstract
This paper presents the first exploration of text-to-image diffusion models for zero-shot sketch-based 3D shape retrieval (ZS-SBSR). Existing sketch-based 3D shape retrieval methods struggle in zero-shot settings due to the absence of category supervision and the extreme sparsity of sketch inputs. Our key insight is that large-scale pretrained diffusion models inherently exhibit open-vocabulary capability and strong shape bias, making them well suited for zero-shot visual retrieval. We leverage a frozen Stable Diffusion backbone to extract and aggregate discriminative representations from intermediate U-Net layers for both sketches and rendered 3D views. Diffusion models struggle with sketches due to their extreme abstraction and sparsity, compounded by a significant domain gap from natural images. To address this limitation without costly retraining, we introduce a multimodal feature-enhanced strategy that conditions the frozen diffusion backbone with complementary visual and textual cues from CLIP, explicitly enhancing the ability of semantic context capture and concentrating on sketch contours. Specifically, we inject global and local visual features derived from a pretrained CLIP visual encoder, and incorporate enriched textual guidance by combining learnable soft prompts with hard textual descriptions generated by BLIP. Furthermore, we employ the Circle-T loss to dynamically strengthen positive-pair attraction once negative samples are sufficiently separated, thereby adapting to sketch noise and enabling more effective sketch-3D alignment. Extensive experiments on two public benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches in ZS-SBSR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Diff-SBSR, the first application of text-to-image diffusion models to zero-shot sketch-based 3D shape retrieval (ZS-SBSR). It freezes a Stable Diffusion U-Net backbone, conditions it multimodally by injecting global/local CLIP visual features and BLIP-generated text augmented with learnable soft prompts, extracts and aggregates intermediate U-Net layer representations for sketches and rendered 3D views, and optimizes with a Circle-T loss that strengthens positive-pair attraction after negatives are separated. Extensive experiments on two public benchmarks are reported to show consistent outperformance over prior SOTA methods.
Significance. If the performance gains are reproducible and the diffusion backbone demonstrably contributes shape bias beyond the CLIP/BLIP conditioning, the work would establish a practical route for repurposing large frozen generative models in sparse-input zero-shot retrieval without retraining. The emphasis on no-backbone fine-tuning and adaptation to sketch noise via the loss are pragmatic strengths that could influence follow-on work in cross-modal retrieval.
major comments (3)
- [§3] §3 (multimodal conditioning and U-Net feature extraction): The central claim that intermediate U-Net activations supply discriminative contour/shape information beyond the injected CLIP visual features and BLIP text is load-bearing for the key insight, yet no ablation isolates the U-Net contribution (e.g., conditioned U-Net features vs. direct CLIP embeddings on sketches). Without this, it remains possible that gains derive primarily from the external conditioning signals rather than the diffusion prior.
- [§4] §4 (experiments and tables): The reported outperformance on the two benchmarks is presented without statistical significance tests, standard deviations across runs, or explicit confirmation that baselines were re-implemented with identical protocols and hyperparameter tuning. This weakens the strength of the SOTA claim, especially given the domain gap and sparsity issues highlighted in the abstract.
- [§3.3] §3.3 (Circle-T loss): The loss is motivated as adapting to sketch noise, but the manuscript provides no sensitivity analysis on its hyperparameters or comparison against standard contrastive losses under the same multimodal conditioning, leaving unclear whether the dynamic positive-pair strengthening is essential to the reported gains.
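For the second comment, a paired test over matched random seeds is one straightforward way to report significance. The sketch below computes only the paired t statistic and its degrees of freedom using the standard library. The score lists are made-up placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two methods evaluated on the same seeds.

    Returns (t, degrees_of_freedom). Needs at least two pairs and
    non-constant differences.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the per-seed differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Placeholder mAP values over five seeds (illustrative only, not from the paper).
method = [0.62, 0.64, 0.63, 0.65, 0.61]
baseline = [0.60, 0.61, 0.62, 0.62, 0.60]
t_stat, dof = paired_t_statistic(method, baseline)
```

With 4 degrees of freedom, the two-sided 5% critical value of the t distribution is about 2.776, so a statistic above that would count as significant at that level.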
minor comments (3)
- [Figure 2] The architecture diagram (Figure 2) would benefit from explicit arrows and labels indicating where CLIP local/global features and BLIP text are injected into the U-Net.
- [§3.2] Notation for feature aggregation across U-Net layers (e.g., the pooling or concatenation operation) is introduced without a compact equation or pseudocode, complicating reproducibility.
- [§2] Related-work discussion of prior diffusion-based retrieval or sketch-3D methods could be expanded with more recent citations to contextualize the novelty claim.
Simulated Author's Rebuttal
We sincerely thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.
-
Referee: [§3] §3 (multimodal conditioning and U-Net feature extraction): The central claim that intermediate U-Net activations supply discriminative contour/shape information beyond the injected CLIP visual features and BLIP text is load-bearing for the key insight, yet no ablation isolates the U-Net contribution (e.g., conditioned U-Net features vs. direct CLIP embeddings on sketches). Without this, it remains possible that gains derive primarily from the external conditioning signals rather than the diffusion prior.
Authors: We agree that an explicit ablation isolating the contribution of the U-Net features is important to substantiate the role of the diffusion prior. In the revised manuscript, we will include an additional ablation study comparing the performance using only the injected CLIP and BLIP features against the full model that extracts and aggregates intermediate U-Net layer representations. This will clarify the incremental benefit provided by the frozen diffusion backbone. revision: yes
-
Referee: [§4] §4 (experiments and tables): The reported outperformance on the two benchmarks is presented without statistical significance tests, standard deviations across runs, or explicit confirmation that baselines were re-implemented with identical protocols and hyperparameter tuning. This weakens the strength of the SOTA claim, especially given the domain gap and sparsity issues highlighted in the abstract.
Authors: We acknowledge the importance of statistical rigor in reporting results. We will re-implement all baseline methods using the same experimental protocols and hyperparameter settings as described in our paper. Additionally, we will conduct multiple runs with different random seeds to report mean performance with standard deviations and perform statistical significance tests (e.g., paired t-tests) to validate the improvements. These details will be added to the experimental section and tables in the revised version. revision: yes
-
Referee: [§3.3] §3.3 (Circle-T loss): The loss is motivated as adapting to sketch noise, but the manuscript provides no sensitivity analysis on its hyperparameters or comparison against standard contrastive losses under the same multimodal conditioning, leaving unclear whether the dynamic positive-pair strengthening is essential to the reported gains.
Authors: We appreciate this point regarding the necessity of the Circle-T loss. In the revision, we will provide a sensitivity analysis on the key hyperparameters of the Circle-T loss, such as the margin and temperature parameters. Furthermore, we will include a direct comparison against standard contrastive losses (e.g., InfoNCE) under identical multimodal conditioning and backbone settings to demonstrate the advantages of the dynamic positive-pair strengthening mechanism. revision: yes
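For reference, the InfoNCE baseline named in the last response can be sketched as follows. This is a generic batch formulation with positives on the diagonal of a sketch-to-shape similarity matrix. The `temperature` default and the toy matrices are assumptions for illustration.

```python
import math

def info_nce(sim_matrix, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    sim_matrix[i][j] is the similarity of sketch i to shape j;
    the matching shape for sketch i sits at column i.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive
    return total / len(sim_matrix)

aligned = [[0.9, 0.1], [0.2, 0.8]]      # positives score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]   # negatives score highest
loss_aligned = info_nce(aligned)
loss_misaligned = info_nce(misaligned)
```

Unlike a Circle-style loss, InfoNCE weights every pair through a single softmax, so it has no per-pair mechanism for changing positive attraction once negatives are already separated. That is the contrast the proposed comparison would test.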
Circularity Check
No significant circularity; derivation relies on external pretrained models and novel adaptations.
The paper's central derivation introduces a multimodal conditioning strategy (CLIP visual features + BLIP text) and Circle-T loss applied to a frozen Stable Diffusion U-Net, without any equations or steps that reduce claimed representations or performance to quantities fitted directly on the ZS-SBSR benchmarks. No self-citations are load-bearing for uniqueness theorems, no ansatzes are smuggled via prior author work, and no predictions are statistically forced by input fitting. The approach is self-contained against external benchmarks via new loss and conditioning, with experiments providing independent validation.
Axiom & Free-Parameter Ledger
free parameters (2)
- learnable soft prompts
- Circle-T loss hyperparameters
axioms (2)
- domain assumption: Large-scale pretrained diffusion models inherently exhibit open-vocabulary capability and strong shape bias suitable for zero-shot visual retrieval.
- domain assumption: CLIP visual features and BLIP-generated text provide complementary cues that close the domain gap for sparse sketches without retraining the diffusion model.