Best Segmentation Buddies for Image-Shape Correspondence

Dale Decatur; Dongwei Lyu; Itai Lang; Rana Hanocka

arxiv: 2605.18193 · v1 · pith:LYF2KEZ3new · submitted 2026-05-18 · 💻 cs.CV · cs.GR

Best Segmentation Buddies for Image-Shape Correspondence

Itai Lang , Dongwei Lyu , Dale Decatur , Rana Hanocka This is my paper

Pith reviewed 2026-05-20 11:07 UTC · model grok-4.3

classification 💻 cs.CV cs.GR

keywords image-shape correspondence3D segmentationfeature distillationsemantic matchingcross-modalitycomputer visionuntextured shapes

0 comments

The pith

Distilling 2D vision features onto 3D shapes lets Best Segmentation Buddies match image segments to corresponding 3D parts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a way to connect pixels inside a 2D image segment to vertices on an untextured 3D shape that belong to the same semantic part. This connection must hold even when the image and the shape differ sharply in color, form, and viewing angle. The method first copies rich visual features learned by a 2D model onto every point on the 3D surface so that pixel-to-vertex similarity can be measured directly. It then selects the vertices whose closest matching pixel sits inside the image segment; these selected vertices are called Best Segmentation Buddies and serve as reliable anchors for semantic correspondence. The same transferred features are finally used to label the 3D shape into parts without any additional training.

Core claim

The central claim is that distilling deep visual features from a 2D vision model onto the 3D shape surface allows computation of feature similarity between image pixels and shape vertices. Identifying Best Segmentation Buddies—vertices whose most similar image pixel lies within the image segmentation region—enables reliable discovery of vertices in semantically corresponding shape parts across substantial differences in appearance, geometry, and viewpoint. The distilled features are also used to segment the shape directly in 3D, bootstrapping the correspondence process.

What carries the argument

Best Segmentation Buddies: 3D shape vertices whose nearest feature match in the 2D image falls inside the given image segment, used to locate semantically corresponding parts.

If this is right

The approach produces accurate and semantically meaningful correspondences for a wide range of image-shape pairs.
Distilled 3D features from a 2D image segmentation model can be used to segment the untextured 3D shape directly.
Correspondence remains reliable even when appearance, geometry, and viewpoint vary substantially.
The bootstrapping step reduces reliance on manual 3D annotations by transferring 2D segmentation knowledge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distillation step could be applied frame-by-frame to video, yielding time-consistent 3D part labels.
The method supplies semantic anchors that might improve registration of 3D scans to casual photographs.
Testing the buddies on shapes that contain fine surface details or holes would show where feature transfer begins to break.
Because the 3D segmentation step needs no extra labels, the pipeline could help create large-scale labeled 3D datasets from existing 2D image collections.

Load-bearing premise

The assumption that feature similarity after distillation will place the nearest image pixel inside the correct semantic segment rather than being dominated by viewpoint or geometric differences.

What would settle it

On a collection of image-3D pairs that have hand-labeled ground-truth corresponding segments, count how often the identified Best Segmentation Buddies land outside the correct semantic region; if the error rate is no better than random selection the central claim is false.

Figures

Figures reproduced from arXiv: 2605.18193 by Dale Decatur, Dongwei Lyu, Itai Lang, Rana Hanocka.

**Figure 1.** Figure 1: Best Segmentation Buddies computes segment-to-segment correspondence across different modalities (image-to-shape) and [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Image-shape correspondence gallery. BSB can match semantic parts when the object in the image and the 3D mesh are from different domains, where the corresponding elements differ substantially in appearance, shape, and size. In this work, we address this problem by proposing a segmentation-to-segmentation correspondence method across modalities and domains, matching 2D image regions to 3D semantic parts. Un… view at source ↗

**Figure 4.** Figure 4: Best Segmentation Buddies. Pixel to vertex similarity: we visualize the similarity from a clicked pixel feature (left) to the distilled vision features on the mesh (right) with a heatmap (red being most similar and blue being least similar). Vertex to pixel similarity: we visualize the similarity from the distilled feature of the mesh vertex (left) to all the features in the object image region (right). D… view at source ↗

**Figure 5.** Figure 5: Best Segmentation Buddies matching properties. When a correspondence between an image region and a mesh part exists (left and middle), the matched vertex will map back to a segment (bottom row) that is almost identical to the original segmentation (top row). However, if a match does not exist, such regions will differ substantially (right), implying the absence of correspondence. We discover this property… view at source ↗

**Figure 6.** Figure 6: Complete segment-to-segment correspondence. Our method is capable of generating a complete segmentation-tosegmentation correspondence between an image and a shape (left). We can also match corresponding segmentations across a variety of images of different types (sketch, photo, and drawing), poses, and appearances (right). obtain the mask M2D q ′ , compute the Intersection over Union (IoU) with the mask o… view at source ↗

**Figure 7.** Figure 7: Shape to image correspondence. BSB is highly flexible and operates in both directions. In addition to matching an image segment to a 3D part, it can also match a 3D segmentation to the corresponding semantic image region. v. In our work, we use the best segmentation buddy vp to segment the mesh. The resulting region M3D vp is regarded as the matching 3D part for the 2D segment M2D p in the image, yielding … view at source ↗

**Figure 9.** Figure 9: Qualitative comparison. We adapt baselines to solve our task from existing techniques [3, 53]. These methods produce incorrect correspondences, whereas BSB reliably selects the shape part that semantically matches the target image segment. complete segmentation of the mesh. Quantitative evaluation. As far as we can ascertain, there is no annotated dataset for cross-modality image-shape segment corresponde… view at source ↗

**Figure 8.** Figure 8: Local texturing. Our image-to-shape matching enables automatic, localized texturing of the shape driven by the texture in the image. NBB [3] DIFT [53] Ours [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 10.** Figure 10: Matching the same image to different shapes. Our method can match regions from images to different shapes that contain significant differences in geometric structures (top) and across occlusions in orientation from the query image (bottom). Tab. 1 presents the matching success rate averaged over the evaluation image-shape pairs. NBB relies on a sparse set of mutual nearest neighbor pixels in the neural fe… view at source ↗

**Figure 12.** Figure 12: Texture robustness. BSB matches semantic regions between image and shape despite variations in their appearance and texture. In [PITH_FULL_IMAGE:figures/full_fig_p013_12.png] view at source ↗

**Figure 13.** Figure 13: Interactive correspondence. Our BSB matching between pixel clicks and mesh vertices, combined with interactive 2D and 3D segmentation, enables to dynamically update the crossmodality correspondence [PITH_FULL_IMAGE:figures/full_fig_p013_13.png] view at source ↗

**Figure 14.** Figure 14: Matching between differently posed objects. BSB finds correspondence between the image and shape when the object in each modality differs substantially. the couch pair, [PITH_FULL_IMAGE:figures/full_fig_p014_14.png] view at source ↗

**Figure 15.** Figure 15: Multi-region correspondence. BSB can match multiple regions between the same image-shape pair, when the modalities depict different objects (left), or distinguishing between similar parts of the objects and matching them correctly (right). Correspondence stability [PITH_FULL_IMAGE:figures/full_fig_p014_15.png] view at source ↗

**Figure 16.** Figure 16: A single view to a complete 3D part. Although each image depicts only one view of the object (left), the entire corresponding part is successfully segmented in 3D (right). plied language-driven image segmentation by predicting a bounding box for an object part described by text, and segmenting the part within the bounding box [48]. Then, we used that part mask and its centroid as the pixel click with our… view at source ↗

**Figure 17.** Figure 17: Correspondence stability. Our method is robust to the location of the pixel click in the image region (left). Although different pixels are matched to different vertices, they fall within the corresponding semantic 3D part (right), resulting in a stable matching between the image and the shape. The clicked pixel and the matched vertices are visualized with a green dot [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗

**Figure 18.** Figure 18: Different images to the same shape. BSB accurately matches segmentations from images that contain significant differences in geometry (e.g., the heart-handle on the left) and appearance (e.g., crochet hat on the right) to the same shape. semantic region of the shape, the 3D and 2D segments may not match [PITH_FULL_IMAGE:figures/full_fig_p016_18.png] view at source ↗

**Figure 20.** Figure 20: Backbone model versatility. BSB can utilize a vision backbone other than DINOv2 [43]. In this case, we lift a diffusion model features [53] to the 3D mesh for finding correspondences. The text prompts used to extract features for the images and the renderings of the shape are indicated next to them [PITH_FULL_IMAGE:figures/full_fig_p016_20.png] view at source ↗

**Figure 21.** Figure 21: Text-based 3D segmentation. Combining language-driven image segmentation with our method, we achieve 3D segmentation with text. The prompt above the image was used for its segmentation. at random, and rendered the shape from a set a views, with elevation of {−60◦ , 30◦ , 0 ◦ , 30◦ , 60◦}, and azimuth of {0 ◦ , 30◦ , ..., 330◦}, a total of 5 · 12 = 60 possible views. We randomly selected two of these views… view at source ↗

**Figure 22.** Figure 22: Missing shape part. If a segmented region in the image is missing a matching part in the shape, our method will output an empty 3D segmentation, indicating correctly that correspondence does not exist in this case. the feature space as the match to the pixel click. This baseline achieved a success rate of 0.73. We note that since no existing dataset provides ground-truth annotations for image-shape corr… view at source ↗

**Figure 24.** Figure 24: Correspondence comparison on PartNet. We show the generation process of the input image and 2D click (first three columns), the matched pixel by NBB and DIFT from the generated image to the rendered image of the shape and its unprojection to 3D (fourth to eighth columns), our matching vertex (ninth column) for the pixel click on the generated image, and the ground-truth shape region (tenth column) from wh… view at source ↗

**Figure 25.** Figure 25 [PITH_FULL_IMAGE:figures/full_fig_p018_25.png] view at source ↗

**Figure 27.** Figure 27: Nearest neighbor vertex selection. Selecting the nearest neighbor vertex for a pixel click in the image leads to erroneous correspondences. In contrast, our BSB overcomes the image-shape modality gap and finds correct matches. Method NBB DIFT NN Baseline BSB (ours) Effectiveness ↑ 2.75 2.74 3.26 4.63 [PITH_FULL_IMAGE:figures/full_fig_p019_27.png] view at source ↗

**Figure 28.** Figure 28: Different number of vertex candidates. We evaluate the matching success rate on PartNet for different values of vertex candidates. The performance starts to increase with a higher number of candidates and then saturates. corresponding vertex. F. Implementation Details Vision model distillation. We train a multi-layer perceptron (MLP) to map each mesh vertex to a DINOv2-like feature vector of size dvis … view at source ↗

read the original abstract

Finding correspondences is a fundamental and extensively researched problem in computer vision and graphics. In this work, we examine the underexplored task of estimating segmentation-to-segmentation correspondence between images in the wild and untextured 3D shapes. This task is highly challenging due to substantial differences in appearance, geometry, and viewpoint. Our approach bridges the cross-modality gap by linking pixels in the image segment to vertices in the corresponding semantic part of the 3D shape. To achieve this, we first distill deep visual features from a 2D vision model onto the 3D shape surface, allowing for the computation of feature similarity between image pixels and shape vertices. Then, we identify Best Segmentation Buddies, vertices whose most similar image pixel lies within the image segmentation region, enabling the reliable discovery of vertices in semantically corresponding shape parts. Finally, we leverage distilled 3D features from the 2D image segmentation model to segment the shape directly in 3D, bootstrapping the correspondence process. We demonstrate the generality and robustness of our approach across a wide range of image-shape pairs, showcasing accurate and semantically meaningful correspondences. Our project page is at https://threedle.github.io/bsb/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces Best Segmentation Buddies via 2D feature distillation to 3D shapes for image-to-part correspondence, plus a bootstrapping step to segment the shape in 3D.

read the letter

The punchline here is that the work gives a workable way to match segments from real-world images to corresponding parts on 3D shapes without texture, by distilling deep features and using nearest-neighbor buddies. The new part is defining Best Segmentation Buddies as those vertices where the most similar image pixel falls inside the 2D segment, combined with bootstrapping a 3D segmentation from the 2D model features. This is not just a standard correspondence extension; the bootstrapping step adds a self-reinforcing element. It handles the challenges of appearance, geometry, and viewpoint differences reasonably in the examples shown. The project page likely has visuals that illustrate the robustness across various pairs. The soft spots are around validation. The abstract and likely the paper focus on qualitative demos rather than numbers, ablations, or error analysis. The concern that feature similarity after distillation might still be swayed by viewpoint or normals rather than pure semantics is worth checking. If they didn't run a baseline with projected features without distillation or test against geometric similarity, that leaves room for doubt on what drives the success. Overall, this is aimed at practitioners in 3D vision who need to align 2D observations with 3D models, like in reconstruction pipelines or augmented reality. A reader looking for applied methods rather than theoretical advances will get the most out of it. It is worth a serious referee because the task is real and the method is straightforward to implement and test further. Recommendation: Yes, send it to peer review. The idea has potential, and referees can push for the missing quantitative checks.

Referee Report

2 major / 2 minor

Summary. The paper proposes a pipeline for segmentation-to-segmentation correspondence between in-the-wild 2D images and untextured 3D shapes. It distills features from a pretrained 2D vision model onto the 3D surface, defines Best Segmentation Buddies as the 3D vertices whose nearest image pixel (by distilled feature distance) lies inside a given 2D segment, and uses the resulting correspondences to bootstrap direct 3D segmentation from the image segment. The authors claim the method produces accurate, semantically meaningful matches across large differences in appearance, geometry, and viewpoint.

Significance. If the central claim holds, the work would provide a practical bridge for cross-modal semantic correspondence without requiring texture or dense alignment, which is useful for graphics and vision applications involving untextured meshes. The distillation-plus-nearest-neighbor formulation is conceptually simple and leverages existing 2D models, but its value rests on whether the distilled features actually confer the claimed semantic invariance.

major comments (2)

[Abstract and §3] Abstract and §3 (method description): the claim that Best Segmentation Buddies 'reliably discover vertices in semantically corresponding shape parts' is load-bearing, yet the manuscript supplies no quantitative metrics, success rates, or error analysis on any dataset to show that nearest-neighbor matches exceed a viewpoint/geometry baseline.
[§4] §4 (experiments): no ablation is reported that isolates the distillation step from simply projecting raw 2D features onto the 3D surface; without this comparison it is impossible to verify that the nearest-pixel relation is driven by semantic part identity rather than residual viewpoint or surface-normal effects.

minor comments (2)

[§3] The notation for feature similarity and the exact distillation procedure (e.g., which layers are used, how projection is performed) could be stated more explicitly with a short equation or pseudocode.
[Figures] Figure captions and the project-page reference should include the specific image-shape pairs and ground-truth segments used for qualitative demonstration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional quantitative evaluation and ablation studies as suggested.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (method description): the claim that Best Segmentation Buddies 'reliably discover vertices in semantically corresponding shape parts' is load-bearing, yet the manuscript supplies no quantitative metrics, success rates, or error analysis on any dataset to show that nearest-neighbor matches exceed a viewpoint/geometry baseline.

Authors: We agree that the load-bearing claim would benefit from quantitative support. The current manuscript focuses on qualitative demonstrations across diverse in-the-wild image-shape pairs to show semantic correspondence. In the revision we will add quantitative metrics, including precision/recall for vertex-to-segment matching and error analysis on a test set of image-shape pairs with ground-truth annotations, with explicit comparison to a viewpoint/geometry baseline that omits distilled features. revision: yes
Referee: [§4] §4 (experiments): no ablation is reported that isolates the distillation step from simply projecting raw 2D features onto the 3D surface; without this comparison it is impossible to verify that the nearest-pixel relation is driven by semantic part identity rather than residual viewpoint or surface-normal effects.

Authors: We concur that isolating the distillation step is necessary to confirm its role in semantic invariance. The revised manuscript will include an ablation that directly compares the full pipeline (with distilled features) against a variant that projects raw 2D features onto the 3D surface without distillation, measuring the impact on nearest-neighbor correspondence accuracy and semantic consistency. revision: yes

Circularity Check

0 steps flagged

No circularity: pipeline uses external pretrained model and explicit definitions without self-referential reductions.

full rationale

The paper presents a methodological pipeline: distill features from an external 2D vision model onto 3D surfaces, then define Best Segmentation Buddies via nearest-neighbor feature similarity within given 2D segments, and bootstrap 3D segmentation. No equations, fitted parameters, or self-citations are shown that would make the discovered correspondences equivalent to inputs by construction. The approach depends on independent external components and is not a closed derivation that reduces outputs to renamed inputs or prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the transferability of 2D visual features to 3D geometry and on the assumption that nearest-neighbor lookup in feature space respects semantic boundaries.

axioms (1)

domain assumption Deep visual features extracted by a pretrained 2D vision model remain semantically meaningful when transferred to vertices of an untextured 3D mesh.
Invoked when the paper states that distilling features onto the shape surface enables similarity computation between pixels and vertices.

invented entities (1)

Best Segmentation Buddies no independent evidence
purpose: Vertices on the 3D shape whose nearest image pixel under distilled features lies inside the given 2D segment.
New term and selection rule introduced to filter correspondences.

pith-pipeline@v0.9.0 · 5746 in / 1422 out tokens · 44393 ms · 2026-05-20T11:07:23.602509+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 4 internal anchors

[1]

Zero-Shot 3D Shape Correspon- dence

Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovs- janikov, and Peter Wonka. Zero-Shot 3D Shape Correspon- dence. InSIGGRAPH Asia 2023 Conference Papers, pages 1–11, New York, NY , USA, 2023. Association for Comput- ing Machinery. 1, 4

work page 2023
[2]

SATR: Zero-Shot Semantic Segmentation of 3D Shapes

Ahmed Abdelreheem, Ivan Skorokhodov, Maks Ovsjanikov, and Peter Wonka. SATR: Zero-Shot Semantic Segmentation of 3D Shapes. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 3

work page 2023
[3]

Neural Best-Buddies: Sparse Cross-Domain Correspondence.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

Kfir Aberman, Jing Liao, Mingyi Shi, Dani Lischinski, Bao- quan Chen, and Daniel Cohen-Or. Neural Best-Buddies: Sparse Cross-Domain Correspondence.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018. 1, 2, 3, 7, 17, 18, 20

work page 2018
[4]

Training-Free Open-V ocabulary Segmentation with Offline Diffusion- Augmented Prototype Generation

Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. Training-Free Open-V ocabulary Segmentation with Offline Diffusion- Augmented Prototype Generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3689–3699, 2024. 3

work page 2024
[5]

Boscaini, J

D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castel- lani, and P. Vandergheynst. Learning Class-Specific Descrip- tors for Deformable Shapes Using Localized Spectral Con- volutional Networks.Computer Graphics Forum, 34(5):13– 23, 2015. 3

work page 2015
[6]

Learning Shape Correspondence with Anisotropic Convolutional Neural Networks

Davide Boscaini, Jonathan Masci, Emanuele Rodol `a, and Michael Bronstein. Learning Shape Correspondence with Anisotropic Convolutional Neural Networks. InAdvances in Neural Information Processing Systems. Curran Associates, Inc., 2016. 3

work page 2016
[7]

BRIEF: Binary Robust Independent Elemen- tary Features

Michael Calonder, Vincent Lepetit, Christophe Strecha, and Pascal Fua. BRIEF: Binary Robust Independent Elemen- tary Features. InEuropean conference on computer vision (ECCV), pages 778–792. Springer, 2010. 3

work page 2010
[8]

BAE-NET: Branched Autoen- coder for Shape Co-Segmentation

Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, and Hao Zhang. BAE-NET: Branched Autoen- coder for Shape Co-Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8490–8499, 2019. 3

work page 2019
[9]

3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions

Dale Decatur, Itai Lang, and Rana Hanocka. 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 20930–20939,

work page
[10]

3D Paintbrush: Local Stylization of 3D Shapes with Cas- caded Score Distillation

Dale Decatur, Itai Lang, Kfir Aberman, and Rana Hanocka. 3D Paintbrush: Local Stylization of 3D Shapes with Cas- caded Score Distillation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4473–4483, 2024. 3, 13

work page 2024
[11]

3D PixBrush: Image-Guided Local Texture Synthesis.arXiv preprint arXiv:2507.03731, 2025

Dale Decatur, Itai Lang, Kfir Aberman, and Rana Hanocka. 3D PixBrush: Image-Guided Local Texture Synthesis.arXiv preprint arXiv:2507.03731, 2025. 8

work page arXiv 2025
[12]

Unsuper- vised Template-assisted Point Cloud Shape Correspondence Network

Jiacheng Deng, Jiahao Lu, and Tianzhu Zhang. Unsuper- vised Template-assisted Point Cloud Shape Correspondence Network. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5250–5259, 2024. 3

work page 2024
[13]

Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence

Nicolas Donati, Abhishek Sharma, and Maks Ovsjanikov. Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8589–8598, 2020. 1, 3

work page 2020
[14]

Beyond Cartesian Representations for Local Descriptors

Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls. Beyond Cartesian Representations for Local Descriptors. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 253–262, 2019. 3

work page 2019
[15]

Deep Shells: Unsupervised Shape Corre- spondence with Optimal Transport

Marvin Eisenberger, Aysim Toker, Laura Leal-Taix ´e, and Daniel Cremers. Deep Shells: Unsupervised Shape Corre- spondence with Optimal Transport. InAdvances in Neural Information Processing Systems, pages 10491–10502. Cur- ran Associates, Inc., 2020. 3

work page 2020
[16]

DensePose: Dense Human Pose Estimation in the Wild

Rıza Alp G ¨uler, Natalia Neverova, and Iasonas Kokkinos. DensePose: Dense Human Pose Estimation in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 4

work page 2018
[17]

Bron- stein, and Ron Kimmel

Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bron- stein, and Ron Kimmel. Unsupervised Learning of Dense Shape Correspondence. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 3

work page 2019
[18]

MeshCNN: A Network with an Edge.ACM Transactions on Graphics, 38(4):1–12,

Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. MeshCNN: A Network with an Edge.ACM Transactions on Graphics, 38(4):1–12,

work page
[19]

Unsupervised Semantic Correspondence Using Stable Diffu- sion

Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. Unsupervised Semantic Correspondence Using Stable Diffu- sion. InAdvances in Neural Information Processing Systems, pages 8266–8279. Curran Associates, Inc., 2023. 15

work page 2023
[20]

COTR: Correspondence Trans- former for Matching Across Images

Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasac- chi, and Kwang Moo Yi. COTR: Correspondence Trans- former for Matching Across Images. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6207–6217, 2021. 3

work page 2021
[21]

3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans. Graph., 42(4):139– 1, 2023. 8

work page 2023
[22]

Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick. Segment Anything. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023. 2, 4, 6, 14, 15

work page 2023
[23]

PifPaf: Composite Fields for Human Pose Estimation

Sven Kreiss, Lorenzo Bertoni, and Alexandre Alahi. PifPaf: Composite Fields for Human Pose Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 4

work page 2019
[24]

Canonical Surface Mapping via Geometric Cycle Consis- tency

Nilesh Kulkarni, Abhinav Gupta, and Shubham Tulsiani. Canonical Surface Mapping via Geometric Cycle Consis- tency. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 2202–2211,

work page
[25]

Fouhey, and Shubham Tulsiani

Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, and Shubham Tulsiani. Articulation-Aware Canonical Surface Mapping. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 4

work page 2020
[26]

DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction

Itai Lang, Dvir Ginzburg, Shai Avidan, and Dan Raviv. DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction. InProceedings of the International Confer- ence on 3D Vision (3DV), pages 1442–1451, 2021. 3

work page 2021
[27]

iSeg: Interactive 3D Segmentation via Interac- tive Attention

Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, and Rana Hanocka. iSeg: Interactive 3D Segmentation via Interac- tive Attention. InSIGGRAPH Asia 2024 Conference Papers, page 1–11. Association for Computing Machinery, 2024. 2, 3, 4, 5, 6, 7, 16

work page 2024
[28]

SRFeat: Learning Locally Accurate and Globally Consistent Non- Rigid Shape Correspondence

Lei Li, Souhaib Attaiki, and Maks Ovsjanikov. SRFeat: Learning Locally Accurate and Globally Consistent Non- Rigid Shape Correspondence. In2022 International Con- ference on 3D Vision (3DV), pages 144–154, 2022. 3

work page 2022
[29]

Bronstein, and Michael M

Or Litany, Tal Remez, Emanuele Rodol`a, Alex M. Bronstein, and Michael M. Bronstein. Deep Functional Maps: Struc- tured Prediction for Dense Shape Correspondence. InPro- ceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5660–5668. IEEE Computer Society,

work page
[30]

OpenShape: Scaling Up 3D Shape Representation To- wards Open-World Understanding

Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xu- anlin Li, Shizhong Han, Hong Cai, Fatih Porikli, and Hao Su. OpenShape: Scaling Up 3D Shape Representation To- wards Open-World Understanding. InAdvances in Neural Information Processing Systems, pages 44860–44879. Cur- ran Associates, Inc., 2023. 4

work page 2023
[31]

PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image- Language Models

Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, and Hao Su. PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image- Language Models. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 21736–21746, 2023. 3

work page 2023
[32]

Distinctive Image Features from Scale- Invariant Keypoints.International Journal of Computer Vi- sion, 60(2):91–110, 2004

David G Lowe. Distinctive Image Features from Scale- Invariant Keypoints.International Journal of Computer Vi- sion, 60(2):91–110, 2004. 3

work page 2004
[33]

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holyn- ski, and Trevor Darrell. Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence. In Advances in Neural Information Processing Systems, 2023. 15

work page 2023
[34]

Bronstein, and Pierre Vandergheynst

Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. Geodesic Convolutional Neural Networks on Riemannian Manifolds. InProceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, pages 37–45, 2015. 3

work page 2015
[35]

SHREC 2019: Matching Humans with Different Connectivity

Simone Melzi, Riccardo Marin, Emanuele Rodol `a, Umberto Castellani, Jing Ren, Adrien Poulenard, Peter Wonka, and Maks Ovsjanikov. SHREC 2019: Matching Humans with Different Connectivity. InEurographics Workshop on 3D Object Retrieval, page 3. The Eurographics Association,

work page 2019
[36]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. InProceedings of the European Conference on Computer Vision (ECCV), pages 405–421, 2020. 8

work page 2020
[37]

Working hard to know your neighbor’s mar- gins: Local descriptor learning loss

Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, and Jiri Matas. Working hard to know your neighbor’s mar- gins: Local descriptor learning loss. InAdvances in Neural Information Processing Systems, pages 4826–4837. Curran Associates, Inc., 2017. 3

work page 2017
[38]

Bagdanov

Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, and Andrew D. Bagdanov. Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modal- ity Inversion.arXiv preprint arXiv:2502.04263, 2025. 3

work page arXiv 2025
[39]

Chang, Li Yi, Sub- arna Tripathi, Leonidas J

Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Sub- arna Tripathi, Leonidas J. Guibas, and Hao Su. PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchi- cal Part-Level 3D Object Understanding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 909–918, 2019. 6, 7, 16

work page 2019
[40]

Continu- ous Surface Embeddings

Natalia Neverova, David Novotny, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, and Andrea Vedaldi. Continu- ous Surface Embeddings. InAdvances in Neural Information Processing Systems, pages 17258–17270. Curran Associates, Inc., 2020. 1, 4, 7

work page 2020
[41]

Discovering Rela- tionships between Object Categories via Universal Canoni- cal Maps

Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, and Andrea Vedaldi. Discovering Rela- tionships between Object Categories via Universal Canoni- cal Maps. In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 404–413, Los Alamitos, CA, USA, 2021. IEEE Computer Society. 4

work page 2021
[43]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mah- moud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv ´e Je- gou, Julien Mairal, ...

work page internal anchor Pith review Pith/arXiv arXiv
[44]

Neural Parts: Learning Expres- sive 3D Shape Abstractions with Invertible Neural Networks

Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. Neural Parts: Learning Expres- sive 3D Shape Abstractions with Invertible Neural Networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4521–4530,

work page
[45]

Automatic Differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Al- ban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. InNIPS-W, 2017. 6

work page 2017
[46]

ASIA: Adaptive 3D Seg- mentation using Few Image Annotations.SIGGRAPH Asia Conference Papers, 2025

Sai Raj Kishore Perla, Aditya V ora, Sauradip Nag, Ali Mahdavi-Amiri, and Hao Zhang. ASIA: Adaptive 3D Seg- mentation using Few Image Annotations.SIGGRAPH Asia Conference Papers, 2025. 7

work page 2025
[47]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InAdvances in Neural Infor- mation Processing Systems. Curran Associates, Inc., 2017. 3

work page 2017
[48]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junt- ing Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao- Yuan Wu, Ross Girshick, Piotr Doll´ar, and Christoph Feicht- enhofer. SAM 2: Segment Anything in Images and Videos. arXiv preprint arXiv:...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

Toys4K 3D Object Dataset, 2022

James Matthew Rehg. Toys4K 3D Object Dataset, 2022. https://github.com/rehg-lab/lowshot- shapebias/tree/main/toys4k. 6

work page 2022
[50]

Davis Rempe, Tolga Birdal, Aaron Hertzmann, Jimei Yang, Srinath Sridhar, and Leonidas J. Guibas. HuMoR: 3D Hu- man Motion Model for Robust Pose Estimation. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV), pages 11488–11499, 2021. 4

work page 2021
[51]

ExtrudeNet: Unsupervised Inverse Sketch- and-Extrude for Shape Parsing

Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, and Junzhe Zhang. ExtrudeNet: Unsupervised Inverse Sketch- and-Extrude for Shape Parsing. InProceedings of the 17th European Conference on Computer Vision (ECCV). Springer, 2022. 3

work page 2022
[52]

SHIC: Shape-Image Correspondences with no Key- point Supervision

Aleksandar Shtedritski, Christian Rupprecht, and Andrea Vedaldi. SHIC: Shape-Image Correspondences with no Key- point Supervision. InEuropean Conference on Computer Vision, pages 129–145. Springer, 2024. 1, 2, 4, 7

work page 2024
[53]

Emergent Correspondence from Image Diffusion

Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent Correspondence from Image Diffusion. InAdvances in Neural Information Processing Systems, 2023. 3, 6, 7, 15, 16, 17, 18, 20

work page 2023
[54]

SOSNet: Second Order Similarity Regularization for Local Descriptor Learning

Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, and Vassileios Balntas. SOSNet: Second Order Similarity Regularization for Local Descriptor Learning. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11016–11025, 2019. 3

work page 2019
[55]

TurboSquid 3D Model Repository, 2021

TurboSquid. TurboSquid 3D Model Repository, 2021. https://www.turbosquid.com/. 6

work page 2021
[56]

Prior Knowledge for Part Correspondence.Com- puter Graphics Forum, 30(2):553–562, 2011

Oliver van Kaick, Andrea Tagliasacchi, Oana Sidi, Hao Zhang, Daniel Cohen-Or, Lior Wolf, and Ghassan Hamarneh. Prior Knowledge for Part Correspondence.Com- puter Graphics Forum, 30(2):553–562, 2011. 6

work page 2011
[57]

Sclip: Rethinking self- attention for dense vision-language inference,

Feng Wang, Jieru Mei, and Alan Yuille. SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference.arXiv preprint arXiv:2312.01597, 2024. 3

work page arXiv 2024
[58]

Diffusion Model is Secretly a Training-Free Open V ocabulary Semantic Seg- menter.IEEE Transactions on Image Processing, 34:1895– 1907, 2025

Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, and Dong Xu. Diffusion Model is Secretly a Training-Free Open V ocabulary Semantic Seg- menter.IEEE Transactions on Image Processing, 34:1895– 1907, 2025. 3

work page 1907
[59]

SegGPT: Towards Seg- menting Everything in Context

Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. SegGPT: Towards Seg- menting Everything in Context. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1130–1140, 2023. 2

work page 2023
[60]

Sarma, Michael M

Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic Graph CNN for Learning on Point Clouds.ACM Trans. Graph., 38(5), 2019. 3

work page 2019
[61]

Dense Human Body Correspondences Us- ing Convolutional Networks

Lingyu Wei, Qixing Huang, Duygu Ceylan, Etienne V ouga, and Hao Li. Dense Human Body Correspondences Us- ing Convolutional Networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1544–1553, 2016. 3

work page 2016
[62]

3D ShapeNets: A Deep Representation for V olumetric Shapes

Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Lin- guang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A Deep Representation for V olumetric Shapes. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015. 3

work page 1912
[63]

LIFT: Learned Invariant Feature Transform

Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned Invariant Feature Transform. InEuro- pean Conference on Computer Vision (ECCV), pages 467–

work page
[64]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. InPro- ceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 19313–19322. IEEE, 2022. 3

work page 2022
[65]

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop

Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop. In2021 IEEE/CVF Inter- national Conference on Computer Vision (ICCV), pages 11426–11436, 2021. 4

work page 2021
[66]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum. DINO: DETR with Improved DeNoising Anchor Boxes for End- to-End Object Detection.arXiv preprint arXiv:2203.03605,

work page internal anchor Pith review Pith/arXiv arXiv
[67]

A Tale of Two Features: Stable Diffusion Comple- ments DINO for Zero-Shot Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Pola- nia Cabrera, Varun Jampani, Deqing Sun, and Ming-Hsuan Yang. A Tale of Two Features: Stable Diffusion Comple- ments DINO for Zero-Shot Semantic Correspondence. In Advances in Neural Information Processing Systems, 2023. 15

work page 2023
[68]

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3836–3847, 2023. 7, 17

work page 2023
[69]

Extract Free Dense Labels from CLIP

Chong Zhou, Chen Change Loy, and Bo Dai. Extract Free Dense Labels from CLIP. InProceedings of the 17th Euro- pean Conference on Computer Vision (ECCV), pages 696– 712, Cham, 2022. Springer Nature Switzerland. 3

work page 2022
[70]

Thingi10K: A Dataset of 10,000 3D-Printing Models

Qingnan Zhou and Alec Jacobson. Thingi10K: A Dataset of 10,000 3D-Printing Models.arXiv preprint arXiv:1605.04797, 2016. 6

work page internal anchor Pith review Pith/arXiv arXiv 2016
[71]

Segment Everything Everywhere All at Once

Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, and Yong Jae Lee. Segment Everything Everywhere All at Once. In Advances in Neural Information Processing Systems, pages 19769–19782. Curran Associates, Inc., 2023. 2 Best Segmentation Buddies for Image-Shape Correspondence Supplementary Material The followi...

work page 2023
[72]

An image of an airplane facing away

with a box input, where the user specifies the top-left and bottom-right coordinates in the image to segment the part maskm 2D p used in our matching scheme. Examples are shown in Fig. 19. Another interface for segmenting the image is text, as we describe next. Text to 3D segmentation.In the main paper, we used a click-based model for segmenting the image...

work page 2048

[1] [1]

Zero-Shot 3D Shape Correspon- dence

Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovs- janikov, and Peter Wonka. Zero-Shot 3D Shape Correspon- dence. InSIGGRAPH Asia 2023 Conference Papers, pages 1–11, New York, NY , USA, 2023. Association for Comput- ing Machinery. 1, 4

work page 2023

[2] [2]

SATR: Zero-Shot Semantic Segmentation of 3D Shapes

Ahmed Abdelreheem, Ivan Skorokhodov, Maks Ovsjanikov, and Peter Wonka. SATR: Zero-Shot Semantic Segmentation of 3D Shapes. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 3

work page 2023

[3] [3]

Neural Best-Buddies: Sparse Cross-Domain Correspondence.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

Kfir Aberman, Jing Liao, Mingyi Shi, Dani Lischinski, Bao- quan Chen, and Daniel Cohen-Or. Neural Best-Buddies: Sparse Cross-Domain Correspondence.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018. 1, 2, 3, 7, 17, 18, 20

work page 2018

[4] [4]

Training-Free Open-V ocabulary Segmentation with Offline Diffusion- Augmented Prototype Generation

Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. Training-Free Open-V ocabulary Segmentation with Offline Diffusion- Augmented Prototype Generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3689–3699, 2024. 3

work page 2024

[5] [5]

Boscaini, J

D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castel- lani, and P. Vandergheynst. Learning Class-Specific Descrip- tors for Deformable Shapes Using Localized Spectral Con- volutional Networks.Computer Graphics Forum, 34(5):13– 23, 2015. 3

work page 2015

[6] [6]

Learning Shape Correspondence with Anisotropic Convolutional Neural Networks

Davide Boscaini, Jonathan Masci, Emanuele Rodol `a, and Michael Bronstein. Learning Shape Correspondence with Anisotropic Convolutional Neural Networks. InAdvances in Neural Information Processing Systems. Curran Associates, Inc., 2016. 3

work page 2016

[7] [7]

BRIEF: Binary Robust Independent Elemen- tary Features

Michael Calonder, Vincent Lepetit, Christophe Strecha, and Pascal Fua. BRIEF: Binary Robust Independent Elemen- tary Features. InEuropean conference on computer vision (ECCV), pages 778–792. Springer, 2010. 3

work page 2010

[8] [8]

BAE-NET: Branched Autoen- coder for Shape Co-Segmentation

Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, and Hao Zhang. BAE-NET: Branched Autoen- coder for Shape Co-Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8490–8499, 2019. 3

work page 2019

[9] [9]

3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions

Dale Decatur, Itai Lang, and Rana Hanocka. 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 20930–20939,

work page

[10] [10]

3D Paintbrush: Local Stylization of 3D Shapes with Cas- caded Score Distillation

Dale Decatur, Itai Lang, Kfir Aberman, and Rana Hanocka. 3D Paintbrush: Local Stylization of 3D Shapes with Cas- caded Score Distillation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4473–4483, 2024. 3, 13

work page 2024

[11] [11]

3D PixBrush: Image-Guided Local Texture Synthesis.arXiv preprint arXiv:2507.03731, 2025

Dale Decatur, Itai Lang, Kfir Aberman, and Rana Hanocka. 3D PixBrush: Image-Guided Local Texture Synthesis.arXiv preprint arXiv:2507.03731, 2025. 8

work page arXiv 2025

[12] [12]

Unsuper- vised Template-assisted Point Cloud Shape Correspondence Network

Jiacheng Deng, Jiahao Lu, and Tianzhu Zhang. Unsuper- vised Template-assisted Point Cloud Shape Correspondence Network. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5250–5259, 2024. 3

work page 2024

[13] [13]

Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence

Nicolas Donati, Abhishek Sharma, and Maks Ovsjanikov. Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8589–8598, 2020. 1, 3

work page 2020

[14] [14]

Beyond Cartesian Representations for Local Descriptors

Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls. Beyond Cartesian Representations for Local Descriptors. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 253–262, 2019. 3

work page 2019

[15] [15]

Deep Shells: Unsupervised Shape Corre- spondence with Optimal Transport

Marvin Eisenberger, Aysim Toker, Laura Leal-Taix ´e, and Daniel Cremers. Deep Shells: Unsupervised Shape Corre- spondence with Optimal Transport. InAdvances in Neural Information Processing Systems, pages 10491–10502. Cur- ran Associates, Inc., 2020. 3

work page 2020

[16] [16]

DensePose: Dense Human Pose Estimation in the Wild

Rıza Alp G ¨uler, Natalia Neverova, and Iasonas Kokkinos. DensePose: Dense Human Pose Estimation in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 4

work page 2018

[17] [17]

Bron- stein, and Ron Kimmel

Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bron- stein, and Ron Kimmel. Unsupervised Learning of Dense Shape Correspondence. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 3

work page 2019

[18] [18]

MeshCNN: A Network with an Edge.ACM Transactions on Graphics, 38(4):1–12,

Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. MeshCNN: A Network with an Edge.ACM Transactions on Graphics, 38(4):1–12,

work page

[19] [19]

Unsupervised Semantic Correspondence Using Stable Diffu- sion

Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. Unsupervised Semantic Correspondence Using Stable Diffu- sion. InAdvances in Neural Information Processing Systems, pages 8266–8279. Curran Associates, Inc., 2023. 15

work page 2023

[20] [20]

COTR: Correspondence Trans- former for Matching Across Images

Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasac- chi, and Kwang Moo Yi. COTR: Correspondence Trans- former for Matching Across Images. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6207–6217, 2021. 3

work page 2021

[21] [21]

3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans. Graph., 42(4):139– 1, 2023. 8

work page 2023

[22] [22]

Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick. Segment Anything. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023. 2, 4, 6, 14, 15

work page 2023

[23] [23]

PifPaf: Composite Fields for Human Pose Estimation

Sven Kreiss, Lorenzo Bertoni, and Alexandre Alahi. PifPaf: Composite Fields for Human Pose Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 4

work page 2019

[24] [24]

Canonical Surface Mapping via Geometric Cycle Consis- tency

Nilesh Kulkarni, Abhinav Gupta, and Shubham Tulsiani. Canonical Surface Mapping via Geometric Cycle Consis- tency. InProceedings of the IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 2202–2211,

work page

[25] [25]

Fouhey, and Shubham Tulsiani

Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, and Shubham Tulsiani. Articulation-Aware Canonical Surface Mapping. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 4

work page 2020

[26] [26]

DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction

Itai Lang, Dvir Ginzburg, Shai Avidan, and Dan Raviv. DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction. InProceedings of the International Confer- ence on 3D Vision (3DV), pages 1442–1451, 2021. 3

work page 2021

[27] [27]

iSeg: Interactive 3D Segmentation via Interac- tive Attention

Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, and Rana Hanocka. iSeg: Interactive 3D Segmentation via Interac- tive Attention. InSIGGRAPH Asia 2024 Conference Papers, page 1–11. Association for Computing Machinery, 2024. 2, 3, 4, 5, 6, 7, 16

work page 2024

[28] [28]

SRFeat: Learning Locally Accurate and Globally Consistent Non- Rigid Shape Correspondence

Lei Li, Souhaib Attaiki, and Maks Ovsjanikov. SRFeat: Learning Locally Accurate and Globally Consistent Non- Rigid Shape Correspondence. In2022 International Con- ference on 3D Vision (3DV), pages 144–154, 2022. 3

work page 2022

[29] [29]

Bronstein, and Michael M

Or Litany, Tal Remez, Emanuele Rodol`a, Alex M. Bronstein, and Michael M. Bronstein. Deep Functional Maps: Struc- tured Prediction for Dense Shape Correspondence. InPro- ceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5660–5668. IEEE Computer Society,

work page

[30] [30]

OpenShape: Scaling Up 3D Shape Representation To- wards Open-World Understanding

Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xu- anlin Li, Shizhong Han, Hong Cai, Fatih Porikli, and Hao Su. OpenShape: Scaling Up 3D Shape Representation To- wards Open-World Understanding. InAdvances in Neural Information Processing Systems, pages 44860–44879. Cur- ran Associates, Inc., 2023. 4

work page 2023

[31] [31]

PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image- Language Models

Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, and Hao Su. PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image- Language Models. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 21736–21746, 2023. 3

work page 2023

[32] [32]

Distinctive Image Features from Scale- Invariant Keypoints.International Journal of Computer Vi- sion, 60(2):91–110, 2004

David G Lowe. Distinctive Image Features from Scale- Invariant Keypoints.International Journal of Computer Vi- sion, 60(2):91–110, 2004. 3

work page 2004

[33] [33]

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holyn- ski, and Trevor Darrell. Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence. In Advances in Neural Information Processing Systems, 2023. 15

work page 2023

[34] [34]

Bronstein, and Pierre Vandergheynst

Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. Geodesic Convolutional Neural Networks on Riemannian Manifolds. InProceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, pages 37–45, 2015. 3

work page 2015

[35] [35]

SHREC 2019: Matching Humans with Different Connectivity

Simone Melzi, Riccardo Marin, Emanuele Rodol `a, Umberto Castellani, Jing Ren, Adrien Poulenard, Peter Wonka, and Maks Ovsjanikov. SHREC 2019: Matching Humans with Different Connectivity. InEurographics Workshop on 3D Object Retrieval, page 3. The Eurographics Association,

work page 2019

[36] [36]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. InProceedings of the European Conference on Computer Vision (ECCV), pages 405–421, 2020. 8

work page 2020

[37] [37]

Working hard to know your neighbor’s mar- gins: Local descriptor learning loss

Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, and Jiri Matas. Working hard to know your neighbor’s mar- gins: Local descriptor learning loss. InAdvances in Neural Information Processing Systems, pages 4826–4837. Curran Associates, Inc., 2017. 3

work page 2017

[38] [38]

Bagdanov

Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, and Andrew D. Bagdanov. Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modal- ity Inversion.arXiv preprint arXiv:2502.04263, 2025. 3

work page arXiv 2025

[39] [39]

Chang, Li Yi, Sub- arna Tripathi, Leonidas J

Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Sub- arna Tripathi, Leonidas J. Guibas, and Hao Su. PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchi- cal Part-Level 3D Object Understanding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 909–918, 2019. 6, 7, 16

work page 2019

[40] [40]

Continu- ous Surface Embeddings

Natalia Neverova, David Novotny, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, and Andrea Vedaldi. Continu- ous Surface Embeddings. InAdvances in Neural Information Processing Systems, pages 17258–17270. Curran Associates, Inc., 2020. 1, 4, 7

work page 2020

[41] [41]

Discovering Rela- tionships between Object Categories via Universal Canoni- cal Maps

Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, and Andrea Vedaldi. Discovering Rela- tionships between Object Categories via Universal Canoni- cal Maps. In2021 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 404–413, Los Alamitos, CA, USA, 2021. IEEE Computer Society. 4

work page 2021

[42] [43]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mah- moud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv ´e Je- gou, Julien Mairal, ...

work page internal anchor Pith review Pith/arXiv arXiv

[43] [44]

Neural Parts: Learning Expres- sive 3D Shape Abstractions with Invertible Neural Networks

Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. Neural Parts: Learning Expres- sive 3D Shape Abstractions with Invertible Neural Networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4521–4530,

work page

[44] [45]

Automatic Differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Al- ban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. InNIPS-W, 2017. 6

work page 2017

[45] [46]

ASIA: Adaptive 3D Seg- mentation using Few Image Annotations.SIGGRAPH Asia Conference Papers, 2025

Sai Raj Kishore Perla, Aditya V ora, Sauradip Nag, Ali Mahdavi-Amiri, and Hao Zhang. ASIA: Adaptive 3D Seg- mentation using Few Image Annotations.SIGGRAPH Asia Conference Papers, 2025. 7

work page 2025

[46] [47]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InAdvances in Neural Infor- mation Processing Systems. Curran Associates, Inc., 2017. 3

work page 2017

[47] [48]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junt- ing Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao- Yuan Wu, Ross Girshick, Piotr Doll´ar, and Christoph Feicht- enhofer. SAM 2: Segment Anything in Images and Videos. arXiv preprint arXiv:...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[48] [49]

Toys4K 3D Object Dataset, 2022

James Matthew Rehg. Toys4K 3D Object Dataset, 2022. https://github.com/rehg-lab/lowshot- shapebias/tree/main/toys4k. 6

work page 2022

[49] [50]

Davis Rempe, Tolga Birdal, Aaron Hertzmann, Jimei Yang, Srinath Sridhar, and Leonidas J. Guibas. HuMoR: 3D Hu- man Motion Model for Robust Pose Estimation. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV), pages 11488–11499, 2021. 4

work page 2021

[50] [51]

ExtrudeNet: Unsupervised Inverse Sketch- and-Extrude for Shape Parsing

Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, and Junzhe Zhang. ExtrudeNet: Unsupervised Inverse Sketch- and-Extrude for Shape Parsing. InProceedings of the 17th European Conference on Computer Vision (ECCV). Springer, 2022. 3

work page 2022

[51] [52]

SHIC: Shape-Image Correspondences with no Key- point Supervision

Aleksandar Shtedritski, Christian Rupprecht, and Andrea Vedaldi. SHIC: Shape-Image Correspondences with no Key- point Supervision. InEuropean Conference on Computer Vision, pages 129–145. Springer, 2024. 1, 2, 4, 7

work page 2024

[52] [53]

Emergent Correspondence from Image Diffusion

Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent Correspondence from Image Diffusion. InAdvances in Neural Information Processing Systems, 2023. 3, 6, 7, 15, 16, 17, 18, 20

work page 2023

[53] [54]

SOSNet: Second Order Similarity Regularization for Local Descriptor Learning

Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, and Vassileios Balntas. SOSNet: Second Order Similarity Regularization for Local Descriptor Learning. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11016–11025, 2019. 3

work page 2019

[54] [55]

TurboSquid 3D Model Repository, 2021

TurboSquid. TurboSquid 3D Model Repository, 2021. https://www.turbosquid.com/. 6

work page 2021

[55] [56]

Prior Knowledge for Part Correspondence.Com- puter Graphics Forum, 30(2):553–562, 2011

Oliver van Kaick, Andrea Tagliasacchi, Oana Sidi, Hao Zhang, Daniel Cohen-Or, Lior Wolf, and Ghassan Hamarneh. Prior Knowledge for Part Correspondence.Com- puter Graphics Forum, 30(2):553–562, 2011. 6

work page 2011

[56] [57]

Sclip: Rethinking self- attention for dense vision-language inference,

Feng Wang, Jieru Mei, and Alan Yuille. SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference.arXiv preprint arXiv:2312.01597, 2024. 3

work page arXiv 2024

[57] [58]

Diffusion Model is Secretly a Training-Free Open V ocabulary Semantic Seg- menter.IEEE Transactions on Image Processing, 34:1895– 1907, 2025

Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, and Dong Xu. Diffusion Model is Secretly a Training-Free Open V ocabulary Semantic Seg- menter.IEEE Transactions on Image Processing, 34:1895– 1907, 2025. 3

work page 1907

[58] [59]

SegGPT: Towards Seg- menting Everything in Context

Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. SegGPT: Towards Seg- menting Everything in Context. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1130–1140, 2023. 2

work page 2023

[59] [60]

Sarma, Michael M

Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic Graph CNN for Learning on Point Clouds.ACM Trans. Graph., 38(5), 2019. 3

work page 2019

[60] [61]

Dense Human Body Correspondences Us- ing Convolutional Networks

Lingyu Wei, Qixing Huang, Duygu Ceylan, Etienne V ouga, and Hao Li. Dense Human Body Correspondences Us- ing Convolutional Networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1544–1553, 2016. 3

work page 2016

[61] [62]

3D ShapeNets: A Deep Representation for V olumetric Shapes

Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Lin- guang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A Deep Representation for V olumetric Shapes. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015. 3

work page 1912

[62] [63]

LIFT: Learned Invariant Feature Transform

Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned Invariant Feature Transform. InEuro- pean Conference on Computer Vision (ECCV), pages 467–

work page

[63] [64]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. InPro- ceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 19313–19322. IEEE, 2022. 3

work page 2022

[64] [65]

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop

Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop. In2021 IEEE/CVF Inter- national Conference on Computer Vision (ICCV), pages 11426–11436, 2021. 4

work page 2021

[65] [66]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum. DINO: DETR with Improved DeNoising Anchor Boxes for End- to-End Object Detection.arXiv preprint arXiv:2203.03605,

work page internal anchor Pith review Pith/arXiv arXiv

[66] [67]

A Tale of Two Features: Stable Diffusion Comple- ments DINO for Zero-Shot Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Pola- nia Cabrera, Varun Jampani, Deqing Sun, and Ming-Hsuan Yang. A Tale of Two Features: Stable Diffusion Comple- ments DINO for Zero-Shot Semantic Correspondence. In Advances in Neural Information Processing Systems, 2023. 15

work page 2023

[67] [68]

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3836–3847, 2023. 7, 17

work page 2023

[68] [69]

Extract Free Dense Labels from CLIP

Chong Zhou, Chen Change Loy, and Bo Dai. Extract Free Dense Labels from CLIP. InProceedings of the 17th Euro- pean Conference on Computer Vision (ECCV), pages 696– 712, Cham, 2022. Springer Nature Switzerland. 3

work page 2022

[69] [70]

Thingi10K: A Dataset of 10,000 3D-Printing Models

Qingnan Zhou and Alec Jacobson. Thingi10K: A Dataset of 10,000 3D-Printing Models.arXiv preprint arXiv:1605.04797, 2016. 6

work page internal anchor Pith review Pith/arXiv arXiv 2016

[70] [71]

Segment Everything Everywhere All at Once

Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, and Yong Jae Lee. Segment Everything Everywhere All at Once. In Advances in Neural Information Processing Systems, pages 19769–19782. Curran Associates, Inc., 2023. 2 Best Segmentation Buddies for Image-Shape Correspondence Supplementary Material The followi...

work page 2023

[71] [72]

An image of an airplane facing away

with a box input, where the user specifies the top-left and bottom-right coordinates in the image to segment the part maskm 2D p used in our matching scheme. Examples are shown in Fig. 19. Another interface for segmenting the image is text, as we describe next. Text to 3D segmentation.In the main paper, we used a click-based model for segmenting the image...

work page 2048