
arxiv: 2606.05515 · v1 · pith:P7XD5IIW · new · submitted 2026-06-03 · 💻 cs.CV

BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding

Pith reviewed 2026-06-28 05:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords: BRep · CAD · contrastive learning · multimodal embeddings · retrieval · zero-shot classification · transformer · boundary representation

The pith

BRepCLIP produces embeddings from native CAD boundary representations that align with images and text through contrastive pretraining on tokenized faces and edges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to represent CAD models in their native boundary representation format rather than converting them to points or meshes. Each model becomes a sequence of face and edge tokens drawn from separate vocabularies for surface and curve geometry, with added descriptors for type and position. A transformer encoder turns this sequence into a single embedding that is trained to match the corresponding image and text embeddings from a frozen CLIP model using a contrastive loss. If the approach works, it supplies a structure-aware way to compare and retrieve CAD models that respects exact parametric geometry instead of approximate sampling.

Core claim

BRepCLIP models each CAD object as a sequence of face and edge tokens equipped with discrete vocabularies for surfaces and curves plus spatial and semantic descriptors, feeds the sequence through a transformer encoder to obtain a global embedding, and aligns that embedding to CLIP image and text spaces via a joint contrastive objective, yielding more discriminative representations than point-based baselines on retrieval and classification tasks.
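To make the tokenization concrete, here is a minimal NumPy sketch of how a face/edge token sequence might be assembled before it reaches the encoder. Everything in it is an assumption for illustration: the vocabulary sizes, the width `D`, the type lists, and the use of an axis-aligned bounding box as the spatial descriptor are hypothetical, and random tables stand in for learned embeddings. It only shows the additive pattern (codebook embedding + modality embedding + semantic type embedding + projected spatial descriptor) that the description suggests.

```python
import numpy as np

# Hypothetical sizes; the paper's actual vocabulary sizes and width are not stated here.
FACE_VOCAB, EDGE_VOCAB, D = 1024, 512, 64
# Hypothetical semantic type lists (the abstract mentions e.g. cylindrical, torus, NURBS; line, arc, B-spline).
SURFACE_TYPES = ["plane", "cylinder", "cone", "sphere", "torus", "bspline"]
CURVE_TYPES = ["line", "arc", "ellipse", "bspline"]

rng = np.random.default_rng(0)
# Random tables stand in for learned embedding weights.
face_table = rng.normal(0.0, 0.02, (FACE_VOCAB, D))
edge_table = rng.normal(0.0, 0.02, (EDGE_VOCAB, D))
modality_table = rng.normal(0.0, 0.02, (2, D))  # row 0 = face token, row 1 = edge token
surface_type_table = rng.normal(0.0, 0.02, (len(SURFACE_TYPES), D))
curve_type_table = rng.normal(0.0, 0.02, (len(CURVE_TYPES), D))
spatial_proj = rng.normal(0.0, 0.02, (6, D))  # bbox (xmin, ymin, zmin, xmax, ymax, zmax) -> D


def embed_primitives(face_ids, face_types, face_bboxes, edge_ids, edge_types, edge_bboxes):
    """Return an (n_faces + n_edges, D) array of input token embeddings.

    *_ids:    discrete codebook indices, one per primitive
    *_types:  integer indices into SURFACE_TYPES / CURVE_TYPES
    *_bboxes: (n, 6) axis-aligned bounding boxes used as a spatial descriptor
    """
    faces = (face_table[face_ids] + modality_table[0]
             + surface_type_table[face_types] + face_bboxes @ spatial_proj)
    edges = (edge_table[edge_ids] + modality_table[1]
             + curve_type_table[edge_types] + edge_bboxes @ spatial_proj)
    return np.concatenate([faces, edges], axis=0)


# Usage: a toy model with 3 faces and 4 edges.
tokens = embed_primitives(
    face_ids=np.array([5, 17, 900]), face_types=np.array([0, 1, 4]),
    face_bboxes=rng.uniform(-1, 1, (3, 6)),
    edge_ids=np.array([1, 2, 3, 511]), edge_types=np.array([0, 0, 1, 3]),
    edge_bboxes=rng.uniform(-1, 1, (4, 6)),
)
print(tokens.shape)  # (7, 64)
```

Summing the descriptors keeps one vector per primitive, so the sequence length equals the number of faces plus edges; concatenating them instead would be an equally plausible design.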

What carries the argument

A transformer encoder that aggregates tokenized BRep faces and edges into a global embedding aligned contrastively to CLIP's image and text encoders.
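A minimal sketch of the aggregation step, again in NumPy and under loud assumptions: one pre-norm single-head self-attention block with a ReLU MLP, mean pooling over tokens, a linear projection into a hypothetical 512-dimensional CLIP space, and L2 normalization. The real encoder's depth, width, head count, and pooling strategy (mean pooling versus a learned class token) are not given on this page, and random weights stand in for trained ones.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encoder_block(x, w_q, w_k, w_v, w_o, w_1, w_2):
    """Pre-norm single-head self-attention followed by a ReLU MLP, both residual."""
    h = layer_norm(x)
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_tokens, n_tokens)
    x = x + (attn @ v) @ w_o
    h = layer_norm(x)
    return x + np.maximum(h @ w_1, 0.0) @ w_2


def encode_brep(tokens, blocks, proj):
    """Map (n_tokens, D) token embeddings to one unit-norm global embedding."""
    x = tokens
    for weights in blocks:
        x = encoder_block(x, *weights)
    z = x.mean(axis=0) @ proj  # mean-pool over face and edge tokens, then project
    return z / np.linalg.norm(z)


# Usage with hypothetical sizes: width 64, 2 blocks, 512-d output.
rng = np.random.default_rng(0)
D, HIDDEN, CLIP_DIM = 64, 128, 512
blocks = [
    tuple(rng.normal(0, D ** -0.5, shape) for shape in
          [(D, D), (D, D), (D, D), (D, D), (D, HIDDEN), (HIDDEN, D)])
    for _ in range(2)
]
proj = rng.normal(0, D ** -0.5, (D, CLIP_DIM))
tokens = rng.normal(size=(7, D))
z = encode_brep(tokens, blocks, proj)
print(z.shape)  # (512,)
```

With no positional encoding and mean pooling, this sketch is invariant to the order in which faces and edges are listed, which suits treating a BRep as a set of primitives; whether the actual model adds positional information is not stated here.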

If this is right

  • Top-1 retrieval accuracy rises by 40.4 percent on ABC, 22.0 percent on CADParser, and 23.9 percent on Automate relative to OpenShape.
  • Zero-shot classification Top-1 score on FabWave rises by 15 percent.
  • The learned embeddings serve as a CAD-aware similarity metric for scoring text- and image-conditioned CAD generation outputs.
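The metrics in this list all reduce to cosine similarity in the shared embedding space. The NumPy sketch below shows one common way to compute them; it is not the paper's evaluation code. The function names are hypothetical, the pairing convention (row `i` of the queries matches row `i` of the gallery) is an assumption, and `similarity_score` is only a plausible cosine-based stand-in for the CAD-aware metric, whose exact definition (any scaling or clipping, as in CLIPScore) is not given on this page.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top1_retrieval_accuracy(query_emb, gallery_emb):
    """Fraction of queries whose nearest gallery item (by cosine) is the paired one.

    Assumes row i of query_emb is the ground-truth match of row i of gallery_emb.
    """
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    hits = sims.argmax(axis=1) == np.arange(len(query_emb))
    return float(hits.mean())


def zero_shot_classify(cad_emb, class_text_emb):
    """Predict, for each CAD embedding, the class whose text-prompt embedding is closest."""
    sims = l2_normalize(cad_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)


def similarity_score(cad_emb, condition_emb):
    """Cosine similarity between a generated CAD model and its text or image condition."""
    return float(np.dot(l2_normalize(cad_emb), l2_normalize(condition_emb)))


# Usage on toy embeddings.
queries = np.eye(4)
gallery = np.eye(4)[[1, 0, 2, 3]]  # first two items swapped
print(top1_retrieval_accuracy(queries, gallery))  # 0.5
print(zero_shot_classify(np.array([[1.0, 0.1], [0.1, 1.0]]), np.eye(2)))  # [0 1]
```

Zero-shot classification here follows the usual CLIP recipe: class names are wrapped in text prompts (a template such as "a CAD model of a {class}" is a hypothetical example), encoded by the frozen text encoder, and each BRep embedding is assigned to the nearest prompt.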

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Design tools could use the same embeddings to suggest similar parts during modeling without manual feature extraction.
  • The same tokenization scheme might support fine-tuning for other downstream CAD tasks such as segmentation or parameter prediction.
  • If the alignment generalizes, BRep-based pretraining could become a standard step before applying large language models to engineering documents.

Load-bearing premise

That discrete vocabularies for surfaces and curves together with spatial and semantic descriptors are enough for a standard transformer to produce embeddings that align meaningfully with CLIP across different CAD collections.

What would settle it

A point-cloud or mesh method, trained on the same CAD datasets, that matches or exceeds BRepCLIP's Top-1 retrieval accuracy on ABC, CADParser, and Automate, erasing the reported margins of 40.4, 22.0, and 23.9 percent over OpenShape.

Figures

Figures reproduced from arXiv: 2606.05515 by Didier Stricker, Mohammad Sadil Khan, Muhammad Usama, Muhammad Zeshan Afzal.

Figure 1: Compared to point clouds, our BRep-aware representations (edge, face points) preserve both geometry and fine-grained structures (e.g., holes, rounded corners) for accurate CAD representation learning. We introduce BRepCLIP, the first contrastive representation learning framework to operate directly on BRep primitives. Each CAD model is represented as a set of BRep face and edge primitives, where each pr…
Figure 2: Hybrid dual-dVAE tokenization. Face and edge points are tokenized independently using separate discrete VAEs with dedicated codebooks. We encode BRep geometry through a tokenization scheme over faces and edges as shown in …
Figure 3: BRepCLIP. Face and edge point sets, $G_f$ and $G_e$, are tokenized by frozen face ($F_T$) and edge ($E_T$) tokenizers and encoded by a transformer with modality, spatial, and semantic cues to produce a global BRep embedding. Frozen CLIP text and image encoders provide caption and multi-view image embeddings for BRep–text and BRep–image contrastive training. $\mathcal{L}_{bt} = -\frac{1}{2N}\sum_{i=1}^{N}\Big[\log\frac{\exp(Z^B_i \cdot Z^T_i/\tau)}{\sum_{j=1}^{N}\exp(\ldots)}\ldots$
Figure 4: Qualitative retrieval results. Given a text query, BRepCLIP retrieves CAD models that match fine-grained geometric details such as hole count, edge topology, and surface type more faithfully than point-based baselines.
Figure 5: Qualitative results for zero-shot classification and BRepCLIP-Score.
Figure 6: Score sensitivity to prompt corruption. A model that looks correct when rendered may still be missing holes, chamfers, or correct edge topology. CLIP score operates on 2D projections and cannot capture these details. Chamfer Distance measures global shape proximity but is insensitive to local topology. BRepCLIP-Score addresses both limitations by grounding evaluation directly in BRep embeddings, where su…
Figure 7: Distributions of the number of edges per CAD model (left), the number of faces per CAD …
Figure 8: Overview of training, in-domain retrieval, and …
Figure 9: Distribution of face primitive types (left) and edge curve types (right) in the 400K ABC …
Figure 10: Distribution of edge relation attributes in …
Figure 11: Additional qualitative text-to-CAD retrieval results. Given a text query, BRepCLIP …
Figure 12: Additional qualitative results for BRepCLIP-Score. Higher scores are assigned to CAD …
Figure 13: Additional qualitative results for zero-shot classification on FabWave. BRepCLIP produces …
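Figure 2 describes separate discrete tokenizers with dedicated codebooks for faces and edges. A dVAE in the Point-BERT style predicts logits over its codebook and takes the argmax at inference; the NumPy sketch below swaps in a simpler VQ-style nearest-codebook lookup to show the same end result, one integer token per primitive. The point-set shapes (a 32×32 grid of points per face, 32 points per edge), the codebook sizes, and the linear "encoder" are hypothetical placeholders for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 32
FACE_PTS, EDGE_PTS = 32 * 32 * 3, 32 * 3  # hypothetical: 32x32 grid per face, 32 samples per edge

# Separate (hypothetical) codebooks and linear encoders for faces and edges.
face_codebook = rng.normal(size=(1024, LATENT))
edge_codebook = rng.normal(size=(512, LATENT))
face_encoder = rng.normal(size=(FACE_PTS, LATENT)) / np.sqrt(FACE_PTS)
edge_encoder = rng.normal(size=(EDGE_PTS, LATENT)) / np.sqrt(EDGE_PTS)


def quantize(latents, codebook):
    """Index of the nearest codebook vector (squared Euclidean) for each latent row."""
    d = ((latents ** 2).sum(axis=1, keepdims=True)
         - 2.0 * latents @ codebook.T
         + (codebook ** 2).sum(axis=1))
    return d.argmin(axis=1)


def tokenize_faces(face_points):
    """face_points: (n_faces, 32, 32, 3) -> (n_faces,) integer face tokens."""
    z = face_points.reshape(len(face_points), -1) @ face_encoder
    return quantize(z, face_codebook)


def tokenize_edges(edge_points):
    """edge_points: (n_edges, 32, 3) -> (n_edges,) integer edge tokens."""
    z = edge_points.reshape(len(edge_points), -1) @ edge_encoder
    return quantize(z, edge_codebook)


face_tokens = tokenize_faces(rng.uniform(-1, 1, (5, 32, 32, 3)))
edge_tokens = tokenize_edges(rng.uniform(-1, 1, (9, 32, 3)))
print(face_tokens.shape, edge_tokens.shape)  # (5,) (9,)
```

Keeping two vocabularies means a surface patch and a curve never compete for the same code, which is presumably the motivation for the dual tokenizer; the discretization also discards exact parameters, which is the concern raised in the referee report below.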
Original abstract

Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, the boundary representation (BRep), which encodes exact parametric surfaces, curves, and their topology, has received little attention as a representation learning substrate. We introduce BRepCLIP, the first framework to align BRep geometry with language and image embeddings through contrastive pretraining. We model each CAD object as a sequence of face and edge tokens with separate discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors that capture surface types (e.g., cylindrical, torus, NURBS) and curve primitives (e.g., line, arc, B-spline). A transformer encoder aggregates these tokens into a global BRep embedding, aligned with CLIP's text and image encoders via a joint contrastive objective. BRepCLIP generates more discriminative and semantically grounded embeddings than existing point-based alternatives, improving Top-1 retrieval over OpenShape by 40.4%, 22.0%, and 23.9% on ABC, CADParser, and Automate, respectively, and improving zero-shot classification on FabWave by 15% in Top-1 score. We further demonstrate its utility as a CAD-aware similarity metric for evaluating text- and image-conditioned CAD generation, establishing the importance of structure-aware pretraining for multimodal CAD understanding. The project page is available at https://muhammadusama100.github.io/BrepClip2026/
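The joint contrastive objective can be sketched with the symmetric InfoNCE loss popularized by CLIP. The Figure 3 caption shows the BRep–text term opening with a $-\frac{1}{2N}\sum_{i=1}^{N}$ prefactor and a softmax over $\exp(Z^B_i \cdot Z^T_j/\tau)$, which is consistent with this form; the BRep–image term, the unweighted sum of the two, and the temperature 0.07 (CLIP's initial value) are assumptions in the NumPy sketch below. Because the CLIP encoders are frozen, only the BRep embeddings would receive gradients in training.

```python
import numpy as np


def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_a, z_b, tau=0.07):
    """CLIP-style loss: -(1/2N) * sum_i [log p(b_i | a_i) + log p(a_i | b_i)]."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau  # logits[i, j] = sim(a_i, b_j) / tau
    diag = np.arange(len(z_a))
    a_to_b = log_softmax(logits, axis=1)[diag, diag]  # softmax over b_j for each a_i
    b_to_a = log_softmax(logits, axis=0)[diag, diag]  # softmax over a_j for each b_i
    return float(-(a_to_b.sum() + b_to_a.sum()) / (2 * len(z_a)))


def joint_loss(z_brep, z_text, z_image, tau=0.07):
    """Hypothetical unweighted sum of the BRep-text and BRep-image terms."""
    return symmetric_info_nce(z_brep, z_text, tau) + symmetric_info_nce(z_brep, z_image, tau)


# Usage: matched pairs should give a much lower loss than mismatched ones.
rng = np.random.default_rng(0)
z_b = rng.normal(size=(8, 16))
shift = [1, 2, 3, 4, 5, 6, 7, 0]  # pair each BRep with the wrong caption/image
aligned = joint_loss(z_b, z_b + 0.01 * rng.normal(size=(8, 16)), z_b)
mismatched = joint_loss(z_b, z_b[shift], z_b[shift])
print(aligned < mismatched)  # True
```

When every embedding is identical, each term equals $\log N$, the loss of a uniform guess, which is a handy sanity check for an implementation.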

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces BRepCLIP, the first contrastive pretraining framework to align native BRep representations of CAD models (tokenized faces and edges with discrete surface/curve vocabularies plus spatial and semantic descriptors) with CLIP image and text embeddings via a transformer encoder and joint contrastive loss. It reports substantial gains over point-cloud baselines: +40.4%, +22.0%, and +23.9% Top-1 retrieval on ABC, CADParser, and Automate, plus +15% zero-shot Top-1 classification on FabWave, and demonstrates utility as a similarity metric for text/image-conditioned CAD generation.

Significance. If the empirical gains are reproducible and attributable to the BRep representation rather than implementation details, the work would meaningfully advance CAD representation learning by moving beyond point-cloud or mesh approximations to the exact parametric format used in industrial design. The multimodal alignment and downstream use for generation evaluation are timely contributions.

major comments (2)
  1. [Method] Method (tokenization and encoder description): The headline retrieval and classification improvements rest on the assumption that the chosen discrete vocabularies for surfaces/curves, augmented only by type and primitive descriptors, retain enough continuous geometric information for meaningful alignment with CLIP spaces. No ablation or analysis is presented that isolates the effect of discretization (e.g., loss of exact NURBS coefficients or tolerances) versus the transformer architecture or contrastive objective, leaving open whether the reported 40%+ gains would persist under alternative tokenizations or on out-of-distribution CAD domains.
  2. [Experiments] Experimental section (results tables): The abstract states concrete percentage improvements but the provided text supplies no information on training hyperparameters, dataset splits, statistical significance testing, or variance across runs. Without these, it is impossible to determine whether the gains over OpenShape are robust or sensitive to the particular choice of discrete vocabularies.
minor comments (1)
  1. The project page URL is given but no link to code or pretrained models is mentioned in the text; releasing these would strengthen reproducibility claims.
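Part of the robustness question in the second major comment can be answered without retraining. A percentile bootstrap over evaluation queries gives a confidence interval for a Top-1 score, and a paired bootstrap over the same queries gives an interval for the gap between two models; neither replaces reporting variance across training seeds. The NumPy sketch below is a generic implementation, not something the paper describes; the function names, 2,000 resamples, 95% level, and the synthetic hit arrays in the usage are illustrative.

```python
import numpy as np


def bootstrap_ci(hits, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Top-1 accuracy.

    hits: one boolean per evaluation query (True = correct Top-1 result).
    Returns (point_estimate, lower, upper).
    """
    hits = np.asarray(hits, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(hits), size=(n_boot, len(hits)))  # resample queries
    means = hits[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(hits.mean()), float(lower), float(upper)


def paired_bootstrap_diff(hits_a, hits_b, n_boot=2000, alpha=0.05, seed=0):
    """CI for accuracy(A) - accuracy(B), resampling the same queries for both models."""
    a = np.asarray(hits_a, dtype=float)
    b = np.asarray(hits_b, dtype=float)
    assert a.shape == b.shape, "both models must be scored on the same queries"
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))
    diffs = a[idx].mean(axis=1) - b[idx].mean(axis=1)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(a.mean() - b.mean()), float(lower), float(upper)


# Usage with synthetic per-query hits (not real results).
gen = np.random.default_rng(1)
model_a = gen.random(500) < 0.7
model_b = gen.random(500) < 0.5
print(paired_bootstrap_diff(model_a, model_b))
```

Pairing matters: resampling both models on the same query indices cancels per-query difficulty, so the interval on the gap is usually much tighter than comparing two independent intervals.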

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback on our manuscript. We address the major comments point by point below, providing clarifications and indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Method] Method (tokenization and encoder description): The headline retrieval and classification improvements rest on the assumption that the chosen discrete vocabularies for surfaces/curves, augmented only by type and primitive descriptors, retain enough continuous geometric information for meaningful alignment with CLIP spaces. No ablation or analysis is presented that isolates the effect of discretization (e.g., loss of exact NURBS coefficients or tolerances) versus the transformer architecture or contrastive objective, leaving open whether the reported 40%+ gains would persist under alternative tokenizations or on out-of-distribution CAD domains.

    Authors: We appreciate the referee's point on the potential limitations of discretization in our tokenization approach. The design of our surface and curve vocabularies, combined with type and primitive descriptors, aims to preserve key geometric properties necessary for alignment with CLIP embeddings. The consistent performance gains across multiple datasets support that this representation is effective. However, we agree that dedicated ablations isolating the discretization effects would be beneficial. In the revised manuscript, we will include additional analysis and discussion on the impact of our tokenization choices versus the architecture and loss, as well as note the scope of our current evaluations. revision: partial

  2. Referee: [Experiments] Experimental section (results tables): The abstract states concrete percentage improvements but the provided text supplies no information on training hyperparameters, dataset splits, statistical significance testing, or variance across runs. Without these, it is impossible to determine whether the gains over OpenShape are robust or sensitive to the particular choice of discrete vocabularies.

    Authors: We acknowledge that the experimental details were not presented with sufficient clarity in the submitted manuscript. The revised version will incorporate a dedicated subsection detailing the training hyperparameters, dataset splits used, variance across multiple runs (including standard deviations), and any statistical significance testing. This will provide the necessary information to evaluate the robustness of the results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are empirical

Full rationale

The paper describes an empirical contrastive pretraining framework that tokenizes BRep faces/edges with discrete vocabularies plus spatial/semantic descriptors, feeds them to a transformer, and aligns the resulting embeddings to CLIP spaces via a joint contrastive loss. No equations, derivations, or first-principles predictions are presented anywhere in the provided text. All headline claims (40.4%/22.0%/23.9% Top-1 retrieval gains, 15% zero-shot lift) are reported as measured outcomes on held-out datasets; none reduce by construction to a fitted parameter, self-citation chain, or renamed input. The architecture choices are presented as design decisions, not as mathematically forced consequences of prior results by the same authors. This is the normal, non-circular case for a methods-plus-experiments paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical performance of a contrastive objective applied to a transformer over BRep tokens. No free parameters, axioms, or invented entities are described in the abstract; the work inherits standard assumptions from CLIP-style training and transformer architectures.

pith-pipeline@v0.9.1-grok · 5819 in / 1391 out tokens · 23562 ms · 2026-06-28T05:58:38.533338+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 5 canonical work pages · 1 internal anchor

  1. Sk Aziz Ali, Mohammad Sadil Khan, and Didier Stricker. Brep boundary and junction detection for cad reverse engineering. In 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), 2024.

  2. A Ananthakrishnan. A multi-modal retrieval augmented framework for user editable 3d cad model generation. 2025.

  3. Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, and Matthias Nießner. Scan2CAD: Learning CAD model alignment in RGB-D scans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2614–2623, …

  4. Akshay Bharadwaj, Yang Xu, Atin Angrish, Yong Chen, and Binil Starly. Development of a pilot manufacturing cyberinfrastructure with an information rich mechanical cad 3d model repository. In International Manufacturing Science and Engineering Conference, 2019.

  5. Polly Ann Brown. Cad: Do computers aid the design process after all? Intersect: The Stanford Journal of Science, Technology and Society, 2:52–66, 2009.

  6. Jiali Chen, Xusen Hei, Hongfei Liu, Yuancheng Wei, Zikun Deng, Jiayuan Xie, Yi Cai, and Li Qing. Cadreview: Automatically reviewing cad programs with error detection and correction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9909–9927, 2025.

  7. Yipeng Gao, Zeyu Wang, Wei-Shi Zheng, Cihang Xie, and Yuyin Zhou. Sculpting holistic 3d representation in contrastive language-image-3d pre-training. In CVPR, 2024.

  8. Negar Heidari and Alexandros Iosifidis. Geometric deep learning for computer-aided design: A survey. IEEE Access, 13:119305–119334, 2024.

  9. Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph G Lambourne, Karl DD Willis, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from boundary representations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11703–11712, 2021.

  10. Benjamin Jones, Dalton Hildreth, Duowen Chen, Ilya Baran, Vladimir G Kim, and Adriana Schulz. Automate: A dataset and learning approach for automatic mating of cad assemblies. ACM Transactions on Graphics (TOG), 2021.

  11. Benjamin T. Jones, Michael Hu, Vladimir G. Kim, and Adriana Schulz. Self-supervised representation learning for CAD. arXiv preprint arXiv:2210.10807, 2022.

  12. David Kasik, William Buxton, and David Ferguson. Ten cad challenges. IEEE Computer Graphics and Applications, 25:81–92, 03 2005.

  13. Mohammad S Khan, Sankalp Sinha, Talha U Sheikh, Didier Stricker, Sk A Ali, and Muhammad Z Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts. Advances in Neural Information Processing Systems, 37:7552–7579, 2024.

  14. Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4713–4722, June 2024.

  15. Mohammad Sadil Khan, Muhammad Usama, Rolandos Alexandros Potamias, Didier Stricker, Muhammad Zeshan Afzal, Jiankang Deng, and Ismail Elezi. Dreamcad: Scaling multi-modal cad generation using differentiable parametric surfaces. arXiv, 2026.

  16. Mingi Kim, Yongjun Kim, Jungwoo Kang, and Hyungki Kim. BrepCoder: A unified multimodal large language model for multi-task b-rep reasoning. arXiv preprint arXiv:2602.22284, 2026.

  17. Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruction with reinforcement learning. In The Fourteenth International Conference on Learning Representations, 2025.

  18. Joseph G Lambourne, Karl DD Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12773–12782, 2021.

  19. Florian Langer, Jihong Ju, Georgi Dikov, Gerhard Reitmayr, and Mohsen Ghafoorian. FastCAD: Real-time CAD retrieval and alignment from scans and videos. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.

  20. Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18563–18573, 2025.

  21. Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, and Hao Su. Openshape: Scaling up 3d shape representation towards open-world understanding. Advances in Neural Information Processing Systems, 36:44860–44879, 2023.

  22. Yujia Liu, Anton Obukhov, Jan Dirk Wegner, and Konrad Schindler. Point2cad: Reverse engineering cad models from 3d point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3763–3772, 2024.

  23. Yunzhong Lou, Xueyang Li, Haotian Chen, and Xiangdong Zhou. Brep-bert: Pre-training boundary representation bert with sub-graph node contrastive learning. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 1657–1666, 2023.

  24. Weijian Ma, Minyang Xu, Xueyang Li, and Xiangdong Zhou. Multicad: Contrastive representation learning for multi-modal 3D computer-aided design models. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM). ACM, …

  25. Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu. Rethinking network design and local geometry in point cloud: A simple residual mlp framework. In International Conference on Learning Representations, 2022.

  26. Dimitrios Mallis, Ali Sk Aziz, Elona Dupont, Kseniya Cherenkova, Ahmet Serdar Karadeniz, Mohammad Sadil Khan, Anis Kacem, Gleb Gusev, and Djamila Aouada. Sharp challenge 2023: Solving cad history and parameters recovery from point clouds and 3d scans. overview, datasets, metrics, and baselines. In Proceedings of the IEEE/CVF International Conference on Com...

  27. Tessa Pulli, Jean-Baptiste Weibel, Peter Hönig, Matthias Hirschmanner, Markus Vincze, and Andreas Holzinger. Oscar: Open-set cad retrieval from a language prompt and a single image. arXiv preprint arXiv:2601.07333, 2026.

  28. Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.

  29. Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017.

  30. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.

  31. C. Schinko, T. Vosgien, T. Prante, T. Schreck, and T. Ullrich. Search and retrieval in cad databases - a user-centric state-of-the-art overview. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2017, VISIGRAPP; conference date: 27-02-2017 through 01-03-2017).

  33. Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama, Shino Sam, Didier Stricker, Sk Aziz Ali, and Muhammad Zeshan Afzal. Marvel-40m+: Multi-level visual elaboration for high-fidelity text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8105–8116, 2025.

  34. Yuhao Sun, Hao Cheng, Shang Zheng, Hualong Yu, and Haitao Zou. Balancing speed and executability in interactive text-to-cad code generation for early-stage parametric cad ideation. Journal of King Saud University Computer and Information Sciences, 2026.

  35. Muhammad Usama, Mohammad Sadil Khan, Didier Stricker, and Muhammad Zeshan Afzal. Nurbgen: High-fidelity text-to-cad generation through llm-driven nurbs modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 9603–9611, 2026.

  36. Ruiyu Wang, Yu Yuan, Shizhao Sun, and Jiang Bian. Text-to-cad generation through infusing visual feedback in large language models. arXiv preprint arXiv:2501.19054, 2025.

  37. Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, and Jie Yang. Cad-gpt: Synthesising cad construction sequence with spatial reasoning-enhanced multimodal llms. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 7880–7888, …

  38. Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6772–6782, 2021.

  39. Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, and Shenghua Gao. Cad-mllm: Unifying multimodality-conditioned cad generation with mllm. arXiv preprint arXiv:2411.04954, …

  40. Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, and Silvio Savarese. Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1179–1189, 2023.

  41. Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, et al. Ulip-2: Towards scalable multimodal pre-training for 3d understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27091–27101, 2024.

  42. Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19313–19322, 2022.

  43. Shuming Zhang, Zhidong Guan, Hao Jiang, Tao Ning, Xiaodong Wang, and Pingan Tan. Brep2seq: a dataset and hierarchical deep learning network for reconstruction and generation of computer-aided design models. Journal of Computational Design and Engineering, 11(1):110–134, 2024.

  44. Shengdi Zhou, Tianyi Tang, and Bin Zhou. Cadparser: a learning approach of sequence modeling for b-rep cad. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023.

  45. Qiang Zou and Lizhen Zhu. Bringing attention to cad: Boundary representation learning via transformer. Computer-Aided Design, 189:103940, December 2025.

Supplementary Material

A Dataset Analysis

In this section, we provide additional analysis of the datasets used for training and evaluation. Our training data is built from the high-quality ABC subset ...