Sketch2Arti: Sketch-based Articulation Modeling of CAD Objects
Pith reviewed 2026-05-07 16:44 UTC · model grok-4.3
The pith
Users can articulate 3D CAD models by drawing simple 2D sketches from one viewpoint, which the system turns into movable parts and motion parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sketch2Arti automatically discovers movable parts and predicts their motion parameters from user-provided 2D sketches drawn from a chosen viewpoint of a CAD model. It is trained in a category-agnostic manner without explicit 3D supervision, and it also supports controllable internal completion for shell models.
What carries the argument
A learned mapping from 2D sketches to 3D part segmentation and motion parameters, with an internal completion component guided by predicted motion constraints.
Load-bearing premise
That sketches drawn from a single viewpoint encode enough information for a model to correctly identify movable parts and compute accurate 3D motion parameters without category labels or direct 3D training data.
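The ambiguity behind this premise can be illustrated with a minimal sketch. It is not the paper's method: it assumes an orthographic camera looking along the z axis, and the point `handle`, the helper names, and the three candidate motions are all hypothetical. It shows three different 3D motions whose image-plane displacement matches the same rightward arrow.

```python
import math


def project(p):
    """Orthographic projection (assumed camera): drop the depth (z) coordinate."""
    x, y, _z = p
    return (x, y)


def translate(p, t):
    """Move point p by the 3D offset t."""
    return tuple(a + b for a, b in zip(p, t))


def rotate_about_y(p, pivot, angle):
    """Rotate p by `angle` radians about a vertical (y) axis through `pivot`."""
    x, y, z = (a - b for a, b in zip(p, pivot))
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z + pivot[0], y + pivot[1], -s * x + c * z + pivot[2])


def projected_displacement(start, end):
    """2D displacement seen in the image, rounded to suppress float noise."""
    (x0, y0), (x1, y1) = project(start), project(end)
    return (round(x1 - x0, 6), round(y1 - y0, 6))


# Hypothetical point on a movable part, e.g. a drawer or door handle.
handle = (0.0, 0.0, 0.0)

# Three distinct 3D end positions for the same handle (all hypothetical).
candidates = {
    "slide in the image plane": translate(handle, (1.0, 0.0, 0.0)),
    "slide with a depth component": translate(handle, (1.0, 0.0, 2.0)),
    "swing about a vertical hinge": rotate_about_y(handle, (0.0, 0.0, -1.0), math.pi / 2),
}

displacements = {
    name: projected_displacement(handle, end) for name, end in candidates.items()
}

for name, d in displacements.items():
    print(f"{name}: image-plane displacement {d}")
```

All three candidates yield the same 2D displacement, so a model that commits to one motion presumably has to rely on cues beyond the arrow's displacement, such as stroke shape or the CAD geometry near the sketch.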
What would settle it
Provide sketches for a set of unseen CAD objects and check whether the predicted parts and motion parameters match the intended articulations, and whether the internal completions stay consistent with them.
Original abstract
Articulation modeling aims to infer movable parts and their motion parameters for a 3D object, enabling interactive animation, simulation, and shape editing. In this paper, we present Sketch2Arti, the first sketch-based articulation modeling system for CAD objects. Our key observation is that designers naturally communicate articulation intent through lightweight sketches (e.g., arrows and strokes) that indicate how parts should move, yet translating such sketches into articulated 3D models remains largely manual. Sketch2Arti bridges this gap by enabling users to specify articulation through simple 2D sketches drawn from a chosen viewpoint. Given a CAD model and user sketches, our approach automatically discovers the corresponding movable parts and predicts their motion parameters, allowing iterative modeling of multiple articulations on complex objects with fine-grained control. Importantly, Sketch2Arti is trained in a category-agnostic manner without requiring object category information, leading to strong generalization to diverse objects beyond existing articulation datasets. Moreover, for shell models lacking interior structures, Sketch2Arti supports controllable internal completion guided by user sketches, generating plausible internal components consistent with the existing geometry and predicted motion constraints. Comprehensive experiments and user evaluations demonstrate the effectiveness, controllability, and generalization of Sketch2Arti. The code, dataset, and the prototype system are at https://arlo-yang.github.io/Sketch2Arti.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Sketch2Arti, the first sketch-based articulation modeling system for CAD objects. Given a CAD model and lightweight 2D user sketches (arrows/strokes) drawn from a chosen viewpoint, the method automatically discovers movable parts, predicts their 3D motion parameters, supports iterative multi-articulation modeling, and performs controllable internal geometry completion on shell models. The system is trained category-agnostically without object category labels or explicit 3D supervision and claims strong generalization beyond existing datasets, validated via comprehensive experiments and user evaluations. Code, dataset, and a prototype are released.
Significance. If the central claims hold, this work would be a meaningful advance for interactive CAD modeling, animation, and shape editing by replacing manual articulation specification with intuitive sketch input. The category-agnostic training and internal-completion capability on shell models are notable strengths, as is the public release of code, data, and prototype, which supports reproducibility and follow-on research.
major comments (3)
- [§3 and §4.1] §3 (Method overview) and §4.1 (network architecture): the single-view sketch-to-3D motion regression is fundamentally underconstrained (an arrow can map to rotation about multiple axes, translation, or scaling, with depth ambiguity). The manuscript must explicitly describe the architectural or loss-function mechanisms (e.g., multi-view consistency, physical priors, or uncertainty modeling) that allow reliable disambiguation without category priors or 3D supervision; absent these details the generalization claim rests on unverified empirical behavior.
- [§5.2 and Table 3] §5.2 (generalization experiments) and Table 3: the reported cross-category and novel-articulation results must include quantitative metrics (e.g., part-IoU, motion-parameter error, success rate under viewpoint variation) together with failure-case analysis. If performance degrades sharply on sketches drawn from unseen viewpoints or on articulation types absent from training, both the “strong generalization” and “controllable internal completion” claims are undermined.
- [§4.3] §4.3 (internal completion module): the controllable completion for shell models is presented as guided by user sketches and predicted motion constraints, yet no ablation isolates the contribution of the sketch guidance versus the motion constraints. A controlled experiment removing sketch input or motion constraints is required to substantiate the controllability claim.
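The quantitative metrics requested for §5.2 could be computed along the following lines. This is a hedged sketch, not the paper's evaluation code: the function names, the representation of a part as a set of face indices, the sign-agnostic axis comparison, and the thresholds (IoU 0.5, 10 degrees) are all assumptions made for illustration.

```python
import math


def part_iou(predicted, target):
    """Intersection-over-union of two collections of face (or point) indices."""
    predicted, target = set(predicted), set(target)
    union = predicted | target
    if not union:
        return 1.0  # both empty: treated here as perfect agreement (assumption)
    return len(predicted & target) / len(union)


def axis_angle_error_deg(pred_axis, true_axis):
    """Angle in degrees between two axis directions, ignoring sign.

    An axis and its negation span the same line, so the absolute value
    of the cosine is used (assumed convention).
    """
    dot = sum(a * b for a, b in zip(pred_axis, true_axis))
    norm = math.sqrt(sum(a * a for a in pred_axis)) * math.sqrt(
        sum(b * b for b in true_axis)
    )
    cosine = min(1.0, abs(dot) / norm)  # clamp against float overshoot
    return math.degrees(math.acos(cosine))


def success_rate(results, iou_threshold=0.5, angle_threshold_deg=10.0):
    """Fraction of (iou, angle_error_deg) pairs that pass both thresholds."""
    if not results:
        return 0.0
    passed = sum(
        1
        for iou, err in results
        if iou >= iou_threshold and err <= angle_threshold_deg
    )
    return passed / len(results)


# Made-up values purely to exercise the functions; not results from the paper.
example_results = [(0.8, 3.0), (0.4, 2.0), (0.9, 25.0), (0.6, 9.0)]
print(part_iou({1, 2, 3}, {2, 3, 4}))
print(axis_angle_error_deg((0.0, 0.0, 1.0), (0.0, 0.0, -1.0)))
print(success_rate(example_results))
```

Reporting these per viewpoint bin and per articulation type would expose the degradation the comment on §5.2 asks about.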
minor comments (3)
- The abstract states “comprehensive experiments and user evaluations” but does not highlight any key quantitative numbers; adding one or two headline metrics would improve readability.
- [§3 and §5] Notation for motion parameters (axis, angle, translation vector) should be introduced once in §3 and used consistently; occasional informal descriptions in §5 make cross-referencing harder.
- [Figure 4] Figure 4 (qualitative results) would benefit from an additional column showing the input sketch overlaid on the rendered view to make the correspondence between sketch and output explicit.
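As one way to pin down the notation the second minor comment asks for, motion parameters could be written as a joint type, a unit axis, a pivot point on that axis, and a scalar amount (angle in radians or translation distance). The sketch below is an assumption-laden illustration, not the paper's parameterization: the `Joint` class, its field names, and the use of Rodrigues' rotation formula are chosen here for clarity.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


@dataclass(frozen=True)
class Joint:
    """Hypothetical container for one articulation."""

    kind: str  # "revolute" or "prismatic"
    axis: Vec3  # unit direction of the motion axis
    pivot: Vec3 = (0.0, 0.0, 0.0)  # point on the axis; ignored for prismatic

    def apply(self, point, amount):
        """Move `point` by `amount`: radians if revolute, distance if prismatic."""
        if self.kind == "prismatic":
            return tuple(p + amount * a for p, a in zip(point, self.axis))
        if self.kind == "revolute":
            v = tuple(p - o for p, o in zip(point, self.pivot))
            c, s = math.cos(amount), math.sin(amount)
            k_cross_v = _cross(self.axis, v)
            k_dot_v = _dot(self.axis, v)
            # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
            rotated = tuple(
                vi * c + kv * s + ki * k_dot_v * (1.0 - c)
                for vi, kv, ki in zip(v, k_cross_v, self.axis)
            )
            return tuple(r + o for r, o in zip(rotated, self.pivot))
        raise ValueError(f"unknown joint kind: {self.kind!r}")


# Usage with made-up geometry: a hinge along z through (1, 0, 0), and a slider along x.
hinge = Joint("revolute", (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
slider = Joint("prismatic", (1.0, 0.0, 0.0))
print(hinge.apply((2.0, 0.0, 0.0), math.pi))
print(slider.apply((0.0, 0.0, 0.0), 0.25))
```

Fixing one such definition in §3 and reusing its symbols in §5 would address the cross-referencing concern.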
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the changes planned for the revised manuscript.
- Referee: [§3 and §4.1] §3 (Method overview) and §4.1 (network architecture): the single-view sketch-to-3D motion regression is fundamentally underconstrained (an arrow can map to rotation about multiple axes, translation, or scaling, with depth ambiguity). The manuscript must explicitly describe the architectural or loss-function mechanisms (e.g., multi-view consistency, physical priors, or uncertainty modeling) that allow reliable disambiguation without category priors or 3D supervision; absent these details the generalization claim rests on unverified empirical behavior.
  Authors: We agree that single-view sketch-to-3D motion regression is underconstrained in principle. The current manuscript describes the overall pipeline and network architecture in §3 and §4.1 but does not provide a dedicated, explicit account of the mechanisms that resolve ambiguities without category priors or 3D supervision. In the revision we will expand §4.1 with a new paragraph that details the architectural integration of sketch and CAD geometry features and the training procedure used to achieve disambiguation. This addition will make the technical basis for the reported generalization explicit.
  Revision: yes
- Referee: [§5.2 and Table 3] §5.2 (generalization experiments) and Table 3: the reported cross-category and novel-articulation results must include quantitative metrics (e.g., part-IoU, motion-parameter error, success rate under viewpoint variation) together with failure-case analysis. If performance degrades sharply on sketches drawn from unseen viewpoints or on articulation types absent from training, both the “strong generalization” and “controllable internal completion” claims are undermined.
  Authors: We will strengthen the generalization section by updating §5.2 and Table 3 to include the requested quantitative metrics (part-IoU, motion-parameter error, and success rates under viewpoint variation) together with a dedicated failure-case analysis. These additions will be incorporated in the revised manuscript.
  Revision: yes
- Referee: [§4.3] §4.3 (internal completion module): the controllable completion for shell models is presented as guided by user sketches and predicted motion constraints, yet no ablation isolates the contribution of the sketch guidance versus the motion constraints. A controlled experiment removing sketch input or motion constraints is required to substantiate the controllability claim.
  Authors: We agree that an ablation isolating the contributions of sketch guidance and motion constraints is needed to substantiate controllability. In the revised manuscript we will add a controlled ablation experiment to §4.3 that evaluates the full internal-completion module against variants that remove sketch input or motion constraints, reporting both quantitative and qualitative results.
  Revision: yes
Circularity Check
No circularity: trained model claims rest on external data and evaluations
The paper describes a category-agnostic neural network trained to map 2D sketches to 3D articulation parameters and internal completions. No equations, derivations, or first-principles results are presented that reduce outputs to inputs by construction. Generalization and controllability claims are supported by experiments, user studies, and a released dataset/code, which are independent of any self-referential fitting. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the provided text.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: User sketches drawn from a chosen viewpoint accurately communicate intended 3D part motion.
- domain assumption: A single model trained without category labels can generalize to diverse unseen CAD objects.