
arxiv: 2604.10075 · v1 · submitted 2026-04-11 · 💻 cs.AI

Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD

Pith reviewed 2026-05-10 16:14 UTC · model grok-4.3

classification 💻 cs.AI
keywords text-to-CAD · hierarchical graph · geometric constraints · code generation · curriculum learning · assembly modeling · CAD design

The pith

Predicting a hierarchical graph with geometric constraints first improves text-to-CAD code generation by reducing errors in complex assemblies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that inserting a hierarchical graph as an intermediate step between text and CAD code reduces cascading failures in long sequences of operations. Direct decoding from text to code often produces invalid assemblies because local mistakes compound without an explicit model of parts and their relationships. By first predicting multi-level parts as nodes and geometric constraints as edges, then using that graph to guide action sequencing and code output, the framework achieves tighter matches to the described geometry. A progressive curriculum that builds graded tasks through structural edits further supports training on harder cases.
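The curriculum idea can be illustrated with a minimal sketch. It is not the paper's procedure: the function names `capability_boundary` and `next_training_level`, the 0.5 threshold, and the reading of "level" as the number of structural edits applied to a seed task are all assumptions made for illustration. The sketch only shows one way a capability boundary could be located from per-level pass rates so that training examples are drawn from just beyond it.

```python
def capability_boundary(pass_rates, threshold=0.5):
    """Return the highest level reached before the pass rate first drops
    below `threshold`; return 0 if even the easiest level fails.

    `pass_rates` maps a difficulty level (assumed here to be the number of
    structural edits applied to a seed task) to the observed pass rate.
    """
    boundary = 0
    for level in sorted(pass_rates):
        if pass_rates[level] >= threshold:
            boundary = level
        else:
            break
    return boundary


def next_training_level(pass_rates, threshold=0.5):
    """Return the first level just beyond the boundary, or None if the
    model already clears every level that was measured."""
    boundary = capability_boundary(pass_rates, threshold)
    harder = [level for level in sorted(pass_rates) if level > boundary]
    return harder[0] if harder else None


# Hypothetical pass rates, not results from the paper.
example_rates = {1: 0.9, 2: 0.7, 3: 0.4, 4: 0.1}
boundary = capability_boundary(example_rates)
target = next_training_level(example_rates)
print(boundary, target)
```

With these made-up rates the boundary sits at level 2, so new examples would be synthesized at level 3, where the model starts to fail.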

Core claim

The central claim is that a hierarchical and geometry-aware graph serves as an effective intermediate representation for text-to-CAD translation: nodes represent multi-level parts and components while edges encode explicit geometric constraints, allowing the model to predict structure and constraints from text before conditioning code generation on them, which improves geometric fidelity and constraint satisfaction over direct decoding approaches.

What carries the argument

The hierarchical and geometry-aware graph, with multi-level parts as nodes and geometric constraints as edges, which acts as the bridge that narrows the search space and conditions subsequent code generation.
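As a rough illustration of this representation, the sketch below models parts as a tree of nodes and constraints as typed edges between named parts. The class names, the constraint labels (`attached_to`, `concentric`), and the microwave decomposition are hypothetical and are not taken from the paper's format; the sketch only shows how such a graph can be checked for constraints that reference parts missing from the hierarchy.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class PartNode:
    """A part or component; children form the assembly hierarchy."""
    name: str
    children: list[PartNode] = field(default_factory=list)

    def leaves(self):
        """Names of the leaf parts under this node, left to right."""
        if not self.children:
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


@dataclass(frozen=True)
class ConstraintEdge:
    """A geometric constraint between two named parts (labels are assumed)."""
    source: str
    target: str
    kind: str


@dataclass
class AssemblyGraph:
    root: PartNode
    constraints: list[ConstraintEdge] = field(default_factory=list)

    def node_names(self):
        """All part names in pre-order."""
        names, stack = [], [self.root]
        while stack:
            node = stack.pop()
            names.append(node.name)
            stack.extend(reversed(node.children))
        return names

    def dangling_constraints(self):
        """Constraints that mention a part absent from the hierarchy."""
        known = set(self.node_names())
        return [e for e in self.constraints
                if e.source not in known or e.target not in known]


# Hypothetical decomposition of a microwave.
door = PartNode("door", [PartNode("handle")])
graph = AssemblyGraph(
    root=PartNode("microwave", [PartNode("body"), door]),
    constraints=[
        ConstraintEdge("door", "body", "attached_to"),
        ConstraintEdge("handle", "door", "attached_to"),
        ConstraintEdge("turntable", "body", "concentric"),  # part not declared
    ],
)
print(graph.node_names())
print(graph.dangling_constraints())
```

A check like `dangling_constraints` is one cheap way to catch an inconsistent graph before any code is generated from it.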

Load-bearing premise

That the graph structure and constraints can be accurately predicted from text and that this intermediate form captures the dependencies needed to prevent errors from cascading in the final code.

What would settle it

If a baseline that decodes text directly to code matches or exceeds the proposed method on geometric fidelity and constraint satisfaction metrics when both are trained and tested on the same 12K dataset, the value of the graph intermediate would be refuted.

Figures

Figures reproduced from arXiv: 2604.10075 by Gangyu Zhang, Hongyuan Chen, Huiyuan Zhang, Shengjie Gong, Shuangping Huang, Tianshui Chen, Wenjie Peng, Yunqing Hu.

Figure 1: Geometric decomposition graph. (a) Top-down decomposition of a user instruction (microwave …
Figure 2: Overall framework of Graph-CAD. The framework comprises three sequential stages: Geometry …
Figure 3: The SAPCL mechanism. This mechanism alternates between two core modules: SFT and …
Figure 4: Qualitative results of Graph-CAD and baseline methods on the CADBench. Our method generates …
Figure 5: Qualitative comparison of ablated variants. For three CADBench prompts, we show results from …
Figure 6: Detailed visualization analysis of the SAPCL mechanism.
Figure 8: In the Data Generation stage, we designed three distinct prompt sets, one for each stage of the …
Figure 7: Examples of Failure Cases on Highly Complex Geometries. This figure illustrates current limi…
Figure 8: Data annotation pipeline. Our annotation process begins with user instructions sourced from the …
Figure 9: Additional Qualitative Comparison with Baselines. This figure presents more qualitative examples …
Figure 10: Visualization of Progressive Improvement with SAPCL. This figure illustrates the evolution of the …
Figure 11: Qualitative Comparison with Sketch-and-Extrude Methods on the DeepCAD Dataset. This figure …
Figure 12: Object-level metrics as a function of the Unique Part Count on CADBench. We report (a) Object …
Figure 13: Columns show objects with 5, 10, 15, 20, 25, and 35 unique parts. For each object, the top row …
Figure 14: Qualitative comparison between the three-stage pipeline (Graph-CAD (SFT)) and the unified …
Figure 15: Representative examples of automated annotations and human corrections. Top row: samples …
Original abstract

Text-to-CAD code generation is a long-horizon task that translates textual instructions into long sequences of interdependent operations. Existing methods typically decode text directly into executable code (e.g., bpy) without explicitly modeling assembly hierarchy or geometric constraints, which enlarges the search space, accumulates local errors, and often causes cascading failures in complex assemblies. To address this issue, we propose a hierarchical and geometry-aware graph as an intermediate representation. The graph models multi-level parts and components as nodes and encodes explicit geometric constraints as edges. Instead of mapping text directly to code, our framework first predicts structure and constraints, then conditions action sequencing and code generation, thereby improving geometric fidelity and constraint satisfaction. We further introduce a structure-aware progressive curriculum learning strategy that constructs graded tasks through controlled structural edits, explores the model's capability boundary, and synthesizes boundary examples for iterative training. In addition, we build a 12K dataset with instructions, decomposition graphs, action sequences, and bpy code, together with graph- and constraint-oriented evaluation metrics. Extensive experiments show that our method consistently outperforms existing approaches in both geometric fidelity and accurate satisfaction of geometric constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that direct text-to-CAD code generation (e.g., bpy sequences) enlarges the search space and causes cascading errors in complex assemblies because it fails to model hierarchy or geometric constraints explicitly. It proposes a hierarchical geometry-aware graph as an intermediate representation (nodes as multi-level parts/components, edges as explicit geometric constraints), with a two-stage process that first predicts the graph from text and then conditions action sequencing and code generation on it. A structure-aware progressive curriculum learning strategy is introduced to synthesize graded tasks via controlled edits, along with a new 12K dataset containing instructions, decomposition graphs, action sequences, and bpy code, plus graph- and constraint-oriented metrics. Experiments reportedly show consistent outperformance over baselines in geometric fidelity and constraint satisfaction.

Significance. If the central claims hold, the work offers a structured way to mitigate error accumulation in long-horizon text-to-code tasks for CAD, which has practical value for automated design and manufacturing. The explicit introduction of a 12K dataset with hierarchical annotations and the curriculum strategy are concrete contributions that could support future research on intermediate representations for structured generation. The geometry-aware graph formulation directly targets a known weakness in prior direct-decoding approaches.

major comments (1)
  1. Abstract and Experiments section: The central claim that the hierarchical graph 'improves geometric fidelity and constraint satisfaction' by avoiding cascading failures is load-bearing, yet no per-stage accuracy for graph prediction (hierarchy or constraint edges) or oracle-graph ablation is reported. Without these, it is impossible to determine whether the intermediate representation reduces error sources or merely relocates them into the graph predictor, undermining the argument that the two-stage approach is superior to direct decoding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the hierarchical geometry-aware graph and the new dataset. We address the major comment below and will revise the manuscript accordingly to strengthen the evaluation of the two-stage approach.

Point-by-point responses
  1. Referee: Abstract and Experiments section: The central claim that the hierarchical graph 'improves geometric fidelity and constraint satisfaction' by avoiding cascading failures is load-bearing, yet no per-stage accuracy for graph prediction (hierarchy or constraint edges) or oracle-graph ablation is reported. Without these, it is impossible to determine whether the intermediate representation reduces error sources or merely relocates them into the graph predictor, undermining the argument that the two-stage approach is superior to direct decoding.

    Authors: We agree that the absence of per-stage accuracies for the graph prediction module (hierarchy and constraint edges) and an oracle-graph ablation limits the ability to isolate the source of improvements. The current manuscript reports only end-to-end metrics on geometric fidelity and constraint satisfaction. In the revised version, we will add: (1) separate accuracy metrics for hierarchy prediction (e.g., node-level decomposition accuracy) and constraint edge prediction (precision/recall/F1 per constraint type); (2) an oracle ablation in which ground-truth graphs are provided to the action sequencing and code generation stage, with direct comparison to both the predicted-graph setting and the direct-decoding baseline. These additions will quantify whether the intermediate representation reduces cascading errors or primarily shifts the modeling burden. revision: yes
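The per-constraint-type precision/recall/F1 that the authors promise could be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: edges are represented as hypothetical `(source, target, kind)` tuples, a predicted edge counts as correct only on an exact match with a ground-truth edge, and the example edges are invented.

```python
def per_type_prf(predicted, gold):
    """Compute (precision, recall, F1) per constraint type.

    `predicted` and `gold` are sets of (source, target, kind) tuples.
    An edge is a true positive only if the identical tuple is in `gold`.
    """
    kinds = {kind for _, _, kind in predicted | gold}
    scores = {}
    for kind in sorted(kinds):
        pred_k = {e for e in predicted if e[2] == kind}
        gold_k = {e for e in gold if e[2] == kind}
        true_pos = len(pred_k & gold_k)
        precision = true_pos / len(pred_k) if pred_k else 0.0
        recall = true_pos / len(gold_k) if gold_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[kind] = (precision, recall, f1)
    return scores


# Invented edges for illustration only.
gold_edges = {
    ("door", "body", "attached_to"),
    ("handle", "door", "attached_to"),
    ("door", "body", "parallel"),
}
predicted_edges = {
    ("door", "body", "attached_to"),
    ("handle", "body", "attached_to"),  # wrong target part
    ("door", "body", "parallel"),
}
scores = per_type_prf(predicted_edges, gold_edges)
print(scores)
```

Running the same function once on predicted graphs and once on ground-truth graphs fed to the downstream stages would separate graph-prediction error from code-generation error, which is the distinction the referee asks for.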

Circularity Check

0 steps flagged

No circularity: method introduces independent graph representation, dataset, and curriculum without reducing claims to fitted inputs or self-citations

full rationale

The paper's core proposal is a new hierarchical geometry-aware graph as an explicit intermediate representation between text and CAD code, trained via a structure-aware progressive curriculum on a newly constructed 12K dataset. This chain does not reduce any prediction to its own inputs by construction, nor does it rely on load-bearing self-citations, uniqueness theorems imported from prior author work, or renaming of known results. The outperformance claims rest on empirical comparisons using graph- and constraint-oriented metrics rather than tautological equivalence. No equations or steps in the abstract or described framework exhibit the self-definitional, fitted-input, or ansatz-smuggling patterns; the approach adds novel modeling elements instead of deriving results from pre-existing fitted parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the domain assumption that explicit graph modeling of hierarchy and constraints will reduce search space and errors compared to direct decoding, plus the creation of a new dataset whose quality is not independently verified here.

axioms (1)
  • domain assumption: Modeling multi-level parts as nodes and geometric constraints as edges captures the essential structure for CAD assemblies.
    Invoked in the proposal of the graph as intermediate representation to address limitations of direct text-to-code methods.
invented entities (1)
  • Hierarchical and geometry-aware graph (no independent evidence)
    purpose: Intermediate representation to model assembly hierarchy and constraints before code generation.
    Newly introduced in this framework as the core innovation.

pith-pipeline@v0.9.0 · 5517 in / 1260 out tokens · 59249 ms · 2026-05-10T16:14:48.245590+00:00 · methodology

